Cool. Maybe you'd like to take a look at F5-TTS, where you can upload a voice sample to "clone" a narrator voice, e.g.
https://github.com/JarodMica/audiobook_maker
Here is a video about it:
https://www.youtube.com/watch?v=HbUnb5znNwM
With Gradio and F5-TTS you can also train your own voice: transcribe the recordings with Whisper (speech-to-text), then use the generated LJSpeech-style dataset to train your own voice model for F5-TTS. Video:
https://www.youtube.com/watch?v=GmketyZW2c4
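To make the dataset step a bit more concrete, here is a minimal sketch of writing an LJSpeech-style `metadata.csv` (one `id|text|normalized text` line per clip) from transcripts you already have. The function name and the `(wav_name, transcript)` input shape are my own assumptions for illustration; the Whisper transcription itself is left out, and whether the F5-TTS training UI wants exactly this layout is something to check against its docs.

```python
from pathlib import Path


def write_ljspeech_metadata(clips, out_dir):
    """Write an LJSpeech-style metadata.csv into out_dir.

    clips: iterable of (wav_name, transcript) pairs (hypothetical input shape).
    Each output line is `clip_id|text|normalized text`; this sketch does no
    real text normalization, so both text columns are the same.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = out_dir / "metadata.csv"
    with metadata_path.open("w", encoding="utf-8", newline="") as f:
        for wav_name, transcript in clips:
            clip_id = Path(wav_name).stem  # "clip_0001.wav" -> "clip_0001"
            # The pipe is the field separator, so strip it from the text,
            # then collapse runs of whitespace.
            clean = " ".join(transcript.replace("|", " ").split())
            f.write(f"{clip_id}|{clean}|{clean}\n")
    return metadata_path
```

You would call it with whatever your transcription step produced, e.g. `write_ljspeech_metadata([("clip_0001.wav", "Hello world.")], "dataset")`.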
This way you can basically take any audiobook with your favourite narrator, clone his voice and let him read ANY of your EPUBs. Even more, you could use F5-TTS extensions to use different voices, e.g. for female and male characters:
{male} What's up?
{female} Nothing.
{male} Ok
{narrator} After this they both hung up the phone.
But you'd probably need a pretty decent GPU to get this going :-)
ElevenLabs does it for free (for now) and it works pretty well; multiple voices are available and some are high quality.
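On the multi-voice idea: a rough sketch of how a script with tags like `{male}`, `{female}` or `{narrator}` at the start of a line could be split into per-voice segments before handing each one to a TTS voice. The tag syntax is assumed from this comment, not taken from F5-TTS itself, and the rule that untagged lines keep the previous voice is my own choice.

```python
import re

# Assumed tag syntax: "{voice}" at the start of a line, followed by the text.
TAG_LINE = re.compile(r"^\{(\w+)\}\s*(.*)$")


def split_by_voice(script, default_voice="narrator"):
    """Return a list of (voice, text) pairs, one per non-empty line."""
    segments = []
    voice = default_voice
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = TAG_LINE.match(line)
        if match:
            voice, text = match.group(1), match.group(2)
        else:
            text = line  # untagged lines keep the previous voice
        if text:
            segments.append((voice, text))
    return segments
```

Each `(voice, text)` pair can then be sent to whichever cloned voice you mapped to that tag, and the resulting clips concatenated in order.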
I did look for some options at the time, but wasn't satisfied with them, especially because of cost. I even found a ~6 GB container that would supposedly do this. But in the end, I already had the tools to produce the audio, so I just needed to orchestrate it.
Did you build this for Christmas?
Because I did. For the kids. Not exactly the same:
1. Put in any ebook, in any language.
2. LLM translates it into German.
3. XTTS turns it into audio.
Devil’s in the details, of course.
I’m usually very picky about translations. Many books that I’ve read in English and wanted to gift to someone in my family have turned out so rotten in their translation that their “soul” was lost. Against that background, I am very pleased with the results.
Cost: About $0.20 per book. A bit more if it’s Asimov’s New Guide to Science.
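For anyone wanting to try the same ebook, translate, speak flow, here is a minimal sketch of just the orchestration. It is not the code described above: `translate` and `synthesize` are hypothetical callables you would supply yourself (one wrapping your LLM call, one wrapping XTTS), and extracting chapters from the ebook is assumed to have happened already.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def make_audiobook(
    chapters: Iterable[str],
    translate: Callable[[str], str],
    synthesize: Callable[[str, Path], None],
    out_dir: Path,
) -> List[Path]:
    """Translate each chapter, then render it to one audio file per chapter.

    translate:  hypothetical wrapper around an LLM call (text in, German out).
    synthesize: hypothetical wrapper around a TTS engine such as XTTS; it is
                expected to write the audio for `text` to the given path.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, chapter in enumerate(chapters, start=1):
        german = translate(chapter)
        wav_path = out_dir / f"chapter_{index:03d}.wav"
        synthesize(german, wav_path)
        outputs.append(wav_path)
    return outputs
```

Keeping the two steps as plain callables means you can swap the LLM or the TTS engine without touching the loop, and dry-run the whole thing with stubs before spending money on translation.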
That's cool, especially for that cost.
I think if the ePUB and the OS are in the same language, it should (?) work, although I haven't tested it.
What does this do for longer texts? I have been using Android TTS, but it produces unpleasant audio when it comes to conveying feelings (prosody). NotebookLM, although very limited, produces very lively audio.
Longer texts work the same, I'd say. Of course, they take longer to TTS. I've listened to 2 whole books already and it felt pretty good. I've never used NotebookLM, so I can't say how they differ.
Sometimes it gets a bit messy around punctuation, but I've never had to go back and listen again.
If I interpreted the "prosody" part correctly, then there is some intonation, but it doesn't mimic emotions, so it's very basic. For example, it will slightly differentiate "Shut up" from "Shut up!" (with exclamation mark).
Another way of doing this is by using the 'Record TTS' facility of Librera Reader - a book reader for Android, available on F-Droid, no iOS version available as far as I know - which can use the TTS service on Android to narrate all formats supported by Librera to audio files.