Comments (51)

Awesome! A local bleeding-edge TTS. You made my day.

Are there any examples of how it sounds?

https://mycroft.ai/mimic-3/

Wow, so many voices. Love a lot of them. Spanish sounds amazing.

Would like to suggest using more memorable names for the different voices, particularly for US English; having just three letters makes it a little hard to tell the voices apart.

Would be great to at least have them labeled with gender and accent. There are too many voices in the VCTK dataset to come up with meaningful names for all of them.

Would like to suggest using more memorable names for the different voices, particularly for US English; having just three letters makes it a little hard to tell the voices apart.

It's open source. If you actually purchase the Mark II and incorporate this into your setup, you're welcome to volunteer for that task. LOL

Agree - the Spanish voice sounds incredible.

[deleted]

I don't like it at all. It sounds way more robotic than the other languages.

For Dutch it still has quite a way to go. Only one voice sounds like an average Dutch speaker (ABN), but it still makes odd jumps and has weird emphasis. The others either sound Belgian or have a heavy soft G.

Very cool project though. Will keep an eye on it.

A lot of the US English voices sound a little Irish and others are distinctly "transatlantic".

There's a video on this page comparing it to the previous versions. It sounds a lot better.

https://mycroft.ai/blog/introducing-mimic-3/

Thanks, I didn’t know if examples were in there.

Does anyone know a good speech-to-text engine that can be self-hosted? I would like to be able to use my voice to trigger actions on my homelab. Thanks

You can check out Rhasspy. It works well with predefined phrases.

What does one do with these functions? Is it a substitute for something like "OK Google, call girlfriend"? Or what is this for?

Oh, this looks great. Looks like there's already a Home Assistant integration for the display; now we just need one for TTS. I'll spin it up and play with it in Node-RED. Thanks for sharing!

MaryTTS compatibility: use the Mimic 3 web server as a drop-in replacement for MaryTTS, for example with Home Assistant. https://www.home-assistant.io/integrations/marytts/
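As a sketch of what "drop-in replacement" means in practice: a client that speaks the MaryTTS HTTP protocol can point at the Mimic 3 server instead. The port (59125, MaryTTS's default) and the `/process` parameter names below follow the MaryTTS HTTP API; treat them as assumptions and check the Mimic 3 docs for your install.

```python
from urllib.parse import urlencode

# Assumed base URL: Mimic 3's web server listens on MaryTTS's
# default port, so existing MaryTTS clients keep working.
MIMIC3_BASE = "http://localhost:59125/process"

def tts_request_url(text, voice="en_US/vctk_low"):
    """Build a MaryTTS-style GET URL that asks the server for a WAV of `text`."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE",
        "LOCALE": voice,
    }
    return MIMIC3_BASE + "?" + urlencode(params)
```

Fetching that URL (e.g. with `urllib.request.urlopen`) should return WAV audio if the server is running.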

[deleted]

Well, you would have to migrate the Python code to Java. Or what do you mean by the "current state" issue?

[deleted]

You can use Mimic 3 as a drop-in replacement for MaryTTS, which is supported by Home Assistant.

https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/mimic-3#marytts-compatibility

How does your setup work? Using Tasker on Android? And how is HA configured?

[deleted]

Thanks for the explanation.

Is it possible to choose two voices from different languages in the Multi Speaker Model? I am bilingual and would like to have it work in both languages.

Not sure if I got your question right... you can switch between the voices, for example with the ?voice= parameter if using the web server.

Sorry, I meant to ask whether it is possible to have it work with two languages at the same time. Is there a way for it to read text that is in Spanish in Spanish and text that is in English in English? Or will it read all text in the one language that has been set up?

You can mix them, but you would have to put it in SSML; check the second example here:

https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/mimic-3#command-line-interface

You can use SSML to mix different voices.

[deleted]

I think a Pi 4 should be fine. Regarding the other question: the audio is generated on the fly, so you could also dump in "War and Peace" ^^ but it will take a while...

https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/mimic-3#long-texts

The docs claim it will process in real-time on hardware at least as good as a Pi 4.
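Since the audio is generated on the fly, one way to keep very long texts responsive is to synthesize them sentence by sentence. A naive splitter sketch (my own illustration, not from the Mimic 3 docs):

```python
import re

# Naive sentence splitter: break a long text at sentence-ending
# punctuation so each chunk can be sent to the TTS engine separately
# and playback can start before the whole text is synthesized.
def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```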

I really want the chance to run something locally trained on my own voice.

The voices are downloaded from https://github.com/MycroftAI/mimic3-voices/tree/master/voices/. So I suppose you can add your own voice.

I've got the integration working in Home Assistant! But I can't figure out how to define the speaker. When setting up the integration, you use the MaryTTS voice key to define both the Mimic 3 language and name in one field, like "en_US/vctk_low", but I can't figure out how to define the Mimic 3 speaker. Any ideas?

Wait: I got it! Add it to the end like "en_US/vctk_low#XXX"
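The voice-key format described above ("language/model", with an optional "#speaker" suffix) can be sketched as a small helper. The speaker id "p236" in the test usage is just an illustration of the VCTK-style ids, not a confirmed value:

```python
# Sketch of the voice-key convention from the comment above:
# "<language>/<model>" optionally followed by "#<speaker>".
def mimic3_voice_key(model, speaker=None):
    """Build the value for the MaryTTS voice field, e.g. for Home Assistant."""
    return f"{model}#{speaker}" if speaker else model
```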

A lot of people are asking for use cases for something like this so I’ll mention some ways I use Text To Speech:

  • announce through my home speakers when someone opens a door (similar to a security chime).

  • the evening before garbage day, if the garbage has not been taken out, a message will play reminding me to take out the trash.

  • when my toddler gets out of bed after we have tucked him in, a message will play in his room telling him to go back to sleep.

when my toddler gets out of bed after we have tucked him in, a message will play in his room telling him to go back to sleep.

My first thought was - "That sounds amazing, automatic parenting."

But kids are too smart for that: they'll figure out it's just a recording VERY quickly.

Oh, he knows, but he can't figure out why it happens exactly when he gets out of bed. It still works, and he will go back to bed unless he needs to use the bathroom or had a nightmare.

Another self-hosted project that isn't self-hosted. Stop with the Docker ad campaign in this sub.

Another unneeded comment... Stop whining and start reading. I mentioned Docker in the title because many of us here prefer to use it, but you can also install the software directly on your machine... Happy now?

You should still stop with the ad campaign, because it's just so annoying.

I could say the same about your comments... Just whining and saying I should stop sounds very insecure/immature to me... And it doesn't bring anything useful to the discussion here...

As an amateur here, could anyone explain an example use case? I've been incredibly frustrated by the lack of good/accessible TTS on Android and Linux, and if I can repurpose an RPi for this and throw it on the local network I'd be happy to. Am I on the right track that a local 'device' needs to be set up to run the system?

Yes, currently this needs to run on some device you own. Using it like the built-in Android TTS seems impossible at the moment. But if you just want to generate audio from text, the web page should work on any device with a browser.

On Linux you could probably install it locally so that you don't need a separate device.

Can someone help me understand, what are the use cases?

Simply put: is it a self-hosted version of Google Translate (the voice part), or of the Google Home voice assistant?

It's part of Mycroft, which is an Alexa/Assistant/Cortana/Siri competitor.

Sounds good. Say I have it running on my server, how can I start using it? Can I integrate it with Google Assistant and send commands to its speaker? And when I ask a question, would my server respond instead of Google's?

Frankly, it's not easy to implement. My dad is blind and I tried about three years ago. I have to say it does look like they've done a lot of good work, but I don't know how to answer your question.

I use TTS to make announcements on my Google Homes. Using Home Assistant, I can send text to Mimic 3 and then broadcast the resulting audio file on the speakers. For example, if my cameras pick up a person at the door, I broadcast a message. Same if the door of my fridge is left open.

This is really cool! Sounds amazing!

Could this work in a chatbot? I wish I had spent time learning how to configure docker stuff. I have an API for Dummies tool that I use.
