It all started when I wanted my own wakeword for Mycroft. I followed the recommendations for creating a wakeword model with Mycroft Precise. It didn't work very well. It kept waking up falsely all the time and it didn't work for my wife at all, even though I collected her saying the wakeword a few times.
This awoke the inner data scientist in me. How could I create a wakeword model that works as well as a production quality one without having to make thousands and thousands of recordings?
I realized that if I am having this problem, so are many other people. No wonder wakeword systems aren't so ubiquitous. It also made me wonder what other blockers there were in NLP (Natural Language Processing) stopping us from having awesome FOSS voice assistants for all the self hosters out there.
It turns out there are a lot of problems and a whole community of FOSS voice assistant developers facing those same problems. Our only path to success was to unite as a community.
Secret Sauce AI
🔍 Secret Sauce AI is a coordinated community of AI enthusiasts. We have come together as many individuals and projects in the FOSS voice assistant space to solve big AI problems for everyone out there.
We are focused on many areas of AI (especially in NLP), but our 🔎 first project is in the area of wakewords.
Cool, but what's a wakeword exactly?
When you use a voice assistant, you usually start by waking it (e.g. 'hey Mycroft' or 'hey Siri'). Behind the wakeword is a binary acoustic model ('wakeword' or 'not-wakeword' classes) that triggers ASR (automatic speech recognition) transcription when the wakeword is uttered. This is generally how all voice assistants work.
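To make the "binary model that triggers ASR" idea a bit more concrete, here is a minimal sketch of the gating step. It assumes some classifier has already produced a 'wakeword' probability for each audio frame; the function name, the threshold, and the consecutive-frame rule are all illustrative assumptions, not how Precise or any particular engine does it.

```python
from typing import Iterable, Iterator


def detect_wakeword(
    scores: Iterable[float],
    threshold: float = 0.5,      # assumed cutoff between 'wakeword' and 'not-wakeword'
    min_consecutive: int = 3,    # assumed number of frames in a row needed to fire
) -> Iterator[int]:
    """Yield the frame index at which each wakeword activation fires.

    `scores` stands in for the per-frame 'wakeword' probabilities that a
    binary acoustic model would output (hypothetical input).
    """
    streak = 0
    for index, score in enumerate(scores):
        if score >= threshold:
            streak += 1
            # Fire once per streak, exactly when it reaches the required length.
            if streak == min_consecutive:
                yield index
        else:
            # Any 'not-wakeword' frame resets the streak.
            streak = 0


# Made-up scores: one sustained activation, then a single-frame blip.
scores = [0.1, 0.2, 0.9, 0.95, 0.97, 0.99, 0.3, 0.8, 0.1]
activations = list(detect_wakeword(scores))
print(activations)  # frame indices where ASR would be started
```

Requiring a few frames in a row is one simple way to cut down on false wake-ups from a single noisy frame, at the cost of a slightly later trigger.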
Yeah, that's nice and all but there are already wakewords out there...
True, but making your own customized production quality wakeword can be rather difficult (like impossible) using FOSS solutions, as I personally found out. And it shouldn't be this hard to make your own wakeword!
Let's break down the problems and the solutions.
1. Data collection
Problem
How much data do you need, what kinds of data, how do you go about collecting it? There really isn't so much exact information out there, and big companies usually collect thousands to millions of samples to make their production quality wakewords. That is a bit beyond the average self hoster's resources.
Solution
So the solution was to experimentally figure out a data collection recipe and make that data collection as sparse as possible while making sure it produced a production quality wakeword. That's a tall order, but we worked on this (way longer than we want to admit).
We have released a prototype 📦 Wakeword Data Collector in Python that runs a user through the collection process.
2. How do you make a production quality model with machine learning and all of that stuff?
Problem
It can be hard to hit on a winning recipe to train a production quality model, especially if you aren't doing this professionally as a data scientist. There isn't so much information out there on the exact recipe to do this.
Solution
We experimentally figured out the best recipe while keeping the data sparse and made the 📦 Precise Wakeword Model Maker to do it all for you automatically.
It uses Mycroft's Precise engine to train a model for you. It pulls out every trick in AI we know to get a higher quality model from sparse data and uses new groundbreaking techniques, from using TTS engines to generate more data to using incremental and curriculum learning methods to improve the training and testing scores, and much more.
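For readers who haven't run into curriculum learning before, the general idea is to train on easy samples first and add harder ones in later stages. The sketch below only illustrates that ordering step in general terms. The file names, the difficulty scores, and the `curriculum_stages` helper are hypothetical, and this is not a description of what the Precise Wakeword Model Maker does internally.

```python
from typing import List, Sequence, Tuple

# (hypothetical file name, assumed difficulty score where higher means harder)
Sample = Tuple[str, float]


def curriculum_stages(samples: Sequence[Sample], n_stages: int) -> List[List[str]]:
    """Split samples into cumulative training stages, easiest first.

    Stage 1 holds only the easiest samples; every later stage keeps the
    earlier samples and adds the next-hardest ones, until the final stage
    contains everything.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    ordered = sorted(samples, key=lambda sample: sample[1])
    stages = []
    for stage in range(1, n_stages + 1):
        # Integer ceiling of len(ordered) * stage / n_stages.
        cutoff = (len(ordered) * stage + n_stages - 1) // n_stages
        stages.append([name for name, _ in ordered[:cutoff]])
    return stages


# Made-up samples with made-up difficulty scores.
samples = [
    ("clean_close.wav", 0.1),
    ("noisy_far.wav", 0.9),
    ("clean_far.wav", 0.4),
    ("noisy_close.wav", 0.6),
]
stages = curriculum_stages(samples, n_stages=2)
for number, names in enumerate(stages, start=1):
    print(f"stage {number}: {names}")
```

In a real pipeline each stage would be handed to the training step in turn. How difficulty is actually scored (noise level, distance from the microphone, speaker variety) is a design choice this sketch leaves open.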
Help!
We would love to have more help working on AI based projects for the self hosting community, as we are just some people who know each other mostly from Reddit and GitHub and who do this in our free time. Please feel free to DM me if you want to help out.
Future projects
Wakeword
This release represents the first phase in the wakeword project. We are working on a 📦 Rust wakeword engine based on Precise and a 📦 SpeechPy MFCC port in Rust so that users can run the wakeword easily on their phones and other devices. It is hard to believe that there currently aren't any good solutions for running a modern FOSS wakeword engine on a phone in real time. We want to change that and allow everyone access to this technology, with their own wakeword of choice.
We would also love to improve upon our current prototype releases and welcome feedback and especially help from the community.
NLU-NLG: Natural Language Understanding, Natural Language Generation
This will be the next project we focus on. We will benchmark current solutions, improve general data sets, and publish information to help everyone improve upon their current NLU-NLG use cases. All of this is still a heavy work in progress.
Lots more to come in NLU-NLG, so stay tuned.
Voice Assistant Bus Protocol
We are working on a universal 📦 Voice Assistant Protocol (VAP).
Generally, we are working on many more projects, but it's still too early to speak about them with any detail. If you are curious and want to know more, you can always write me. Once again, we would love any help offered. There are a lot of big AI problems to solve out there and we are just some random passionate folks, not some fancy company or anything.
Member projects
A lot of our Secret Sauce AI members build FOSS voice assistant software. It is always worth checking their software out. We just love this community!
- OpenVoiceOS
- Athena and The Sapphire Assistant Framework
- Lily
- Leon AI
- GLaDOS Voice Assistant
- V.I.S.O.R.
We would love to give a shout out to the folks over at Mycroft. If it weren't for them, we wouldn't have a modern FOSS wakeword engine or a lot of other stuff in the FOSS voice assistant community.
A BIG what's up to the Secret Sauce AI members out there on Reddit:
- u/DADi590
- u/nerdaxic
- u/Louistiti
- u/TemporaryUser10
- u/eternal_spectator
- u/BusinessBandicoot
- u/okyaygokay
- u/re-sheosi
- u/equidamoid
And last but not least, a special greeting to Dan "The Man" Borufka who's been down with Secret Sauce AI since day one. Thanks Dan!
tl;dr Secret Sauce AI made some FOSS tools to make your own custom wakewords easily:
and we are working on other FOSS voice assistant related problems in AI to share with the community.