Text-to-speech has become a mainstream technology because it makes speech more accessible. It gives both humans and machines the ability to speak. Given a text prompt, a text-to-speech engine can instantly synthesize speech that sounds as natural as a real human voice.
However, there’s much more that humans can do with their voices that text-to-speech hasn’t been able to replicate in the past. One such example is singing. People can use their voices to produce musical tones, not just spoken words, and most text-to-speech engines cannot reproduce those tones.
Now, we at NeoSpeech are working on giving our natural-sounding text-to-speech voices the ability to sing beautiful melodies. Just as text-to-speech made the ability to speak more accessible, we want to make the ability to sing more accessible.
Want to hear it for yourself? Here’s an example of what our Japanese singing text-to-speech voice sounds like:
How Singing Text-to-Speech Works
Singing text-to-speech isn’t necessarily new. It’s actually been around for a while. In fact, last year we covered the story of Hatsune Miku, the text-to-speech pop star.
Vocaloid programs, such as the one that lets users create songs sung by the virtual persona Hatsune Miku, can synthesize singing. Each program is created by having a voice actor record phonemes at different pitches. Users then put these phonemes together in the program to create lyrics and melodies.
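To make the phonemes-at-different-pitches idea concrete, here is a minimal Python sketch of a concatenative lookup. It is not how Vocaloid is actually implemented: the voice bank, the clip file names, and the `NoteEvent`, `pick_clip`, and `render` helpers are all hypothetical. For each note, it picks the recording of that phoneme made closest to the target pitch and reports how many semitones that clip would still need to be shifted.

```python
from dataclasses import dataclass

# Hypothetical voice bank: each phoneme was recorded at a few pitches.
# Keys are (phoneme, MIDI note it was recorded at); values are placeholder clip names.
VOICE_BANK: dict[tuple[str, int], str] = {
    ("ka", 60): "ka_C4.wav",
    ("ka", 67): "ka_G4.wav",
    ("e", 60): "e_C4.wav",
    ("e", 67): "e_G4.wav",
}


@dataclass
class NoteEvent:
    phoneme: str
    midi_note: int  # target pitch as a MIDI note number (60 = middle C)


def pick_clip(event: NoteEvent, bank: dict[tuple[str, int], str]) -> tuple[str, int]:
    """Return the clip recorded nearest the target pitch and the remaining semitone shift."""
    recorded = [pitch for (phoneme, pitch) in bank if phoneme == event.phoneme]
    if not recorded:
        raise KeyError(f"no recordings for phoneme {event.phoneme!r}")
    nearest = min(recorded, key=lambda pitch: abs(pitch - event.midi_note))
    return bank[(event.phoneme, nearest)], event.midi_note - nearest


def render(events: list[NoteEvent], bank: dict[tuple[str, int], str]) -> list[tuple[str, int]]:
    # Concatenation step: one recorded clip per note, kept in melody order.
    return [pick_clip(event, bank) for event in events]


melody = [NoteEvent("ka", 62), NoteEvent("e", 66)]
print(render(melody, VOICE_BANK))  # [('ka_C4.wav', 2), ('e_G4.wav', -1)]
```

Because every note reuses a fixed recording from one performer, a bank like this is tied to that single voice.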
The problem with these programs is that each one is usually limited to a single voice, and the results don’t sound human-like.
NeoSpeech aims to create more natural-sounding singing text-to-speech voices using HMM-based (hidden Markov model) text-to-speech.
As discussed in our article “Which Speech Synthesis Technique is Better?”, the HMM method is a statistical parametric synthesis technique: it generates speech by taking a voice actor’s recordings and modifying them to match the input text as closely as possible.
In other words, HMM-based text-to-speech engines don’t fully preserve the original voice actor’s voice when generating speech. Instead, they can modify the speech to sound the way the engine predicts it should sound.
This flexibility makes HMM-based text-to-speech engines well suited to creating singing voices: the engine can raise and lower the pitch to produce high and low notes, turning ordinary words and sentences into songs.
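Producing high and low notes comes down to the pitch (fundamental frequency, F0) the engine targets. The Python sketch below uses the standard equal-temperament conversion from a MIDI note number to Hz and expands a short melody into a frame-by-frame F0 target. The 5 ms frame shift and the `f0_contour` helper are assumptions for illustration, not details of NeoSpeech’s engine.

```python
A4_MIDI = 69   # MIDI note number of the A above middle C
A4_HZ = 440.0  # standard concert-pitch reference


def midi_to_hz(note: int) -> float:
    """Equal-temperament frequency: each semitone multiplies F0 by 2**(1/12)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def f0_contour(notes: list[tuple[int, float]], frame_shift_s: float = 0.005) -> list[float]:
    """Expand (midi_note, duration_in_seconds) pairs into one F0 target per frame.

    frame_shift_s is an assumed analysis frame shift; real engines differ.
    """
    contour: list[float] = []
    for note, duration_s in notes:
        n_frames = round(duration_s / frame_shift_s)
        contour.extend([midi_to_hz(note)] * n_frames)
    return contour


# Middle C for half a second, then E4 for a quarter second.
contour = f0_contour([(60, 0.5), (64, 0.25)])
print(len(contour), round(contour[0], 2), round(contour[-1], 2))  # 150 261.63 329.63
```

A perfectly flat contour like this would likely sound mechanical, so a practical singing engine would also need to shape the transitions and expression around these targets.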
With the engine able to modify the voice this way, the last step is giving the text-to-speech engine knowledge of the music. Just as a regular text-to-speech engine needs to know the text, a singing text-to-speech engine also needs to know the musical structure, such as which notes to sing and how long to hold each one.
There are several ways to do this. One way is to feed the singing text-to-speech engine a musical score created in a program like MuseScore. MuseScore is a simple tool that lets you build musical scores (like the one below) on your computer.
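MuseScore can export scores as MusicXML, an open XML format for sheet music. Assuming the engine accepted MusicXML (this post does not say which format NeoSpeech uses), the sketch below shows how the musical data could be pulled out with Python’s standard `xml.etree.ElementTree`: each note’s pitch as a MIDI number, its length in quarter-note beats, and the lyric syllable attached to it. The score is a trimmed, hand-written fragment, and `parse_notes` is a hypothetical helper that only handles a single monophonic vocal line.

```python
import xml.etree.ElementTree as ET
from typing import Optional

STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Trimmed, hand-written MusicXML fragment: "jin-gle" on two E4 quarter notes, then a half rest.
SCORE = """<score-partwise version="3.1">
  <part id="P1">
    <measure number="1">
      <attributes><divisions>1</divisions></attributes>
      <note>
        <pitch><step>E</step><octave>4</octave></pitch>
        <duration>1</duration>
        <lyric><text>jin</text></lyric>
      </note>
      <note>
        <pitch><step>E</step><octave>4</octave></pitch>
        <duration>1</duration>
        <lyric><text>gle</text></lyric>
      </note>
      <note><rest/><duration>2</duration></note>
    </measure>
  </part>
</score-partwise>"""


def parse_notes(xml_text: str) -> list[tuple[Optional[str], Optional[int], float]]:
    """Return (lyric, midi_note, beats) per note; rests have no lyric or pitch.

    Assumes one part with a single vocal line (no chords, <backup>, or extra voices).
    """
    root = ET.fromstring(xml_text)
    divisions = 1.0  # duration units per quarter note
    events = []
    for measure in root.iter("measure"):
        for elem in measure:
            if elem.tag == "attributes" and elem.findtext("divisions"):
                divisions = float(elem.findtext("divisions"))
            elif elem.tag == "note":
                beats = float(elem.findtext("duration")) / divisions
                midi = None
                pitch = elem.find("pitch")
                if pitch is not None:
                    step = pitch.findtext("step")
                    # +1 = sharp, -1 = flat; microtonal alterations are rounded
                    alter = round(float(pitch.findtext("alter", default="0")))
                    octave = int(pitch.findtext("octave"))
                    midi = 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter
                events.append((elem.findtext("lyric/text"), midi, beats))
    return events


print(parse_notes(SCORE))
# [('jin', 64, 1.0), ('gle', 64, 1.0), (None, None, 2.0)]
```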
Once the HMM-based text-to-speech engine receives the musical data and the text (or lyrics), it can generate the audio file!
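This post does not describe how the engine combines the two inputs, but one simple approach is to pair lyric syllables with notes in order and convert note lengths into seconds at the song’s tempo. The Python sketch below assumes one syllable per sung note and a fixed tempo in beats per minute; `SungNote` and `align` are hypothetical names. The resulting timed sequence is the kind of information an engine would need to decide when, for how long, and at what pitch to sing each syllable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SungNote:
    syllable: str
    midi_note: int
    start_s: float
    duration_s: float


def align(syllables: list[str],
          notes: list[tuple[Optional[int], float]],
          bpm: float) -> list[SungNote]:
    """Pair syllables with pitched notes and convert beats to seconds.

    notes holds (midi_note, or None for a rest, and length in beats).
    Rests advance the clock but do not consume a syllable.
    """
    sung = sum(1 for midi, _ in notes if midi is not None)
    if sung != len(syllables):
        raise ValueError(f"{len(syllables)} syllables for {sung} sung notes")
    seconds_per_beat = 60.0 / bpm
    timeline: list[SungNote] = []
    clock = 0.0
    lyric = iter(syllables)
    for midi, beats in notes:
        duration = beats * seconds_per_beat
        if midi is not None:
            timeline.append(SungNote(next(lyric), midi, clock, duration))
        clock += duration
    return timeline


# "Jin-gle bells" on E4, E4, E4 (quarter, quarter, half) at 120 BPM.
for note in align(["jin", "gle", "bells"], [(64, 1.0), (64, 1.0), (64, 2.0)], bpm=120):
    print(note)
```

A syllable stretched over several notes (melisma) would break the one-syllable-per-note assumption and would need extra information from the score.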
As we said, though, that’s just one of many ways to achieve singing text-to-speech. We at NeoSpeech have been experimenting to find the best way to create a singing text-to-speech voice. While it will be some time before we perfect the process, we’re happy to say that we’ve already had some great results!
NeoSpeech’s Singing Text-to-Speech Samples
Using the same voice as in the example above, we gave our singing Japanese text-to-speech voice the musical data and lyrics for “Jingle Bells” (in Japanese, of course)! Here’s a sample of the synthesized singing:
Then, we threw in some background music to truly make it sound like a professional song recording:
Pretty cool, right? So far, we at NeoSpeech have only applied singing text-to-speech to our Japanese voices. Don’t worry, though: we’re not forgetting about our other languages!
Singing text-to-speech is still in development, but NeoSpeech is working hard to push the boundaries of what text-to-speech can do, and we hope to make the ability to sing accessible to everyone in the near future!
What do you think?
Did you enjoy our singing text-to-speech examples? What are you most excited about using singing text-to-speech for? Let us know in the comments!
Learn More about NeoSpeech’s Text-to-Speech
Want to learn more about all the ways Text-to-Speech can be used? Visit our Text-to-Speech Areas of Application page. And check out our Text-to-Speech Products page to find the right package for any device or application.
If you’re interested in integrating Text-to-Speech technology into your product, please fill out our short Sales Inquiry form and we’ll get you all the information and tools you need.
The post NeoSpeech Is Working On Singing Text-to-Speech Voices appeared first on Text2Speech Blog.