Google's new artificial intelligence can mimic human speech almost perfectly. DeepMind, Google's artificial intelligence research company, has shared details about WaveNet, a deep neural network that synthesizes realistic human speech. An improved version of the technology has now been rolled out for use with the Google Assistant. Text-to-speech, also known as speech synthesis, typically uses one of two techniques. In concatenative TTS, recordings of a voice actor are cut into small chunks and stitched back together; the main drawback of this method is that the entire audio library must be replaced whenever the voice is changed or upgraded. Parametric TTS instead uses a set of parameters to generate speech entirely by computer, but the resulting speech can sound robotic and unnatural.
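The concatenative approach described above can be sketched in a few lines. This is a minimal illustration, not a real TTS engine: the unit names and tiny sample arrays are invented for the example, standing in for a library of pre-recorded voice-actor snippets.

```python
import numpy as np

# Hypothetical pre-recorded audio units from a voice actor, stored as
# raw sample arrays at a fixed sample rate (values here are made up).
unit_library = {
    "he": np.array([0.0, 0.1, 0.2]),
    "el": np.array([0.2, 0.1, 0.0]),
    "lo": np.array([0.0, -0.1, -0.2]),
}

def concatenative_tts(units):
    """Stitch pre-recorded units together in order.

    Any change to the voice requires re-recording the whole library,
    which is the main drawback of this technique.
    """
    return np.concatenate([unit_library[u] for u in units])

waveform = concatenative_tts(["he", "el", "lo"])
print(len(waveform))  # 9 samples
```

Parametric TTS avoids the recording library entirely by computing the waveform from model parameters, which is why it is easier to modify but harder to make sound natural.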
WaveNet takes a different approach: it produces the waveforms themselves. The system uses a convolutional neural network to build waveforms from scratch. To synthesize a voice, the network is trained on a platform of many speech samples, learning which waveforms sound realistic and which do not. The resulting speech synthesizer produces natural intonation and even includes details such as lip smacks.
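The core idea of generating a waveform sample by sample can be illustrated with a toy sketch. This is not WaveNet itself: the real model stacks many dilated causal convolution layers trained on speech and outputs a distribution over sample values, whereas here a single random, untrained "causal filter" stands in for the network purely to show the autoregressive loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: a single causal filter whose
# weights span the last few samples (the "receptive field").
receptive_field = 4
weights = rng.normal(size=receptive_field)

def next_sample(history):
    """Predict the next audio sample from the previous ones."""
    context = history[-receptive_field:]
    return float(np.tanh(np.dot(weights, context)))

def generate(n_samples):
    """Autoregressive generation: each new sample is appended to the
    history and conditions the prediction of the sample after it."""
    samples = [0.1] * receptive_field  # seed context
    for _ in range(n_samples):
        samples.append(next_sample(np.array(samples)))
    return samples[receptive_field:]

audio = generate(16)
print(len(audio))  # 16 generated samples, each in [-1, 1]
```

Because every output sample must wait for all the samples before it, this loop is inherently sequential, which is exactly why the original WaveNet was so slow to run.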
A unique accent can be developed from the samples fed into the system, so feeding in different data sets yields many distinct voices. WaveNet's big limitation, however, was the amount of computing power it required: the original model needed a full second of processing to generate just 0.02 seconds of audio, which is far too slow for practical use.
Over the past twelve months, DeepMind's engineers have been improving the system. They optimized WaveNet to the point where it can produce one second of raw waveform in just 50 milliseconds, 1,000 times faster than the original. The resolution of each sample was also increased from 8 bits to 16 bits, which contributed to higher scores in tests with human listeners. The improved system is now efficient enough to be integrated into consumer products such as the Google Assistant.
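The 1,000x figure follows directly from the numbers quoted in the text, as a quick check shows:

```python
# Original WaveNet: 1 second of compute produced 0.02 s of audio,
# i.e. 50 seconds of compute per second of audio.
original_compute_per_audio_sec = 1.0 / 0.02   # 50 s

# Improved WaveNet: 1 second of audio in 50 milliseconds of compute.
improved_compute_per_audio_sec = 0.050        # 50 ms

speedup = original_compute_per_audio_sec / improved_compute_per_audio_sec
print(speedup)  # 1000.0

# The improved model is also 20x faster than real time (1 s / 0.05 s).
realtime_factor = 1.0 / improved_compute_per_audio_sec
print(realtime_factor)  # 20.0
```

The jump from slower-than-real-time to 20x real-time is what makes integration into a consumer product like the Google Assistant feasible.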
WaveNet is now being used to generate the US English and Japanese voices for the Google Assistant across all platforms. Depending on the samples fed into the system, special voices can be created, and Google is expected to use WaveNet to synthesize realistic human speech for other languages and dialects as well.