Microsoft has developed the first speech recognition system to reach human parity, with a word error rate (WER) of 5.9 percent. Researchers at the Redmond-based software giant report that the new technology recognizes the words in a conversation as well as humans do.
The new system was developed by a team from Microsoft Artificial Intelligence and Research. According to the team's internal tests, the speech recognition system makes the same or fewer errors than professional transcriptionists.
Word error rate decreased in the new speech recognition system
When the system was tested in September 2016, the word error rate was 6.3 percent. The latest round of testing brought the WER down to 5.9 percent.
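For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not Microsoft's evaluation code; the function name `word_error_rate` and the simple whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: words are split on whitespace and compared exactly,
    which is an assumption; real evaluations normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

By this definition, a WER of 5.9 percent corresponds to roughly 59 word errors for every 1,000 words in the reference transcript.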
Commenting on the development, Xuedong Huang, Microsoft's chief speech scientist, said that the company has reached human parity and called it a historic achievement. Huang said the milestone means that a computer can now recognize the words in a conversation as well as a person would.
Reduction in word error rate signals progress
The 5.9 percent error rate is about equal to that of people who were asked to transcribe the same conversation. It is also the lowest WER ever recorded on the industry-standard Switchboard speech recognition task.
Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible.
Harry Shum, executive vice president, Microsoft Artificial Intelligence and Research group.
The milestone will have broad implications for consumer and business products that can be significantly augmented by speech recognition. The products most likely to benefit from the new speech recognition system are Xbox, instant speech-to-text transcription tools, and Cortana.
This accomplishment is the culmination of over 20 years of effort
Geoffrey Zweig, Speech & Dialog research group
New speech recognition system touted to enhance Cortana
If the latest technology is implemented, Cortana will become more powerful and move closer to being a truly intelligent assistant. To reach the human parity milestone, the research team used the Computational Network Toolkit (CNTK) developed by Microsoft.
CNTK can quickly process deep learning algorithms across multiple computers running specialized chips called graphics processing units (GPUs). This greatly increased the speed at which the team could carry out its research and, ultimately, helped it reach human parity.