Over the weekend, Microsoft achieved a new industry milestone with its speech recognition system. The company’s research team reached a 5.1 percent word error rate, which Microsoft describes as human-level accuracy, surpassing the 5.9 percent word error rate it set last year.
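For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not Microsoft's evaluation code; the function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); real evaluations also
    normalize text (casing, punctuation) before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a 5.1 percent word error rate corresponds to roughly one word error for every 20 words in the reference transcript.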
According to Microsoft, the company’s researchers reduced the error rate by about 12 percent through a series of improvements to neural net-based acoustic and language models. The researchers also combined predictions from multiple acoustic models. They benefited most from using the Microsoft Cognitive Toolkit 2.1 and Azure GPUs, which allowed them to explore model architectures, optimize hyper-parameters, and increase the speed at which models could be tested.
A detailed report on the technical details of the speech recognition system is available here. Microsoft has long had the goal of reaching accuracy on par with humans, although there is still a lot of work left to do and more challenges to address. Even so, you can expect this research to pay off in the long term and help improve Microsoft consumer services such as Cortana, Presentation Translator, and more.