In a recent interview, Microsoft group engineer Fil Alleva opened up about Skype Translator’s connection to the popular Star Wars character C-3PO, as well as the long path the team took toward universal translation. Many may argue that Star Trek explored universal translation in a far more thought-out way. Still, Alleva admits that the inspiration he held in the back of his mind was C-3PO’s ability to understand and speak (presumably) millions of languages.
“What we all had in the back of our minds, whether we say it or not, was C-3PO.”
Like the upcoming seventh Star Wars film, the Skype Translator team is leaning heavily on more than 30 years of groundwork. Microsoft’s own involvement can be traced back to the early speech initiatives Xuedong Huang began at the company in 1993.
Teaming up with Huang and Harry Shum, Microsoft’s executive vice president of technology and research, Alleva sought to look past nerdy sci-fi quarrels and put the technology into the hands of the average consumer. To build momentum behind universal translation, the group began harnessing steady improvements in computing power, machine learning and artificial intelligence, along with the massive amounts of data now available.
“The first thing you need is data. For a computer to learn to identify sounds, it needs lots and lots of examples to learn from. As more people use tools like Skype Translator or Cortana, those tools can get better and better because they have more examples to learn from. Huang calls this influx of usage the oxygen that is fueling speech recognition improvements.
The second ingredient is computing power. Not too long ago, that was limited to whatever a person had on their personal computer or mobile gadget. Now, thanks to cloud computing, there is exponentially more computing power for speech recognition than ever before, even if it’s invisible to you.
Finally, you need great machine learning algorithms. Speech experts have used many tools for machine learning over the years, with exotic-sounding names like Gaussian Mixture Models and Hidden Markov Models. A few years ago, Microsoft and other researchers hit on the idea of using a tool called deep neural networks to train computers to better understand speech.”
Even with many of the computing-power hurdles behind the group, other obstacles continue to complicate natural speech translation. Well-known problems for present-day speech recognition include echo-laden or noisy environments and poorly implemented microphones.
However, Microsoft acknowledges that the biggest obstacle to universal adoption of speech recognition is the elusive matter of comprehension. Understanding language matters in its own right, but understanding the context in which words are used is almost as important, if not more so. A single word can be used in any number of ways, creating subtleties that machines have trouble grasping. Hsiao-Wuen Hon, managing director of Microsoft Research Asia, weighed in on this topic and stressed what users implicitly expect when they turn to speech recognition:
“That means you not only have to solve speech recognition, but you also need to solve natural language, text-to-speech, action planning and execution.”
Hon believes that once these systems work in unison, speech recognition computing will finally have reached what he calls “AI-complete.”
While Hon appears cautiously optimistic, other scientists, researchers and engineers within Microsoft are more bullish about speech recognition clearing the comprehension hurdle in the near future.
“The language barrier is going to be essentially nonexistent in four years, for the major languages and the major scenarios,” said Arul Menezes of the Machine Translation team at Microsoft Research.
With Cortana slowly branching out to other platforms and collecting data across a growing range of use cases, and with tools such as Microsoft’s Tellme gathering millions of text-to-speech combinations, it may be only a matter of time before speech recognition becomes as ubiquitous as droids like C-3PO are in a galaxy far, far away.