Today, Microsoft announced a substantial breakthrough in image recognition: a neural network system with five times more layers than any previous system. With this new technology, Microsoft took first place in both the ImageNet and Microsoft Common Objects in Context competitions, beating numerous academic, corporate and research institutions.
In both of these competitions, teams compete to see how well their systems can identify objects in photographs or video without the aid of humans. Microsoft's entry beat the competition by achieving accuracy that meets, and sometimes exceeds, human capabilities.
In a post detailing the success, Jian Sun, a principal research manager at Microsoft Research, explains that the team was able to achieve such incredible results because of its new, remarkably deep neural nets.
To better explain today's achievement, the post lays out the context of its significance. As a little background, neural nets are a computational technique that mimics the biological processes of the human brain, forming new connections between data points and learning continually. The adoption of neural nets has been a significant advancement in machine learning, and it powers many of the tools you may be familiar with and even use on a daily basis, such as Skype Translator's real-time translations or Project Oxford's showcase projects How-old.net and TwinsorNot.net.
To put the scale of today's achievement in context: neural net systems have made huge advancements because computer scientists continue to find ways to add new layers. The challenge, however, is that adding more layers can result in signals disappearing "as they pass through the system," which stunts machine learning.
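The vanishing-signal problem can be illustrated with a toy calculation (this is a simplified sketch, not Microsoft's system): if each layer slightly attenuates the signal passing through it, the effect compounds, and after enough layers almost nothing is left for the later layers to learn from. The 0.9 attenuation factor below is a hypothetical value chosen purely for illustration.

```python
# Toy illustration of signals vanishing in a deep network.
# Assumes a hypothetical per-layer attenuation of 0.9.
def forward(signal, n_layers, scale=0.9):
    for _ in range(n_layers):
        signal = scale * signal  # each layer slightly shrinks the signal
    return signal

print(forward(1.0, 8))    # an 8-layer net keeps ~43% of the signal
print(forward(1.0, 152))  # at 152 layers, ~0.0000001 remains: effectively gone
```

In a real network the attenuation comes from the layers' weights and nonlinearities rather than a fixed constant, but the compounding effect is the same, which is why naively stacking more layers stops helping.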
As Sun explains, computer scientists celebrated when they were able to build systems with eight layers just three years ago. Then last year, the field was excited by "very deep neural networks" comprising 20 to 30 layers. By comparison, the system Microsoft used to win these competitions has 152 layers, five times more than any previous system. To make this complex neural network work, Microsoft researchers developed what they call a "residual learning" principle to "guide the network architecture designs."
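The core of the residual idea can be sketched in a few lines (a minimal illustration of the principle, assuming a toy fully connected block, not Microsoft's actual 152-layer architecture): instead of asking a stack of layers to learn the full desired mapping H(x), the layers learn only the residual F(x) = H(x) - x, and the input is added back through a shortcut connection.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def plain_block(x, w1, w2):
    # A plain two-layer transformation F(x).
    return w2 @ relu(w1 @ x)

def residual_block(x, w1, w2):
    # Residual block: output is F(x) + x, where the "+ x" is the
    # shortcut connection that lets the signal pass through unchanged.
    return plain_block(x, w1, w2) + x

# Even if the layers contribute nothing (all-zero weights), the
# shortcut preserves the input, so stacking blocks can't erase the signal.
x = np.array([1.0, -2.0, 0.5, 3.0])
w1 = np.zeros((4, 4))
w2 = np.zeros((4, 4))
out = residual_block(x, w1, w2)
print(np.allclose(out, x))  # True
```

This hints at why depth becomes tractable: each block only needs to learn a correction on top of an identity pass-through, rather than the whole transformation from scratch.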
This new 152-layer system enabled the team to take first place in each of the three categories it entered in the ImageNet competition, and not just win but win by a large margin. Microsoft also took first place in its namesake MS Common Objects in Context challenge, which was originally funded by Microsoft but is now run by academics outside of the company.
The team was shocked by their own results, with Jian Sun saying “we even didn’t believe this single idea could be so significant” and Peter Lee, corporate vice president of Microsoft Research’s NExT Labs adding “it sort of destroys some of the assumptions I had been making about how the deep neural networks work.”
A video of Skype Translator's neural-network-powered real-time translation in action.
The new residual neural networks are already being deployed in contexts beyond research challenges. They are aiding Microsoft's machine learning efforts in Project Oxford, which recently gained APIs for developers to tap into. The Redmond-based technology company is also "working tightly with Microsoft's product groups to include the best image understanding in existing or future Microsoft products and services," giving a new sense of immediacy to how these breakthrough technologies make their way from research labs into the devices we use every day.
Today's announcement concluded with a quote from Lee, who said that even after the considerable improvements brought by residual neural networks, "we don't believe we're anywhere close to the limit of the significant improvement in data classification accuracy for any of these tasks." It will be interesting to see what Microsoft can do with these new systems to make the tools so many of us use today that much more intelligent and responsive to us and our surrounding world.