IBM attempts to win back speech recognition crown: your move, Microsoft!

Laurent Giret

Last year, Microsoft made headlines when it announced that its speech recognition technology had achieved the lowest error rate in the industry: with a 5.9 percent word error rate, the technology giant claimed that it had reached “human parity,” a major milestone for a company betting heavily on conversation being the next big platform.

Unfortunately for Microsoft, IBM recently shared that its Watson cognitive computing system has since set a new record in speech recognition, with a 5.5 percent word error rate on the same industry-standard SWITCHBOARD corpus that Microsoft used (via ZDNet). In the announcement, the company hinted that Microsoft was misguided to think that its own speech recognition had already reached human parity:

Reaching human parity – meaning an error rate on par with that of two humans speaking – has long been the ultimate industry goal. Others in the industry are chasing this milestone alongside us, and some have recently claimed reaching 5.9 percent as equivalent to human parity…but we’re not popping the champagne yet. As part of our process in reaching today’s milestone, we determined human parity is actually lower than what anyone has yet achieved — at 5.1 percent.

IBM explained that it arrived at this 5.1 percent figure with the help of Appen, a global speech and search technology services company, which provided guidance on how to reproduce human-level results. “This discovery of human parity at 5.1 percent proved to us we have a way to go before we can claim technology is on par with humans,” IBM said.

Overall, it’s still not easy to find a standard measurement for human parity across the industry, as the SWITCHBOARD corpus is not the only corpus of linguistic data that can be used as a reference. IBM explained that it also tested its Watson technology with CallHome, another corpus composed of casual conversations between family members on topics that aren’t fixed in advance. “On this corpus, we achieved a 10.3 percent word error rate – another industry record – but again, with Appen’s help, measured human performance in the same situation to be 6.8 percent,” explained IBM.
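
For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that textbook definition only. It is not the scoring tooling used by IBM, Microsoft, or Appen, and the function name `word_error_rate` and the simple whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()   # assumption: words are separated by whitespace
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a 5.5 percent word error rate corresponds to roughly one word-level mistake for every 18 words in the reference transcript.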

It remains to be seen if Microsoft will fight back, but thanks to Cortana, the Redmond giant still seems to have a more viable shot at making voice the new computing interface than IBM, whose distribution efforts are somewhat smaller. The ubiquitous digital assistant has now been opened to developers, car manufacturers, and other device makers, and we expect it to be a major topic during the upcoming Build 2017 developer conference.