Where is Microsoft headed in Windows 10 speech recognition?

PCWorld writer Mark Hachman pointed out a notable issue with Windows over the weekend. For all that Microsoft champions speaking to Cortana or translating your spoken words with Skype, Windows’ speech recognition remains underwhelming.

Microsoft has long believed in new forms of input to interact with PCs and devices. Windows, for the most part, remains a mouse- and keyboard-centric experience, but now you can also draw on web pages, dictate an email to Cortana, and unlock your device by looking at it. And as we recently reported, Microsoft believes a combination of your voice and hand gestures will eventually kill off the keyboard altogether.

Windows Ink

So this raises the question that Mark Hachman gets at with his PCWorld article: why aren’t speech recognition and dictation bigger features within Windows?

Hachman suggests one possible reason: ten years ago Microsoft did try to make voice dictation a significant feature in Windows Vista, and the effort didn’t go well. Microsoft product manager Shanen Boettcher gave a cringeworthy demonstration of a glitchy voice dictation system that got very little right on the first try, or at all.

But times and the processing power of our PCs have changed. In fact, Microsoft researchers recently produced the lowest-ever error rate in speech recognition thanks to machine learning. However, as Hachman again points out, Windows Speech Recognition is still powered by Microsoft Speech Recognizer 8.0, which has been around, unchanged, since Windows Vista.

Hachman also tested Windows Speech Recognition and got an accuracy rate of 93.6%. He writes,

That’s pretty bad on paper, and somewhat behind the dedicated software I’m trying. Windows also had an odd habit of interjecting the word “comma” when I was dictating the punctuation mark. The speech community seems split on whether relatively minor mistakes like this are significant.

He notes, though, that this was without any training of the software, and he says Microsoft employees reported that with proper training, Windows Speech Recognition can be up to 99% accurate.
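For readers wondering what a figure like 93.6% could mean in practice, here is a minimal sketch of one common way to score dictation: count the word-level edits (substitutions, insertions, deletions) needed to turn the recognized text into the intended text, divide by the number of intended words, and subtract from one. This is an assumption for illustration only; the article does not say how Hachman computed his number, and the function names and sample sentence below are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy taken as 1 minus the word error rate (an assumed definition)."""
    return 1.0 - word_error_rate(reference, hypothesis)


# Hypothetical example: the recognizer types the word "comma" instead of
# inserting the punctuation mark, which counts as one inserted word.
intended = "please send the report to the team today"
recognized = "please send the report comma to the team today"
print(word_accuracy(intended, recognized))  # 0.875
```

Under this scoring, a single stray "comma" in an eight-word sentence costs 12.5 points of accuracy, which is one reason people disagree about how much such minor mistakes should count.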

Speech recognition is limited to dictating commands to Cortana; it does not extend to dictating entire documents.

Windows Speech Recognition seems overdue for a reboot. Just as Microsoft has introduced touch and ink capabilities throughout Windows and its apps, it is probably time for your voice to be a useful input throughout the operating system. Hachman notes that part of the problem in rolling out modern voice recognition to apps like Office is that the most recent voice recognition efforts are siloed away with the Cortana and Bing team.

Hachman writes:

I think it’s pretty obvious that the pieces of the puzzle are there, technically. If there’s any obstacle, it might be organizational: As of Thursday, Microsoft’s Office apps were spun out into their own group, away from Cortana and Bing.

But given all that Nadella has done to break down barriers between product teams, this challenge seems right up his alley: creating a vision for Windows and Microsoft apps like Office that taps into the cloud-powered machine learning efforts of Bing and Cortana. For its part, Microsoft did respond to Hachman’s article, saying:

We see value in conversations across a range of devices and experiences. We’re just at the beginning of what we believe is possible and certainly see lots of opportunity to connect Cortana and conversations into a number of productivity scenarios. Today, Cortana integrates with Office 365 for glance-able information about upcoming meetings, along with flight and package tracking, and Bing is also providing intelligent insights directly in Office. We will continue to invest heavily here.

So who knows, maybe Insiders will start seeing inklings of this in future Redstone 2 builds, or maybe it will even come up at the upcoming October event. It would be fitting, as September saw Apple announcing Siri on macOS and Google bringing Google Now into every room of your home. For all the attention other forms of input get with Windows, more universal voice recognition certainly seems long overdue.
