Speech Recognition Now As Good As A Human

It has been reported that Microsoft’s speech recognition system has reached its lowest-ever Word Error Rate (WER) of 5.1%, a rate that puts it on a par with human transcribers.

What Speech Recognition System?

For the last 25 years, reaching human parity with a speech recognition system has been a goal of Microsoft, and the company has, therefore, committed to investing in long-term research associated with it. The research, investment, and the resulting system (which includes an AI element) have fed into products and services like Cortana, Presentation Translator, and Microsoft Cognitive Services.

The 5.1% Error Rate

Last September, Microsoft’s speech engine is reported to have registered a WER of 6.3%, which Microsoft was then able to bring down to 5.9%. Further recent work on the engine lowered that rate to 5.1%, which is reported to be the human word error rate.

Microsoft’s system is benchmarked against the Switchboard corpus, which is a dataset of recorded telephone conversations that speech research technologists have been using for more than 20 years to measure the capability of transcription systems.
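
For readers wondering how a figure like 5.1% is arrived at, WER is conventionally calculated as the number of word substitutions, deletions, and insertions needed to turn the system’s transcript into the reference transcript, divided by the number of words in the reference. The short Python sketch below illustrates that standard definition only. It is not Microsoft’s evaluation code, the function name word_error_rate is our own, and real benchmarks also normalise the text (e.g. case and punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {rate:.1%}")
```

In this example one word out of six is wrong, so the WER is 1/6, or roughly 16.7%. A 5.1% WER therefore means about one word in twenty is transcribed incorrectly.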

Human parity of the kind that Microsoft has now achieved has obviously been a goal of the company’s research, and puts it well on the way to creating a system that can be an effective central component of many of its future products and services.

Why The Big Improvement?

The reduced error rate has been attributed to Microsoft’s recent advances in AI techniques such as neural-net-based acoustic and language models, and to innovations that enable the system to take the context of the speech into account so that it can make better guesses at unclear words.

What’s Next?

Now that Microsoft has a speech recognition system that matches human performance on this benchmark, reports indicate that future work will focus on tackling the challenges posed by recognising accented speech, dialects, and conversations in noisy surroundings.

Getting a grip on accented speech and dialects could open the speech recognition system to more users globally, and recognising and capturing conversation in noisy environments could make the system more versatile and useful.

Another project in the works is improving the system’s ability to understand the meaning and intent of speech, which Microsoft sees as the next frontier for speech technology.

With all of these advances, and with more research in the pipeline, we can expect more improvements to be rolled out in future updates of products such as Cortana, Presentation Translator, and Microsoft Cognitive Services.

What Does This Mean For Your Business?

AI and speech recognition are becoming a familiar way to interact with entertainment devices (e.g. Amazon Echo and Siri-controlled systems like Apple’s new HomePod) and with our PCs and mobile devices (e.g. via Cortana). They are also playing an important role in how we interact with company services, and in how the security of those services can be improved, e.g. via bots and the verification / authentication systems used by banks.

AI and machine learning offer companies the chance to develop innovative products and services that offer the kind of customized, personalized experiences that are highly valued by modern consumers. The ability of devices and services to adapt intelligently and relate more closely than ever to our personal likes and needs saves us time, and increases our loyalty to those products and services.

AI developments have been such that back in April, an AI program learned how to ‘bluff’ and beat expert human competitors to the prize money in a series of exhibition poker matches, and this month an AI program cracked a (physical) combination safe in 30 minutes by reducing a possible million combinations to just one correct code. Also, Google’s AI company DeepMind and Oxford University have developed WLAS, a system that can lip read better than a trained professional, and Google has reportedly used AI machine learning technology on its Gmail service, with a reported 99.9% of the phishing attempts that it detected being blocked.

Not everyone shares the view that the rapid development of AI and machine learning of this kind is a positive thing. Tesla and SpaceX CEO Elon Musk recently described AI as a “fundamental risk to the existence of civilisation”, and a report in March this year by PwC claimed that over 30% of UK jobs could be lost to automation (aided by AI developments) by the year 2030. Recently, concerns have also been raised about how AI could be used to create custom malware to defeat antivirus software by learning how to tweak malicious binaries.

AI technology is finding its way into our daily lives, enhancing and tying together existing products and services and new security technologies (biometrics) in new ways. An essential element of communication, added value, and convenience must surely be an effective speech recognition system that is as close to our own as possible.