Voice recognition is a standard part of the smartphone package these days, and a corresponding part is the delay while you wait for Siri, Alexa or Google to return your query, either correctly interpreted or horribly mangled. Google’s latest speech recognition works entirely offline, eliminating that delay altogether — though of course mangling is still an option.
The delay occurs because your voice, or some data derived from it anyway, has to travel from your phone to the servers of whoever operates the service, where it is analyzed and sent back a short time later. This can take anywhere from a handful of milliseconds to multiple entire seconds (what a nightmare!), or longer if your packets get lost in the ether.
Why not just do the voice recognition on the device? There’s nothing these companies would like more, but turning voice into text on the order of milliseconds takes quite a bit of computing power. It’s not just about hearing a sound and writing a word — understanding what someone is saying word by word involves a whole lot of context about language and intention.
Your phone could do it, for sure, but it wouldn’t be much faster than sending it off to the cloud, and it would eat up your battery. But steady advancements in the field have made it plausible to do so, and Google’s latest product makes it available to anyone with a Pixel.
Google’s work on the topic, documented in a paper here, built on previous advances to create a model small and efficient enough to fit on a phone (it’s 80 megabytes, if you’re curious), but capable of hearing and transcribing speech as you say it. No need to wait until you’ve finished a sentence to think whether you meant “their” or “there” — it figures it out on the fly.
So what’s the catch? Well, it only works in Gboard, Google’s keyboard app, and it only works on Pixels, and it only works in American English. So in a way this is just kind of a stress test for the real thing.
“Given the trends in the industry, with the convergence of specialized hardware and algorithmic improvements, we are hopeful that the techniques presented here can soon be adopted in more languages and across broader domains of application,” writes Google, as if it is the trends that need to do the hard work of localization.
Making speech recognition more responsive, and to have it work offline, is a nice development. But it’s sort of funny considering hardly any of Google’s other products work offline. Are you going to dictate into a shared document while you’re offline? Write an email? Ask for a conversion between liters and cups? You’re going to need a connection for that! Of course this will also be better on slow and spotty connections, but you have to admit it’s a little ironic.