Google Docs is getting a big update that could soon make its voice-typing feature much more useful and popular for transcribing meetings.
The cloud word processor has offered the ability to ‘type’ hands-free with your voice for several years now (just go to Tools > Voice typing, with your mic turned on). But an update that’s coming in early February will see some enhancements to the feature, plus the option of using it in web browsers beyond Chrome.
Google says the upgrade “will help reduce transcription errors and minimize lost audio during transcription”. The current incarnation’s limitations have seen it lose ground to the best speech-to-text apps like Otter.ai, which is widely used by the TechRadar team. Microsoft’s speech recognition and accessibility tools have also taken big leaps recently in apps like Word.
But if Google Docs’ built-in equivalent can match the accuracy of its increasingly impressive rivals, it could become a much more widely used tool, particularly as it’ll also work in Google Slides to display a speaker’s words in real time.
The feature should also continue to improve thanks to another upgrade: expanded support for “most major browsers”. Google hasn’t yet said which browsers, but Safari, Firefox and Microsoft Edge are the likely candidates.
We’ll likely find out when the update starts to roll out over the next month. Google Workspace users who are subscribed to Rapid Release updates will start to see it arrive from today, but most of us will see a gradual rollout over two weeks from February 6.
Analysis: AI learns to be useful
Google hasn’t been explicit about what technology is powering its voice-typing upgrade in Google Docs, but it’s likely similar to the AI-based interface it offers to businesses for improving services like customer interactions.
AI tech has been improving rapidly in the visual space with the likes of Dall-E and Midjourney, along with chatbots like ChatGPT. Handwriting recognition has also been given a big boost. But speech is arguably one of the most useful areas for AI development, for both usability and accessibility. And reliable speech-to-text software is just the start.
Microsoft recently unveiled a creepy, but potentially useful, new AI tech called Vall-E that can mimic human voices based on only a three-second sample. On a similar theme, Apple recently launched its first range of audiobooks with AI-powered narrators.
These advances raise massive ethical questions around the potential for impersonations, which is why the tech behind both is currently locked down and unavailable to consumers. But a Pandora’s box of voice-based technology has been dramatically flung open.
For now, the rapid improvements in speech-to-text technology found in the likes of Google Docs (and indeed, the best text-to-speech software) are the most useful fruits of these new AI algorithms. While that software takes our meeting notes, we’ll be grabbing the popcorn for the inevitable ethical debates about next-gen voice impersonators.