Query Understanding and Voice Interfaces

Daniel Tunkelang
Query Understanding
May 29, 2018

--

On the surface, using a voice interface for search doesn’t seem that different from typing text into a search box.

Converting speech to text and vice versa is a mostly solved problem.

Speech recognition, while still not perfect, has matured to the point that it is ubiquitous. Application developers can leverage the major cloud providers, namely Amazon, Microsoft, and Google, which provide reasonably priced APIs for their speech-to-text services.

Meanwhile, computers have been able to talk to us for decades, and the quality of synthesized speech today is far more natural than when the Software Automatic Mouth (SAM) speech synthesizer was released for personal computers in 1982. Indeed, recent developments like Google Duplex have raised concerns that people will be unable to distinguish human speakers from AIs.

So, while there’s still ample opportunity to improve on both speech recognition and speech synthesis, both are good enough for everyday use.

But there’s a big gap between recognizing speech and understanding it.

At best, speech recognition reduces the problem of query understanding with voice to the problem of query understanding. But the state of query understanding is far less mature than that of speech recognition. Indeed, the ability of computers to recognize speech but not understand it can be particularly frustrating for searchers who don't distinguish between the two problems.

And the biggest challenges come from interface constraints.

On one hand, voice input interfaces lack autocomplete, spelling correction, and other mechanisms that guide searchers as they construct queries. Hence, there's a greater chance that searchers will make queries the search engine is unable to understand.
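To make the contrast concrete, here is a minimal sketch of the kind of prefix-based autocomplete a text search box can offer as the searcher types, and a voice interface cannot. The suggestion list is a hypothetical toy example, not a real query log:

```python
# Minimal sketch of prefix-based autocomplete over a sorted suggestion
# list. A text search box can show these matches keystroke by keystroke;
# a voice interface receives the whole query at once and cannot.
from bisect import bisect_left

# Hypothetical toy suggestion list; in practice this would come from
# query logs or a catalog, stored in a trie or similar structure.
SUGGESTIONS = sorted([
    "query expansion",
    "query rewriting",
    "query understanding",
    "speech recognition",
    "speech synthesis",
])

def autocomplete(prefix, limit=3):
    """Return up to `limit` suggestions that start with `prefix`."""
    start = bisect_left(SUGGESTIONS, prefix)  # first candidate >= prefix
    results = []
    for suggestion in SUGGESTIONS[start:]:
        if not suggestion.startswith(prefix):
            break  # sorted order: no later entry can match
        results.append(suggestion)
        if len(results) == limit:
            break
    return results

print(autocomplete("query"))
# ['query expansion', 'query rewriting', 'query understanding']
```

As the searcher types "query", the interface can surface well-formed queries the engine already understands; a voice searcher gets no such guidance before committing to an utterance.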

On the other hand, today's voice output interfaces don't allow the searcher to easily scan a set of search results, let alone offer conversational mechanisms to refine queries, such as clarification dialogues and faceted search. That not only makes voice interfaces less resilient to misunderstanding but also limits the scope of queries they can handle. In particular, voice interfaces are not well suited to exploratory search.

We’ll get there — someday.

We’re just entering an age of voice interfaces for mainstream consumer applications. It will take some time to work out the kinks, improve query understanding, and figure out the design of conversational interfaces. But we’ll get there — at least once we start to recognize these challenges and face them head-on.

Previous: Question Answering

Next: Query Understanding and Chatbots
