In September of this year, Amazon hosted a press event in the steamy Spheres at its Seattle headquarters, announcing a dizzying array of new hardware products designed to work with the voice assistant Alexa. But at the event, Amazon also debuted some new capabilities for Alexa that showcased the ways in which the company has been trying to give its voice assistant what is essentially a better memory. At one point during the presentation, Amazon executive Dave Limp whispered a command to Alexa to play a lullaby. Alexa whispered back. Creepiness achieved.
Voice-controlled virtual assistants like Alexa and the speakers they live inside are no longer a novelty; an estimated 100 million smart speakers were installed in homes around the world in 2018. But this year, the companies making voice-controlled products tried to turn them into sentient gadgets. Alexa can have the computer version of a “hunch” and predict human behavior; Google Assistant can carry on a conversation without requiring you to repeatedly say the wake word. If ambient computing—the notion that computers are all around us and can sense and respond to our needs—is the vision technologists have for the future, then 2018 might just be the year that vision came into sharper focus. Not with a bang, but a whisper.
Of course, progress remains slow. Voice assistants like Alexa, Google Assistant, Apple’s Siri, and Microsoft’s Cortana still require a specificity in dialogue that makes them seem less than smart. A recent survey from research firm IDC found that 52.2 percent of people who have used a smart speaker in the past year said that their voice platform “hears me easily,” which means nearly half of the respondents have had the opposite experience.
“There’s still much work to be done,” says IDC senior research analyst Adam Wright. “These platforms are struggling to break free from the shackles of requiring users to issue static, computer-centric voice commands—despite what marketing hype would have us believe.”
There’s no doubt, though, that voice assistants are increasingly earning their keep in our kitchens (and cars, and offices, and anywhere we bring our smartphones). Alexa’s whisper feature may seem simple, but building this into a voice assistant presented challenges because whispering doesn’t usually involve the vibration of the vocal cords, according to a white paper published by Amazon engineers. Alexa had to be trained on recordings of human interactions with voice-controlled, far-field microphones in both whisper and normal phonation modes.
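The acoustic cue those engineers describe—the absence of vocal-cord vibration—is concrete enough to illustrate with a toy voicing detector. The sketch below (Python with NumPy, my own illustration, not Amazon's actual model) relies on a standard signal-processing fact: voiced speech is quasi-periodic, so its autocorrelation shows a strong peak at the pitch period, while noise-like whispered speech does not.

```python
import numpy as np

def voicing_score(frame, sr=16000, fmin=75, fmax=400):
    """Peak of the normalized autocorrelation within the human pitch range.

    Voiced speech (vocal cords vibrating) is quasi-periodic, so its
    autocorrelation has a strong peak at the pitch period; whispered
    speech is noise-like, so no such peak appears.
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] == 0:
        return 0.0
    ac = ac / ac[0]                      # lag 0 normalized to 1
    lo, hi = sr // fmax, sr // fmin      # lag range for 75–400 Hz pitch
    return float(ac[lo:hi].max())

# Synthetic demo: a 200 Hz tone stands in for voiced speech,
# white noise for a whisper. Scores are high for the tone, low for the noise.
sr = 16000
t = np.arange(int(0.03 * sr)) / sr       # one 30 ms analysis frame
voiced = np.sin(2 * np.pi * 200 * t)
whisper = np.random.default_rng(0).standard_normal(t.size)
```

Amazon's production system is a classifier trained on far-field recordings of real whispered and normal speech; this heuristic only shows why the voicing cue is detectable at all.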
The point isn’t just that Alexa can whisper now; it’s that Alexa can whisper back, which hints at that future of ambient computing. Just as a friend might lower their voice when you start speaking quietly or conspiratorially from across the table, Alexa will mimic your tone. Alexa has also been updated to have human-like “hunches”: When you tell the virtual assistant you’re going to bed or leaving the house, Alexa will suggest you turn a light on or off, or lock the door, if that’s something you normally do.
In October, Google announced an update to its Assistant, which works across smartphones and Google Home devices, that was supposed to make it more conversational. For a while now you’ve been able to ask the Assistant one question—like “How tall is LeBron James?”—and immediately ask it a follow-up question about LeBron without having to say his name again. Now, Google has extended the Assistant’s memory. Ask it a question or give a command, and it will continue to listen for eight seconds afterward, so you don’t have to keep saying “OK, Google.” (This is similar to Amazon’s recently released Follow-Up Mode, in which you can ask Alexa, say, the weather in a particular city, and then ask about a restaurant in that same city without having to identify it again.)
Google also gave its Assistant the ability to do some chores for you—things like screen your calls on an Android phone, or (in a feature called Duplex, which rolled out this fall) hold telephone conversations with an actual human to book a table at a restaurant or an appointment at the salon.
Microsoft took steps this year to make Cortana, its virtual assistant that lives on PCs and smartphones, more conversational. Siri’s updates this year were largely around Shortcuts, which lets you group together a bunch of actions on your iPhone or iPad, and trigger them with a short spoken command. Apple also improved Siri-powered Spotlight suggestions, designed to analyze your habits over time and suggest things to do on your phone. It’s not conversational, but it’s an ambient awareness of the things you need to get done.
With each tech giant focusing on a different vision for what these voice-activated AIs should do, their various bots have fallen into predefined roles. Alexa is the world’s smartest kitchen timer, Google Assistant knows a scary amount about you, Cortana is your friend in IT who helps you troubleshoot stuff, and Siri is the executive assistant on your iPhone.
Across all of these services, voice-recognition technology has improved over time, as have the assistants’ success rates for delivering a factual answer. This is partly because of scientific advancements in AI, and partly because the iPhone’s massive reach and the growing popularity of products like Amazon Echo and Google Home have created a giant voice-controlled feedback loop. The more “smart” devices that sell, the more usage data tech companies have to improve their voice tech; the more voice-control services improve, the more compelling the gadgets become.
But virtual assistants still stumble, for better or worse. (Human-to-human interaction for the win.) Despite efforts to make these things human-sounding, they still require us, the real humans in the equation, to talk to them like robots: for all the advanced natural language processing behind them, they still sometimes fail to understand natural language. “You don’t have to look very far to find user testimonials that continue to voice frustrations that their devices are difficult to talk to or don’t listen to them,” Wright says.
That might not matter so much when Alexa or Google Assistant misunderstands the song title you’re asking it to play, or when Siri can’t find you the most convenient gas station while you’re in a moving car (which still happens, and is frustrating). But it matters a lot when you’re using these conversational assistants in an area like, say, health. It turns out, perhaps to no one’s surprise, that their inconsistencies aren’t so cheeky when the question you’re asking is about congestive heart failure, or exercise routines for cancer survivors. In September of this year, a report published in the Journal of Medical Internet Research rang the warning bell on virtual assistants, finding that they frequently didn’t understand health-related queries and that nearly 30 percent of the answers they provided “could cause harm if acted on.”
And of course, voice control getting good presents just as many ethical problems as it does moments of ease. Virtual assistants are entering our lives just as we’re becoming more aware of the insidious data-sharing practiced by some of the world’s biggest tech companies. For years now, we’ve been actively typing our shopping queries, our future destinations, our romantic interests, our innermost thoughts, into machines. Now we’re just shouting them out loud, and the voice control systems from Amazon, Google, Apple, Microsoft, and even Facebook are hoovering up our words. Just ask the Portland, Oregon couple whose private conversation was recorded by Alexa this year.
Wright, the analyst, isn’t convinced that privacy concerns are a huge deterrent for current or potential users of voice-controlled assistants. Happy customers are willing to put privacy aside for a little bit of convenience, he believes. And according to IDC’s research, privacy isn’t even the leading inhibitor to using a smart assistant; the most common reason respondents gave (cited by more than 31 percent) was that they simply “have no use for them.”
That won’t stop the tech companies from aggressively trying to convince you that voice assistants are actually useful, something we’re likely to hear even more of in 2019. And here’s the thing: when these things do become more useful, we probably won’t notice it happening. Instead, the tech will just evolve around us. Sometime in 2019 you might place a call to a friend only to hear it answered by a virtual assistant, rather than your fellow human. Or, you might use that same assistant (Google’s) to make a reservation for you, under the guise of human-to-human interaction. We saw glimpses of this in 2018, and now it’s coming to fruition. You might start a conversation with your virtual assistant, then take a long, extremely human pause, and resume the conversation without any glitches.
Later, that same assistant will remind you to lock the door before you go to bed. And when it reads to you a bedtime story—maybe a science fiction book about robots taking over the world—it might know to lower its voice as you start to fall asleep.