Speech on Windows Phone, Siri on the iPhone, and Voice Search on Android. Photo: Ariel Zambelich / Wired
My daughter, who is not quite two, loves to talk to Siri. She will grab my phone, press the home button until it beeps, and then holler into the phone, “Hi Siri! Hi Siri! Hi Siri!” Siri never understands. Ever. And yet, my daughter keeps on trying. This seems like a metaphor.
I was an early and vocal Siri critic. When the feature shipped, it wasn’t even close to ready for prime time, and its wink-wink, nudge-nudge beta label only made Apple’s decision to base an entire ad campaign around it all the more galling. And so, a year ago, I argued that Siri wasn’t ready, that it was “often quite dumb. Sure, it will do what you tell it. But it doesn’t interpret or do nuance, even though that is exactly what Apple promises.”
It’s gotten much, much better. Siri can now do many of the things that it formerly could not, like launch a third-party app. And it had better be more adept, because there’s some serious competition now. Just like Siri, though, the competition has flaws. Windows Phone 8 has a Siri-like feature called Speech. Sadly, it’s even further behind the usability event horizon than Siri was when it launched. But Google’s new Voice Search, which rolled out with Jelly Bean, shows more promise. It’s amazing on an Android handset (and thanks to an iOS app it even guns hard on iPhones). But it is utterly devoid of personality and usability. It is, frankly, very Google.
All of these services are kind of awesome yet tend to be frustrating overall, though in different ways. Bark a command and they may or may not perform it. Ask for directions and you may or may not end up where you want to go. Try to play some music or send a text and you’ll quickly find that they don’t do proper names very well. It soon becomes apparent that they are totally data-dependent. Give them enough of it, and each can amaze. Too little, and they’re just plain dumb. The bottom line is that, despite improvements, voice-activated assistants are still not ready for prime time.
At times, all seem, well, stupid. Wired is at 520 Third St. in San Francisco. I’m not sure about you, but I tend to pronounce this as “five-twenty.” But ask Siri, or Windows Phone 8, or Android’s Voice Search to navigate you to “five-twenty Third Street in San Francisco” and every single one will try to find 523rd Street, which doesn’t exist because San Francisco does not have the geographic footprint of a mid-size European nation. Trying “five hundred and twenty, third street” fares no better. The only way I’ve found to reliably get any of these smartiebots to take me to work is to cheat and ask for “five-hundred twenty-two Third Street, in San Francisco.” It would have been so much easier just to type it in.
When the iPhone 4S launched, Apple had a chicken-and-egg problem (and, in fairness, it still does). It needed to get millions and millions of people using Siri for it to work well, but people weren’t going to flock to it if it didn’t work. And so it began pushing it, hard. You’ve seen the commercials, where Zooey Deschanel flutters around her apartment in her pajamas asking about the weather, or Samuel Jackson pretends to cook. They promote a Siri that is a lot more elegant on TV than she is in practice.
Yes, it can send a tweet or an e-mail for me now. It can finally launch apps, and show Yelp ratings of nearby restaurants. But it still has basic, fundamental problems. Siri still has trouble understanding the things I say. It has persistent trouble with names. It often times out, even when my phone has a strong network connection. In short, it still chokes too often when it should sing. Tellingly, it was frequently cited as a factor in iOS chief Scott Forstall’s recent and unceremonious exit from Apple.
But if Siri is bad (and it is), Speech on Windows Phone 8 is far worse. It can tell you the weather pretty reliably, and if you want to search the web, it’s just fine. It even does proper names quite well, certainly as well as Siri in my experience. In other areas, however, Speech really sputters. When I ask it to “play Skrillex,” for example, it tries to send a text message to Alexis Madrigal. When I ask it to “play Simon and Garfunkel,” it instead searches the web for “police women in cars” (an entertaining result, although I really just wanted to hear “At the Zoo”). Or let’s say I want to drive to San Diego, which, unlike looking up the weather, is a pretty vital hands-and-eyes-free task. If I ask my HTC 8X, one of the Windows Phone 8 flagship phones, to “navigate to San Diego,” it will launch the AT&T Navigator app and … wait for me to input where I want to go. Or let’s say I ask it, “how do I get to San Diego?” It launches a web search where the top result is … a Booking.com ad for hotels in San Diego. I try asking it directly, “can you tell me how to get to San Diego?” It launches the Xbox Music app, but nothing plays. Again and again, Speech tends to have more questions than answers. Siri may often be wrong, but at least it’s opinionated.
Navigation is an especially odd outlier for Windows Phone 8, because it’s something that the other platforms do quite well. Siri and Google’s Voice Search both complete all three of those requests by launching into directions from my current location to where I want to go.
Google’s Android Voice Search, on the other hand, is the epitome of cold, machine-like reliability. There’s a line in the original Siri commercial where a man says, “play some Cold Train.” Or at least, that’s what I’ve been told he says. Initially, I interpreted it as “play some Coltrane,” and that’s what it still sounds like to me. Coincidentally, I love John Coltrane. And so it’s always bothered me that Siri, well over a year later, still can’t play his music for me. But it turns out Siri is not alone in this. Windows Phone Speech can’t pull this trick off either. Windows Phone does, at least, tend to recognize that I’m saying John Coltrane. But it doesn’t have the sophistication to launch into it and actually play a track.
In fairness, this is hard stuff. My little audio digital assistant has to be able to suss out subtle distinctions to figure out if I am saying “Cold Train,” or “coal train,” or “Cole Train,” or “Coletrane.” That phonetic interpretation is a hell of a lot harder than Simon and Garfunkel. But Android’s Voice Search just nails this. At my desk the other day, I tried the Coltrane test on a Nexus 4. To my utter delight I heard, almost immediately, “Blue Train” start up.
Even in its first iteration, Google’s Voice Search is really, really good at this. That’s because, as with mapping and search data, Google has a huge head start on Apple and Microsoft. Google is all about crunching data. And when it comes to voice, that matters. A lot.
Thanks to everything from Google 411 (its information service) to Google Voice voicemail transcriptions to its web-based voice search, Google has been deep in the voice recognition business for years now, crunching data in massive sets. Likewise, it’s been in the search business for its entire existence and has a lot of experience determining not only what words people are using, but what we actually mean. And it’s been at it on phones for a comparably long time, too. Although Voice Search is new, to Android users Siri’s newfound navigation ability is simply a feature they’ve had for years.
Ultimately, however, no matter which platform you’re on, Google’s ability to quickly and seemingly effortlessly play John Coltrane should be tremendously encouraging. Because what Google has now, Microsoft and Apple will, given enough data. If you want to see the future of how awesome Siri or Speech could be one day, take a look at what Google has right now. Its Voice Search is, unquestionably, better as a smart digital assistant than Apple’s Siri or Microsoft’s Speech. It just plain works, and it works right now. It doesn’t need to throw open the doors to multimillion-user beta testing. It’s already coldly, massively efficient. Robotic, even.
As Android’s product manager told Wired’s Nathan Olivarez-Giles back in June:
“It’s very deliberately not making jokes with you. Google is a neutral party — it’s not your friend, secretary or sister. It’s not your mom. It’s not your girlfriend or boyfriend. It is an information-retrieval entity. You ask, we respond. And it’s very important that this entity be impartial, and adding jokes and other mannerisms to the voice would take away from that.”
Which brings me back to my kid. She keeps striking out with Siri, again and again and again. But maybe once, or maybe twice, Siri heard her correctly, and replied, even if just to say hello. And because of that, she loves Siri. Loves it. And so she tries, again and again and again. For everything else Siri lacks, it has far more personality than offerings from Google or Microsoft. And thanks to that, we’re probably more likely to forgive Siri’s faults, and keep trying too. And the more we try, the better Siri gets.
Yes, we want our personal digital assistants to answer questions for us. We need them to be accurate. But we want them to be personal. We want them to call us “dude” and understand our colloquial slang. We want them to understand us when we speak naturally, yes, but we also want them to answer us in kind. Personality sells. Apple is very good at the personal part, and Google at the accurate part. (Microsoft? Well, it has a lot of catching up to do.) But so far nobody is pulling off both.