23 Sep 2020 – 5-minute read
This summer, Google has been using its AI voice assistant, Duplex, to call bars and restaurants and update their opening hours on Google Maps.
Here’s a call between their system and our virtual assistant for restaurants.
As far as we’re aware, this is the first naturally occurring conversation between two AI voice assistants in the wild. I have never seen anything like it, and I’m incredibly proud that PolyAI is sharing this moment in computing history with our friends at Google.
Having listened to the call, we’d say it went pretty well! Google could have handled the split opening hours a bit better, but overall the interaction was smooth. I want to share a few thoughts on what this means for the future of conversational technology.
Human Language Is a Universal API
This interaction did not need to be a conversation in the human sense. The caller’s request was transactional and to the point: an API call or HTTPS request between two web services would have done the job in 150 ms, instead of two minutes of synthesized voice going back and forth over a telephone line. However, such APIs are not always available or standardised. For instance, restaurants in the UK alone use dozens of different reservation APIs, and voice assistants will never integrate with every single one.
This kind of machine-to-machine conversation will become more commonplace as AI deployment accelerates. And while it may seem like overkill from a technical standpoint, there are good reasons for these interactions to happen in natural language rather than through API calls.
Imagine you ask your virtual assistant (Siri or Alexa, for example) to book a hotel room. It can phone several hotels to check availability and make a booking without having to integrate with each hotel’s booking API. Logs of those conversations can then suggest alternative venues to the user for their next trip.
Welcome to the Uncanny Valley
Google introduced Duplex in May 2018 as an ‘AI system for accomplishing real-world tasks over the phone’. You may have seen this video of Sundar Pichai at Google I/O using Duplex to make a restaurant reservation.
I’ll be the first to point out how incredible Google’s TTS (text-to-speech) is in the Duplex/PolyAI call. It sounds like a human, much more so than our assistant does.
According to the Uncanny Valley theory, the more human-like a robot is, the more positively people feel towards it, up to a point. Beyond that point, a robot that is almost but not quite human starts to seem creepy, and people are repulsed by it.
In the Google/PolyAI call, Duplex really does sound like a human. It does mention that it’s an automated service, but that isn’t enough. Why? Because once a bot crosses the Uncanny Valley, the person on the other end of the line will treat it as though it were human.
Around the 1-minute mark, you’ll hear our voice assistant ask the caller when they’d like to come in, and Duplex talks over it. In reality, these are machines, and no one is getting offended. But when you listen to the call, the Duplex bot comes across as pretty rude. Because it sounds so human, it’s practically impossible not to attribute human qualities to it. And no one wants a rude voice assistant representing their company.
Making customers think that your voice assistant is a real human is a sure-fire way to deliver frustrating customer experiences. At PolyAI, we work hard to make sure our voice assistants sit at the peak just before the Uncanny Valley, without tipping over into it: warm and friendly enough to put callers at ease, but not so lifelike that they cause cognitive dissonance.
The Future of Voice
Companies are wising up to the benefits of conversational AI in customer communications, and we’ll see more and more machines communicating in human language. While this kind of transactional exchange may seem better suited to a quick API call, voice assistants can get the job done even when no such API is available.
We are super excited about the future of voice assistants. Two highly sophisticated voice assistants have now met in the wild, in a rerun of the legendary “Dr Livingstone, I presume” moment (though neither assistant realised they were speaking to one of their own). This is a sign of great things to come. Stay tuned – and get in touch with us if you want to build a great voice assistant for your brand.
What do you think of the PolyAI vs Google Duplex call? Let us know on Twitter, @poly_ai.