Human-voiced smart assistants such as Siri, Alexa, Google Assistant, Cortana, and the like are proliferating. Before long we’ll be talking with just about anything that has an electric switch, and all these things will be replying to us in a human voice. These systems are increasingly built to sound like us, with our um’s and ah’s. And that’s not a good thing, explains technologist David Weinberger.
Pretty soon everything will have a voice. Your phone already has one, and maybe your smart speaker. Your car. Your TV’s remote control. Soon your toaster. Those voices are likely to be both highly reliable and based on a lie.
At the moment, for many of us our most common experience of interacting with a computer that speaks in a human-sounding voice happens over the telephone, with a scammer trying to get us to give to a bogus charity or pay for help with a Windows problem we don’t have. But in terms of public awareness, we increasingly associate human-voiced computer apps with digital assistants such as Siri, Alexa, Google Assistant, Cortana, and the like. The user interfaces of these assistants are more convenient than buttons and keys: you can interact with them while both your hands are holding your child, get their responses without having to stop monitoring the egg you’re frying, and not have to worry about typos at all. And now the next wave is on its way: voice interfaces are the perfect solution, for example, for the Internet of Things, all those connected home devices and appliances that would otherwise each have their own interfaces and confusing displays. Before long we’ll be talking with just about anything that has an electric switch, and all those things will be replying in a human voice.
All of these systems — especially the assistants — have an incentive to tell us the truth. After all, when we step outside we’ll find out if it’s not really sunny as our digital assistant promised. When we get home and take our dinner from the oven, we’ll find out if its assurances that it pre-heated our oven were baseless. We may even discover that Kevin Bacon is not taller than Tom Hanks, despite what our lovely-voiced assistant told us. In these simple, practical cases, if we find out that these assistants are not telling us the truth, we’ll simply stop using them.
But they also all have an incentive to tell us one big lie every time they talk: that they are like us. That’s why Google’s Duplex AI Assistant says “um” sometimes. Duplex is the software that will, for example, call up a restaurant and make a reservation for you without the person on the other end ever finding out they were talking with an AI. When Duplex um’s, it isn’t really at a loss for words. It’s just trying to trick the other person, just like the voice on a scammy phone call. That might work well in the moment — the receptionist won’t be disconcerted by the idea of talking to a machine instead of a fellow human. But it can also erode one of our bedrock ways of assessing trust: how someone sounds.
Now, Google Duplex is a special and somewhat weird case. It also may be a temporary solution: once restaurants and other venues get their own version of Duplex to take reservations, we can assume the AIs on both sides of the call will drop the pretense and complete the transaction in highly efficient robot beeps and boops rather than trying to out-um the other.
But in the meantime, all of these assistants sound like humans because their makers want our trust. They know we’re wired to attach ourselves emotionally to human voices. That’s also why these assistants tend to default to women’s voices: we humans, at least in the West, apparently find those voices more reliable. If Apple could get Siri to nuzzle us with its cold nose and ask to be petted, it probably would. But that wouldn’t make these assistants any more trustworthy.
Sounding like a human, and like a female of the species, increases our trust, but not because the systems are any more trustworthy. If sounding like a talented frog happened to play into our biological preferences, then Siri would be voiced by Kermit. Unearned trust should not be trusted.
Furthermore, all of this humanizing isn’t aligned with our actual interests. While Alexa likes to think I want to be relaxing with her in a rocking chair on my front porch and sharing a glass of her famous sweet tea, that’s not actually the most efficient way for a machine to communicate information to us. I’d like to be able to tell Alexa to speak faster and to skip the pleasantries. In fact, a device that talks in a flat, speedy voice designed for nothing but the efficient transfer of information may be better at signaling that its maker’s interests are aligned with ours: we just want to know that the stove is going to be at 425F for ten minutes and then will drop to 350F for thirty minutes. We don’t need the stove to pretend it cares about us. A mechanical voice may actually elicit more trust than all those um’s, the same way a human speaking plainly, without flattery or chitchat, can. At least for some of us.
It may be too much to ask that our devices not try to speak like humans right out of the box. But as the things around us start to compete for our attention and our trust by talking in homey but phony human voices, as in some dystopian Disney movie, companies may find that giving us the option to command our digital assistants to speak like the soulless machines they are makes good business sense.