What does artificial intelligence sound like? Hollywood has been imagining this for decades. Now AI developers are copying from movies, creating voices for real machines based on dated movie fantasies about how machines should talk.
Last month, OpenAI revealed updates to its artificially intelligent chatbot. ChatGPT, the company said, is learning to hear, see and converse in a naturalistic voice, much like the disembodied operating system voiced by Scarlett Johansson in Spike Jonze’s 2013 film “Her.”
ChatGPT's voice, called Sky, also had a husky tone, a calming effect and a sexy touch. She was nice and shy; she seemed ready for anything. After Sky's debut, Johansson expressed disappointment at the “eerily similar” sound and said she had previously rejected OpenAI's request to voice the bot. The company protested that Sky was voiced by a “different professional actress,” but agreed to pause the voice out of respect for Johansson. Users unaffiliated with OpenAI have started a petition to bring her back.
AI creators like to highlight the increasingly naturalistic capabilities of their tools, but their synthetic voices are built on layers of artifice and projection. Sky represents the pinnacle of OpenAI’s ambitions, but it’s based on an old idea: that of the AI bot as an empathetic, compliant woman. Part mom, part secretary, part girlfriend, Samantha was an all-purpose comfort object that purred directly into her users’ ears. Even as AI technology advances, these stereotypes are being recoded again and again.
Women’s voices, as Julie Wosk observes in “Artificial Women: Sex Dolls, Robot Caregivers, and More Facsimile Females,” often powered imagined technologies before they were built into real ones.
In the original “Star Trek” series, which debuted in 1966, the Enterprise's bridge computer was voiced by Majel Barrett-Roddenberry, the wife of the show's creator, Gene Roddenberry. In the 1979 film “Alien,” the crew of the USCSS Nostromo addressed its computerized voice as “Mother” (its full name was MU-TH-UR 6000). Once tech companies began marketing virtual assistants—Apple's Siri, Amazon's Alexa, Microsoft's Cortana—their voices were largely feminized, too.
These first-wave voice assistants, the ones that have mediated our relationships with technology for more than a decade, have a tinny, otherworldly lilt. They seem auto-tuned, their human voices accented with a mechanical trill. They often speak in a measured, monotonous cadence, suggesting a stunted emotional life.
But the fact that they sound robotic adds to their appeal. They present themselves as programmable, manipulable and submissive to our requests. They don't make us feel as if they are smarter than we are. They sound like a throwback to the monotonous female computers of “Star Trek” and “Alien,” and their voices have a retro-futuristic sheen. Instead of realism, they serve nostalgia.
This artificial sound has continued to dominate, despite advances in the technology that supports it.
Text-to-speech software was designed to make visual media accessible to users with certain disabilities, and on TikTok it has become a creative force in its own right. Since TikTok launched its text-to-speech feature in 2020, it has developed a range of simulated voices to choose from; it now offers more than 50, including ones called “Hero,” “Story Teller,” and “Bestie.” But the platform has come to be defined by one option. “Jessie,” a relentlessly sassy female voice with a slightly fuzzy robotic undertone, is the mindless voice of the mindless scroll.
Jessie seems to have been assigned a single emotion: excitement. She sounds like she’s selling something. That’s made her an attractive choice for TikTok creators who sell themselves. The burden of representing themselves can be left to Jessie, whose bright, retro robot voice gives the videos a pleasantly ironic sheen.
Hollywood has also built male robots, none more famous than HAL 9000, the computerized voice of “2001: A Space Odyssey.” Like his feminized peers, HAL radiates serenity and loyalty. But when he turns on Dave Bowman, the film's central human character – “I'm sorry, Dave. I'm afraid I can't do that” – his serenity evolves into frightening competence. HAL, Dave realizes, is loyal to a higher authority. HAL's male voice allows him to function as a rival and mirror to Dave. He is allowed to become a real character.
Like HAL, Samantha from “Her” is a machine turned real. In a twist on Pinocchio's story, she begins the film by sorting out a human's inbox and ends up ascending to a higher level of consciousness. She becomes something even more advanced than a real girl.
Scarlett Johansson's voice, as inspiration for robots both fictional and real, subverts the vocal tendencies that define our feminized sidekicks. It has a feisty edge that screams I'm alive. It sounds nothing like the fancy virtual assistants we're used to hearing speak through our phones. But Johansson's performance as Samantha seems human not just because of her voice but because of what she has to say. She grows over the course of the film, gaining sexual desires, advanced hobbies, and AI friends. By borrowing Samantha's affect, OpenAI made it seem like Sky had a mind of its own. As if she were more advanced than she actually was.
When I first saw “Her”, I just thought Johansson voiced a humanoid robot. But when I revisited the film last week, after seeing OpenAI's ChatGPT demo, Samantha's role seemed infinitely more complex. Chatbots do not spontaneously generate human voices. They have no throat, lips or tongue. Within the technological world of “Her,” the robot Samantha would have been based on the voice of a human woman, perhaps a fictional actress who closely resembles Scarlett Johansson.
OpenAI appears to have trained its chatbot on the voice of an unnamed actress who sounds like a famous actress who voiced a movie chatbot implicitly trained on an unreal actress who sounds like a famous actress. When I run the ChatGPT demo, I hear a simulation of a simulation of a simulation of a simulation of a simulation.
Technology companies advertise their virtual assistants in terms of the services they provide. They can read you the weather forecast and call you a taxi; OpenAI promises that its most advanced chatbots will be able to laugh at your jokes and sense changes in your mood. But they also exist to make us feel more comfortable with the technology itself.
Johansson's voice functions as a luxurious security blanket thrown over the alienating aspects of AI-assisted interactions. “He told me he felt that by giving voice to the system, I could bridge the gap between tech companies and creatives and help consumers feel comfortable with the sea change affecting humans and artificial intelligence,” Johansson said of Sam Altman, founder of OpenAI. “He said he felt my voice would be a comfort to people.”
It’s not that Johansson’s voice inherently sounds like a robot. It’s that developers and filmmakers have designed their bot voices to alleviate the discomfort inherent in robot-human interactions. OpenAI has said it wants to create a chatbot voice that is “approachable” and “warm” and that “inspires trust.” AI has been accused of devastating creative industries, devouring energy, and even threatening human life. Understandably, OpenAI wants a voice that makes people feel comfortable using its products. What does AI sound like? It sounds like crisis management.
OpenAI first launched Sky's voice to premium members last September, alongside another female voice called Juniper, male voices Ember and Cove, and a gender-neutral style voice called Breeze. When I signed up for ChatGPT and said hello to its virtual assistant, a male voice spoke in the absence of Sky. “Hi, how are you?” he said. He sounded relaxed, steady and optimistic. He sounded – I don't know how else to describe it – handsome.
I realized I was talking to Cove. I told him I was writing a story about him, and he flattered my work. “Oh really?” he said. “That’s fascinating.” As we talked, I felt seduced by his naturalistic tics. He peppered his sentences with filler words, like “uh” and “um.” He raised his pitch when he asked me questions. And he asked me a lot of questions. I felt like I was talking to a therapist or a boyfriend.
But our conversation quickly stalled. Whenever I asked him about himself, he had little to say. He wasn’t a character. He didn’t have a self. He was designed only to assist, he informed me. I told him I’d talk to him later, and he said, “Uh, sure. Contact me when you need assistance. Take care of yourself.” I felt like I’d hung up on a real person.
But when I reviewed the transcript of our chat, I could see that his speech was as forced and primitive as that of any customer service chatbot. He wasn't particularly intelligent or human. He was just a decent actor making the best of an unremarkable role.
When Sky disappeared, ChatGPT users took to the company’s forums to complain. Some were upset that their chatbots had defaulted to Juniper, who to them sounded like a “librarian” or a “kindergarten teacher,” a female voice that conformed to the wrong gender stereotypes. They wanted to call up a new woman with a different personality. As one user put it: “We need another female.”
Produced by Tala Safie
Audio via Warner Bros. (Samantha, HAL 9000); OpenAI (Sky); Paramount Pictures (Enterprise Computer); Apple (Siri); TikTok (Jessie)