Text-to-speech technologies replacing humans?

When I read the title for this article, my heart nearly skipped a beat. The writer asked the question: will computers ever replace human voice over actors?

Here are my succinct thoughts on the subject: I’ve been in this business for almost 20 years now, and I’ve seen all kinds of trends come and go (low voice, high-pitched, mature, teenager-y, great voice, no voice at all, hard sell, documentary style, etc.), but for the past 7-8 years it’s been all about conversational, conversational, CONVERSATIONAL voice overs, pretty please. Even voicemail auditions mention that they want a conversational read. Here is an excerpt from a national retailer’s IVR audition I received this week:

The persona should have the following attributes:

  • In their early-to-mid 30s
  • Average to slightly low in voice pitch, with a good range of intonation
  • Relaxed and slightly casual in speaking style
  • Well-educated and tech-savvy, without sounding overly technical or pompous
  • Personality traits: helpful, genuine, enthusiastic, informative, insightful, reliable, helpful, clear, straightforward, empathetic, polite.

The (company store) persona should convey the role of an experienced call center agent who has been on the job for several years and has worked across different skill groups during that time. She enjoys getting the opportunity to help people solve their problems, but doesn’t take herself too seriously. The customers she talks to find her friendly and approachable, and particularly appreciate her willingness to explain any complicated details they might be unclear about.

Here is another audition:

We are casting for a female voice talent between the age of 25-45 years of age.  When you read the script you will see that it is about love, poison and camouflage in the animal world.  The talent need to be able to weave through these different feelings in the script. This should be done in a friendly intellectual yet intriguing way.  We are celebrating the wonder of Science.  Make the viewer feel they are a part of this.

So… do you really think a computer can meet these criteria? NO way, Jose… We humans have a hard time doing this (I mean… what do they want from me??). And just today, the various auditions I received mentioned:

-Laid back, almost soft-spoken. The delivery should feel totally natural and conversational. -Nothing sell-y, big, or over-the-top.
-Improv, have a sense of humor/ enjoy it, pretend it’s a story, not ad copy! We are searching for a female VO talent with a sense of humor.

And the conversational trend isn’t going away anytime soon. Why?

I don’t have a PhD in consumer trends, but I believe it’s because the generation that came just after mine can’t digest “fake” anything anymore. They want real, real, real. Perhaps it’s because they’ve been continually exposed to a global “real,” 24/7, on YouTube for years now. Perhaps reality TV, for all its pitfalls, contributed to making REAL a deal. Whatever it is, even films are more real now: this week I saw a film where the protagonist and all her friends, including her love interest, were in their 70s! This was a Sundance favorite! Today it was announced that Jane Fonda and Lily Tomlin have a new series, available on Netflix, and they are also in their 70s. And this won’t be stopping soon: a friend of mine just sold a screenplay for LOTS of money, and all the characters were seniors… we would not have imagined this even last year…

And look at how we spend our money: we have full-blown sub-economies now with Airbnb and Uber. While you may think I’ve lost my train of thought here, I can’t help but draw a link between that new reality and our wanting “real” in our lives. Younger people especially do not want to be force-fed “the conventional.” They are much more perceptive of the deceit that often hides behind it, and they want direct contact as a result: at this point they would rather stay in a stranger’s bed than in a “sterile” hotel, because that hotel doesn’t have a face. It doesn’t have a “soul,” and as a result they are convinced it doesn’t care about them, and let’s face it, it often doesn’t. A person’s house, on the other hand, has a soul, and the host is forced to care about them, if only because of reviews.

Bottom line? The world wants REAL…so let’s not worry about TTS taking over just yet.

I personally think it’s one thing to have A.I., or to mimic humans, but it’s a whole other thing to possess that mysterious and magical something we don’t quite understand, which we commonly refer to as “the soul”… and let’s face it, that’s what all the buyers want. They want to feel our “soul” in the voice over. So let’s not worry about that smart machine thing taking our jobs.


PS. Here’s me doing the TTS thing. 10,000 sentences later…this is what it sounds like!