Introduction

Over the past few months, a flurry of companies have started offering Text to Speech (TTS) services. Many of them are startups, the majority are in the early stages of growth, and almost all of them use one or more of Amazon Polly, Google Text to Speech, or similar solutions from IBM or Microsoft Azure*. In this post, I will talk about a few such services and ponder the all-important question for podcasters:

Are the Text to Speech (TTS) services a podcaster’s friend, or foe?

Recently, someone who runs a translation services business reached out to me. They wanted to know if we at gaathastory were looking for translators, voiceover artists, or narrators. I see the usefulness of such a service, but it does not fit well within our workflow, that is, the way we do things at gaathastory.

The discussion with this individual also gave rise to the question:

With the rise of multiple text to speech software tools, does it really make sense in today’s world to have a huge pool of narrators to do voiceover work?

The short answer is yes, and more so for regional languages of India. In the times to come, maybe the situation will change, and we will have more Made in India solutions that will serve this space.

What About Podcasters?

From a podcaster’s perspective, are these text to speech solutions a friend or a foe? I began to ponder this question.

Google Aunty Is Narrating A Story

In mid-2019, I tinkered with Amazon Polly and Google’s Text to Speech. I found them to be quite useful tools. We even sent out a couple of test episodes to our beta listeners. The feedback from one of the listeners, a young lad about five years old, was:

“Oh Google aunty is narrating the story!”

Of course, the “persona” or “avatar” that had narrated the story was the same voice persona that is used in Google Maps.

A couple of years ago, we also experimented with different accents, as well as different genders for the same story. In the end, we decided that using TTS was not something that we wanted to do. Definitely not for the market that we serve, that is, children’s bedtime stories.

The technology may evolve. The sound output may become more and more human-like, but what would be missing from automated narration is the human connection. This is essentially at odds with everything that gaathastory stands for.

The Idea Does Have Merit!

Text to speech does have its utility, and there are multiple instances where TTS might come into play, for example, if you are an educator delivering online lectures, or you are creating videos or informational campaigns. Even for podcasters, in certain use cases, it may hold a lot of value. Then there are marketing newsletters, webinars, slideshows with voiceover... The possibilities are many.

[Image: Use cases for Text to Speech programs include video voiceover, podcasts, and audiobooks. Blog of Amar Vyas. Caption: Uses for Text to Speech (TTS) tools]

A Good Idea Can Be Badly Implemented!

Some of the poorest implementations of TTS tools I have seen are in YouTube videos. Several channels suffer from this malaise, particularly when somebody is doing a product demonstration or an unboxing of a new laptop, phone, or similar product. For example, I saw a video about Vodafone Idea, one of the largest telecom providers in India, and some of the challenges it was experiencing.

Moreover, a few media publications in India use what are probably subpar implementations of these technologies. They call the audio versions of their news stories podcasts, but the result is horribly done.

TTS in Indian languages

When we look at Indian languages, a Hindi option is definitely available among the personas. For English, you get multiple voices with “Indian” accents. (edit: what exactly is an Indian accent?)

I would be really keen to see more Indian languages. Some solutions do offer Kannada, Marathi, Telugu, and Tamil, but they are few and far between. In our country, we have some 12 to 15 major languages, so finding good text to speech personas for each of them could be a challenge.

TTS Options available today

Over the past few months, deal aggregator sites like Appsumo, Pitchground, Dealify, Stacksocial, and others have listed several options for TTS. There are also several personal or small-scale projects on sites like GitHub, ProductHunt, or betalist.

In other words, we have a huge number of software as a service (SaaS) solutions in this space. They probably all use a similar set of technologies at the back end, i.e. solutions from Google, Amazon, IBM, or Microsoft. This makes me wonder:

Does the front end really matter when all of them use the same back end technology?

Short answer: it probably does. The services differ in their offerings in many subtle and not-so-subtle ways. Pricing tiers vary, and so do the features. Some offer a limited set of personas in the free or base tier, while others vary the number of characters that can be converted at a time, or the total number of words that can be converted to voice in a month. Some providers offer integrations with platforms such as podcast hosts.
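To make the per-conversion character limit a little more concrete, here is a minimal Python sketch of how a long blog post could be split into pieces before being pasted into, or sent to, a TTS service. This is only an illustration under my own assumptions: the function name `split_for_tts` and the default limit of 3000 characters are hypothetical and are not taken from any particular provider, so check the actual limit of the service you use.

```python
import re


def split_for_tts(text: str, max_chars: int = 3000) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Chunks break at sentence boundaries where possible. The 3000-character
    default is an assumed, illustrative limit, not any provider's real one.
    """
    # Split after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    post = "This is a short post. It has three sentences. Each one is brief."
    for number, chunk in enumerate(split_for_tts(post, max_chars=40), start=1):
        print(number, len(chunk), chunk)
```

Each chunk can then be converted separately and the resulting audio files joined in an editor. Breaking at sentence ends, rather than at a fixed character count, is a deliberate choice here so that the voice does not pause in the middle of a sentence.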

The pricing, customer support, user interface, service levels, and value additions created by these companies could make all the difference.

I will add three audio samples here from Play.ht and Lovo. Each audio is a TTS conversion of a blog post that I wrote this morning. As I began looking at other available TTS options, I realized that it has probably become a problem of plenty. Below are screenshots from Play.ht and Lovo for your quick reference.

Audio generated in Play.ht using TTS (Text to Speech)

Audio generated in Lovo using TTS (Text to Speech)

What does the Text to Speech audio “sound” like?

The audio output below was generated using Lovo.

The audio below was generated using Play.ht.

I was keen to explore the Hindi language option for TTS, and below is the audio from a random post that I created recently.

Image Optimization (Hindi)

As technology evolves, it is natural for speech to text (STT) and Text to Speech (TTS) tools to become more robust. Podcasters should consider these tools an enabler, and an ally of sorts, rather than a foe. I am convinced that these tools will benefit content creators, including podcasters, immensely.


*The likes of Tencent and Alibaba may have their own solutions, which I am not familiar with. Yandex Text to Speech is another solution that I have not tried, either myself or through one of the above mentioned service providers.
