Last updated: March 18, 2024
Many Linux tools convert text to speech or audio files from the command line, improving accessibility. Some of them also come with features like multiple voice options, multiple languages, pitch adjustment, and word gap control.
In this tutorial, we’ll discuss four commands for getting speech output from command line text.
espeak is a speech synthesizer that supports various languages. It can convert standard input, text files, and texts passed as arguments to speech or WAV files. It also has pitch, word gap, and amplitude control plus many other features.
We can install espeak on a Debian machine:
$ sudo apt install espeak
We can also install it on a Red Hat-based distribution:
$ sudo dnf install espeak
espeak can convert text passed as an argument to speech:
$ espeak "Welcome to Baeldung"
When passing more than one word to the espeak command, quotation marks are vital. Without them, the command will only read the first word.
espeak can also convert standard input to speech:
$ espeak --stdin
Welcome to Baeldung
After entering the text in the prompt, we must press Ctrl + D to exit the standard input stream. After that, espeak will give us the speech output.
When we use the -f flag, espeak will produce speech from text files:
$ espeak -f Baeldung.txt
We can also get our speech output as a WAV file using the -w flag in any of the following ways:
$ espeak "Welcome to Baeldung" -w welcome
$ espeak --stdin -w welcome
Welcome to Baeldung
$ espeak -f Baeldung.txt -w welcome
When we run any of the three commands above, we’ll create a WAV file named welcome. Assuming Baeldung.txt contains the same text, the file says “Welcome to Baeldung” when we play it.
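Since -w writes to exactly the name it receives, the file above has no extension. The following is a minimal sketch, assuming we'd like output names to end in .wav; the wav_name helper is hypothetical and not part of espeak:

```shell
# Hypothetical helper: append .wav unless the name already ends with it
wav_name() {
  case "$1" in
    *.wav) printf '%s\n' "$1" ;;
    *)     printf '%s\n' "$1.wav" ;;
  esac
}

wav_name welcome       # prints welcome.wav
wav_name welcome.wav   # prints welcome.wav
```

With the helper defined, we could pass "$(wav_name welcome)" as the value of the -w flag.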
We can vary the pitch of the espeak speech output between 0 and 99 using the -p flag, where 0 is the lowest pitch and 99 is the highest.
The default pitch is 50, so with that in mind, we have a sense of how high or low we need to go.
Let’s raise the pitch to 70:
$ espeak -f Baeldung.txt -p 70
espeak controls the word gap, the pause between words, using the -g option, which works in units of 10ms (milliseconds). Speech rate itself is set separately with the -s option, in words per minute.
So, if we want it to leave a gap of 1 second (1000ms) between each word, we’ll pass 100 to the -g flag:
$ espeak -f Baeldung.txt -g 100
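The unit conversion above is easy to get wrong, so here is a small sketch of it. The ms_to_gap helper is a hypothetical name, not an espeak feature, and it assumes a whole number of milliseconds:

```shell
# Hypothetical helper: convert milliseconds to espeak's -g units (10 ms each)
# Integer division, so any remainder below 10 ms is dropped
ms_to_gap() {
  echo $(( $1 / 10 ))
}

ms_to_gap 1000   # prints 100
ms_to_gap 250    # prints 25
```

We could then write the gap in milliseconds and pass "$(ms_to_gap 1000)" to the -g flag.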
espeak works with a default amplitude (volume) of 100. But we can make it go as high as 200 and as low as 0 by passing our desired value to the -a flag.
So, let’s raise the amplitude to 150:
$ espeak -f Baeldung.txt -a 150
We can pass values above 200 to the -a flag. But it’s better to stay within the recommended limits.
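To stay within the ranges described above (0 to 99 for pitch, 0 to 200 for amplitude), we might clamp a value before handing it to espeak. This is only a sketch; clamp is a hypothetical helper and assumes integer arguments:

```shell
# Hypothetical helper: clamp VALUE ($1) to the range MIN ($2) .. MAX ($3)
clamp() {
  if [ "$1" -lt "$2" ]; then
    echo "$2"
  elif [ "$1" -gt "$3" ]; then
    echo "$3"
  else
    echo "$1"
  fi
}

clamp 250 0 200   # prints 200 (amplitude capped at the upper limit)
clamp 70 0 99     # prints 70  (pitch already in range)
```

For example, passing "$(clamp 250 0 200)" to the -a flag would keep the amplitude at 200.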
say is one of the tools from the GNUstep toolset. It is a lightweight text-to-speech tool that works in two ways: converting text arguments to speech and converting text files to speech.
We can install say on a Debian machine by installing the GNUstep GUI Runtime:
$ sudo apt install gnustep-gui-runtime
say can convert text to speech by passing the text as an argument:
$ say Welcome to Baeldung
say can also convert text files to speech:
$ say -f Baeldung.txt
google_speech is a CLI text-to-speech tool based on Google Translate TTS. Like say, it is lightweight. Unlike say, it can convert text to audio files.
On the other hand, say can produce speech from text files, while google_speech can’t.
To install google_speech, we’ll run the following commands:
$ sudo apt-get install libsox-fmt-all
$ sudo apt-get install sox
$ sudo pip install sox
$ sudo pip install google_speech
The first three commands install sox and some of its dependencies, which google_speech needs to work. The last command installs google_speech itself.
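Because google_speech depends on sox, it can help to confirm that sox is on the PATH before going further. A minimal sketch, where have_cmd is a hypothetical helper built on the standard command -v check:

```shell
# Hypothetical helper: succeed only if the named command is on PATH
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd sox; then
  echo "sox found"
else
  echo "sox missing: install it before using google_speech"
fi
```

The same check works for any of the tools in this tutorial, for example have_cmd espeak.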
Let’s make google_speech say “Welcome to Baeldung”:
$ google_speech "Welcome to Baeldung"
The quotation marks are important. Without them, the command may throw an error.
google_speech also converts text to audio files using -o:
$ google_speech -o welcome.mp3 "Welcome to Baeldung"
When specifying an output filename for google_speech, adding one of mp3, flac, or ogg as the file extension prevents a format error.
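The extension rule above can be checked before calling google_speech. This is a sketch only; valid_output_name is a hypothetical helper, and it tests nothing beyond the three extensions named in the text:

```shell
# Hypothetical helper: succeed if the filename ends in .mp3, .flac, or .ogg
valid_output_name() {
  case "$1" in
    *.mp3|*.flac|*.ogg) return 0 ;;
    *)                  return 1 ;;
  esac
}

valid_output_name welcome.mp3 && echo "welcome.mp3: ok"
valid_output_name welcome.wav || echo "welcome.wav: unsupported extension"
```

Both echo lines run here, since the first name passes the check and the second fails it.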
We can make google_speech produce speech output in one of 75 languages using the -l flag. To see the supported language codes, we can run google_speech -l without an argument. The command fails, but its usage message lists every accepted code:
$ google_speech -l
usage: google_speech [-h]
[-l {af,ar,bn,bs,ca,...,pl,pt,pt-br,pt-pt,ro,ru,si,sk,sq,sr,su,sv,sw,ta,te,th,tl,tr,uk,vi,zh-cn,zh-tw}]
[-e SOX_EFFECTS [SOX_EFFECTS ...]] [-v {warning,normal,debug}] [-o OUTPUT]
speech
google_speech: error: argument -l/--lang: expected one argument
Now, let’s make google_speech say “Welcome to Baeldung” using a French accent:
$ google_speech -l fr "Welcome to Baeldung"
gTTS is not as lightweight as google_speech and say. But unlike them, it only produces an audio file from the text passed to it.
We can install gTTS using pip:
$ sudo pip install gTTS
We can convert text to an mp3 file using gtts-cli:
$ gtts-cli 'Welcome to Baeldung' --output welcome.mp3
Next, we’ll convert a text file to an mp3 file:
$ gtts-cli -f Baeldung.txt --output welcome.mp3
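When several text files need converting, the same command can be repeated in a loop. The sketch below is a dry run that only prints the gtts-cli commands instead of executing them; mp3_name is a hypothetical helper and notes.txt is a made-up filename used for illustration:

```shell
# Hypothetical helper: derive the mp3 name from a .txt name
mp3_name() {
  printf '%s\n' "${1%.txt}.mp3"
}

# Dry run: print the gtts-cli command for each file instead of running it
for file in Baeldung.txt notes.txt; do
  echo "gtts-cli -f $file --output $(mp3_name "$file")"
done
```

Removing the echo and its quotes would run the conversions for real, assuming gtts-cli is installed and the files exist.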
Like google_speech, gtts-cli has a language option, -l.
We can run gtts-cli --all to see all supported languages:
$ gtts-cli --all
af: Afrikaans
ar: Arabic
bg: Bulgarian
...truncated...
en: English
es: Spanish
...truncated...
zh-TW: Chinese (Mandarin/Taiwan)
zh: Chinese (Mandarin)
Then when we run the following command, gtts-cli will use a Spanish accent:
$ gtts-cli "Welcome to Baeldung" -l es --output welcome.mp3
While gtts-cli can also reduce speech rate, it does not have the same precision as espeak.
To make speech slower with gtts-cli, we simply add the -s flag:
$ gtts-cli -s 'Welcome to Baeldung' --output welcome.mp3
In this article, we saw four ways to convert text to speech from the command line. Of the four tools discussed, espeak is the most feature-rich, as it offers pitch, amplitude, and word gap control, which the other three lack.
While gtts-cli does not offer direct speech output, it offers more than google_speech, such as text file input and a slow-speech option. Overall, say offers the fewest features.