Text To Speech Fun

by IoFAdmin at

python | programming

We're Going To Create A Program That Reads A Book To Us!

In this tutorial, we'll use PIL (the Python Imaging Library, available today as the Pillow package), gtts (a library that talks to Google Translate's text to speech service), and TkInter to create a suite of programs that will read a simple book to us. Let's get started!

The Jabberwocky AKA I Didn't Want to Write A Story Myself

Since it's in the public domain and pretty well-known, we'll be using the poem "Jabberwocky" by Lewis Carroll as our source material. The whole text can be found on Wikipedia.

Not Exactly Van Gogh But It'll Do

The first step in our self-reading book is to create some images. I couldn't find many images of the Jabberwocky online so I decided to create the images using text. Our pictures will be sort of like the "Word Of the Day" calendars that many people have on their desks except ours will feature a word from each stanza of the poem.

No We Don't Actually Have The Ghost of Stephen Hawking

Step number two will be to send the text of our poem to gtts and have it create a series of mp3 files that Stephen Hawking... I mean our computer can read to us.

Finally, We'll Have Storytime With Our Computer

After we generate our images and mp3s, we'll create a TkInter GUI program that will play the mp3s along with displaying the corresponding images.

One Thing To Note

You should run the Python scripts in the order that they're listed here. If you don't, the book reading script won't have the files that it needs to actually work.
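If you want a quick way to confirm that the first two scripts have already run, a check like the one below could help. It's only a sketch: missing_files and its parameters are hypothetical names, and it assumes the eight pages and the file names that the scripts in this tutorial produce.

```python
import os

def missing_files(base_dir='.', pages=8):
    """Return the image and mp3 paths the reader expects but cannot find."""
    expected = []
    for idx in range(pages):
        expected.append(os.path.join(base_dir, 'img', f'word_{idx}.png'))
        expected.append(os.path.join(base_dir, 'mp3', f'jabberwocky_{idx}.mp3'))
    return [path for path in expected if not os.path.exists(path)]

if __name__ == '__main__':
    # Print anything that still needs to be generated
    for path in missing_files():
        print(f'missing: {path}')
```

An empty result means both generator scripts have done their job.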

Shut Up Already And Bring On The Code!

Like all of our tutorials, we'll see the code and then explain how it works.

Image Creation Code

Create a file named text2img.py and paste the following code into it:

from PIL import Image, ImageDraw, ImageFont
import os

def generateTextImage(word, idx):

    posX = 0
    posY = 10
    pageNumX = 385
    pageNumY = 250
    offset = 25
    fontSize = 20
    textColor = (255, 255, 255)
    backgroundColor = (0, 0, 0)
    
    img = Image.new('RGB', (500, 300), color = backgroundColor)
    curFont = ImageFont.truetype('/Library/Fonts/Arial.ttf', fontSize)
    d = ImageDraw.Draw(img)

    lines = word.split('|')

    for line in lines:
        posX += offset
        posY += offset

        d.text((posX,posY), line, font=curFont, fill=textColor)

    d.text((pageNumX, pageNumY), f'page {idx + 1}', font=curFont, fill=textColor)
    
    img.save(f'./img/word_{idx}.png')

if __name__ == '__main__':
    words = [
        'brillig: four o\'clock in the afternoon, the time|when you begin broiling things for dinner',
        'bandersnatch: a swift moving creature with|snapping jaws, capable of extending its neck',
        'manxome: fearsome; a portmanteau|of manly and buxom',
        'burbled: a mixture of the three verbs|bleat, murmur, and warble',
        'galumphing: to move with a clumsy and heavy tread',
        'chortled: combination of chuckle and snort',
        'wabe: the grass plot around a sundial'
    ]

    os.makedirs('./img', exist_ok=True)  # don't fail if the folder already exists

    for idx, val in enumerate(words, start=1):
        generateTextImage(val, idx)

    generateTextImage('Jabberwocky|by Lewis Carroll', 0)

There might be a lot going on here but it's relatively simple. First, we create a list of strings containing our words and definitions:

words = [
        'brillig: four o\'clock in the afternoon, the time|when you begin broiling things for dinner',
        'bandersnatch: a swift moving creature with|snapping jaws, capable of extending its neck',
        'manxome: fearsome; a portmanteau|of manly and buxom',
        'burbled: a mixture of the three verbs|bleat, murmur, and warble',
        'galumphing: to move with a clumsy and heavy tread',
        'chortled: combination of chuckle and snort',
        'wabe: the grass plot around a sundial'
    ]

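Then we create a directory named img where we'll save our images. Passing exist_ok=True keeps the script from crashing if you run it a second time and the directory is already there.

os.makedirs('./img', exist_ok=True)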

Next, we loop over our list and send the data to our generateTextImage function.

for idx, val in enumerate(words, start=1):
        generateTextImage(val, idx)

Finally, we call our function one more time to create the "title card" image, which isn't in our words list. It gets index 0 so that it becomes the first page.

generateTextImage('Jabberwocky|by Lewis Carroll', 0)
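To see why the loop starts counting at 1, here's a small illustration of how the indexes turn into file names. The shortened word list is just a stand-in for the full one.

```python
words = ['brillig', 'bandersnatch', 'manxome']  # shortened stand-in list

# start=1 leaves index 0 free for the title card
file_names = [f'word_{idx}.png' for idx, _ in enumerate(words, start=1)]
file_names.insert(0, 'word_0.png')  # the title card, generated last

print(file_names)
```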

Let's actually create the image using the generateTextImage function. Here we define the variables we need to generate our image. posX and posY set the starting position of the main text in the image. pageNumX and pageNumY set the starting position of the page number text. The other variables are self-explanatory.

posX = 0
posY = 10
pageNumX = 385
pageNumY = 250
offset = 25
fontSize = 20
textColor = (255, 255, 255)
backgroundColor = (0, 0, 0)

We create a 500 by 300 pixel image filled with our background color and load our font at the given size. (The path to the font used here is for Mac so you'll have to adjust it if you're on a different operating system.) Finally, we create a drawing object that lets us draw onto the image.

img = Image.new('RGB', (500, 300), color = backgroundColor)
curFont = ImageFont.truetype('/Library/Fonts/Arial.ttf', fontSize)
d = ImageDraw.Draw(img)
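Since the font path is the part most likely to break on another machine, one option is to try a few locations and use the first one that exists. This is a sketch, not a guarantee: first_existing is a hypothetical helper, and the Windows and Linux paths are assumptions about typical installs that may not match yours.

```python
import os

# Candidate font locations. Only the first comes from this tutorial; the
# others are assumptions about common Windows and Linux installs.
FONT_CANDIDATES = [
    '/Library/Fonts/Arial.ttf',
    'C:/Windows/Fonts/arial.ttf',
    '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf',
]

def first_existing(paths):
    """Return the first path that exists on disk, or None if none do."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None

font_path = first_existing(FONT_CANDIDATES)
print(font_path)
```

If font_path is not None, you can hand it to ImageFont.truetype in place of the hard-coded path. If it is None, PIL's ImageFont.load_default() is one fallback, though its built-in font may not match the size we asked for.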

We split our word variable into a list of lines on the pipe character. For each line, we move the drawing position right and down by the offset, so every line sits one step further in than the line before it. Then we add the text to our image.

lines = word.split('|')

for line in lines:
    posX += offset
    posY += offset

    d.text((posX,posY), line, font=curFont, fill=textColor)
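To see where each line ends up without drawing anything, the same arithmetic can be run on its own. This is a sketch that mirrors the loop above; line_positions is a hypothetical helper name, and the defaults are the values from generateTextImage.

```python
def line_positions(text, offset=25, start=(0, 10)):
    """Return ((x, y), line) pairs, mirroring the loop in generateTextImage."""
    pos_x, pos_y = start
    placed = []
    for line in text.split('|'):
        pos_x += offset
        pos_y += offset
        placed.append(((pos_x, pos_y), line))
    return placed

for pos, line in line_positions('manxome: fearsome; a portmanteau|of manly and buxom'):
    print(pos, line)
# (25, 35) manxome: fearsome; a portmanteau
# (50, 60) of manly and buxom
```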

We add our current page number text to the image.

d.text((pageNumX, pageNumY), f'page {idx + 1}', font=curFont, fill=textColor)

Finally, we save the image as a PNG file named after its index.

img.save(f'./img/word_{idx}.png')

Mp3 Creation Code

Now we'll create the mp3 files. Paste the following code into a new file named text2mp3s.py:

from gtts import gTTS
import os

def generateMp3(words, idx):
    myobj = gTTS(text=words, lang='en', slow=False)
    myobj.save(f'./mp3/jabberwocky_{idx}.mp3')

if __name__ == '__main__':
    jabberwocky = [
        'Jabberwocky by Lewis Carroll',
        'Twas brillig, and the slithy toves Did gyre and gimble in the wabe: All mimsy were the borogoves, And the mome raths outgrabe.',
        'Beware the Jabberwock, my son! The jaws that bite, the claws that catch! Beware the Jubjub bird, and shun The frumious Bandersnatch!',
        'He took his vorpal sword in hand; Long time the manxome foe he sought— So rested he by the Tumtum tree And stood awhile in thought.',
        'And, as in uffish thought he stood, The Jabberwock, with eyes of flame, Came whiffling through the tulgey wood, And burbled as it came!',
        'One, two! One, two! And through and through The vorpal blade went snicker-snack! He left it dead, and with its head He went galumphing back.',
        'And hast thou slain the Jabberwock? Come to my arms, my beamish boy! O frabjous day! Callooh! Callay! He chortled in his joy.',
        'Twas brillig, and the slithy toves Did gyre and gimble in the wabe: All mimsy were the borogoves, And the mome raths outgrabe.'
    ]

    os.makedirs('./mp3', exist_ok=True)  # don't fail if the folder already exists

    for idx, val in enumerate(jabberwocky):
        generateMp3(val, idx)

Through the power of Google and maybe Stephen Hawking... Who knows? Maybe Google's channeling him from the Great Beyond?

We create a list of strings: the title first, then one string for each stanza of the poem.

jabberwocky = [
        'Jabberwocky by Lewis Carroll',
        'Twas brillig, and the slithy toves Did gyre and gimble in the wabe: All mimsy were the borogoves, And the mome raths outgrabe.',
        'Beware the Jabberwock, my son! The jaws that bite, the claws that catch! Beware the Jubjub bird, and shun The frumious Bandersnatch!',
        'He took his vorpal sword in hand; Long time the manxome foe he sought— So rested he by the Tumtum tree And stood awhile in thought.',
        'And, as in uffish thought he stood, The Jabberwock, with eyes of flame, Came whiffling through the tulgey wood, And burbled as it came!',
        'One, two! One, two! And through and through The vorpal blade went snicker-snack! He left it dead, and with its head He went galumphing back.',
        'And hast thou slain the Jabberwock? Come to my arms, my beamish boy! O frabjous day! Callooh! Callay! He chortled in his joy.',
        'Twas brillig, and the slithy toves Did gyre and gimble in the wabe: All mimsy were the borogoves, And the mome raths outgrabe.'
    ]

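Create a directory to save our mp3 files in. As with the images, exist_ok=True lets us rerun the script without an error if the directory already exists.

os.makedirs('./mp3', exist_ok=True)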

Loop over our list and call the generateMp3 function for each one.

for idx, val in enumerate(jabberwocky):
        generateMp3(val, idx)

Send our text string to our text to speech library and save the generated mp3 files.

def generateMp3(words, idx):
    myobj = gTTS(text=words, lang='en', slow=False)
    myobj.save(f'./mp3/jabberwocky_{idx}.mp3')
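gtts needs a network connection for every file it generates, so when you rerun the script it can be handy to skip the mp3s that are already on disk. Here's a sketch of that idea. Both generate_missing and its synthesize parameter are hypothetical names; synthesize is assumed to be any function that takes a text and a path and writes the mp3 there.

```python
import os

def generate_missing(texts, out_dir, synthesize):
    """Call synthesize(text, path) only for mp3 files not already on disk."""
    os.makedirs(out_dir, exist_ok=True)
    created = []
    for idx, text in enumerate(texts):
        path = os.path.join(out_dir, f'jabberwocky_{idx}.mp3')
        if os.path.exists(path):
            continue  # already generated on an earlier run
        synthesize(text, path)
        created.append(path)
    return created
```

In text2mp3s.py, synthesize could be something like lambda text, path: gTTS(text=text, lang='en', slow=False).save(path), which is the same call our generateMp3 function makes.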

Book Reading Creation Code

Last one, I promise... Paste the following code into jabberwocky.py:

from tkinter import *
from PIL import Image, ImageTk
import os
import threading

def playMp3(idx):
    # only works for mac
    os.system(f'afplay ./mp3/jabberwocky_{idx}.mp3')

    # for linux
    #os.system(f'mpg321 ./mp3/jabberwocky_{idx}.mp3')

root = Tk()
root.title("Jabberwocky")
root.resizable(0, 0)

frame=Frame(root, width=600, height=500, bg='white', relief=GROOVE, bd=2)
frame.pack(padx=10, pady=10)

images = []

startImg = Image.open('./img/word_0.png')
startImg.thumbnail((500, 300))
start = ImageTk.PhotoImage(startImg)

th = threading.Thread(target=playMp3, args=(0,))
th.start()

for idx in range(0, 8):
    img = Image.open(f'./img/word_{idx}.png')
    img.thumbnail((500, 300))
    images.append(
        ImageTk.PhotoImage(img)
    )

i = 0
image_label = Label(frame, image=start)
image_label.pack()

def previous():
    global i
    i = i - 1

    if i < 0:
        i = 7

    image_label.config(image=images[i])

    th = threading.Thread(target=playMp3, args=(i,))
    th.start()

def next():
    global i
    i = i + 1

    if i > 7:
        i = 0

    image_label.config(image=images[i])

    th = threading.Thread(target=playMp3, args=(i,))
    th.start()

btn1 = Button(root, text="< Back", highlightbackground='black', fg='gold', font=('arial 15 bold'), relief=GROOVE, command=previous)
btn1.pack(side=LEFT, padx=60, pady=5)

btn2 = Button(root, text="Next >", width=8, highlightbackground='black', fg='gold', font=('arial 15 bold'), relief=GROOVE, command=next)
btn2.pack(side=LEFT, padx=60, pady=5)

btn3 = Button(root, text="Exit", width=8, highlightbackground='black', fg='gold', font=('arial 15 bold'), relief=GROOVE, command=root.destroy)
btn3.pack(side=LEFT, padx=60, pady=5)

root.mainloop()

Now we'll create our program that actually "reads" our book. DISCLAIMER: this is the first time that I've ever used TkInter so I'm by no means an expert. I urge you to find some tutorials on it so that you can learn more.

Create the root level of our GUI layout, set the title, and add a frame to contain all of our GUI elements.

root = Tk()
root.title("Jabberwocky")
root.resizable(0, 0)

frame=Frame(root, width=600, height=500, bg='white', relief=GROOVE, bd=2)
frame.pack(padx=10, pady=10)

We open our "title card" image, set its size, and save it to a variable named start.

startImg = Image.open('./img/word_0.png')
startImg.thumbnail((500, 300))
start = ImageTk.PhotoImage(startImg)

Next, we create a new Python thread, assign the playMp3 function to it, and pass 0 as its argument. Then we start the thread. If you've never used Python threads, you should look them up in the official documentation. Basically, a thread is a separate line of execution inside the same Python process, which lets us run some code without blocking the main thread. In this example, we're playing our mp3 file in a thread while the main thread handles updating the GUI. If we don't use a thread, playing the mp3 will make the GUI hang until it finishes.

th = threading.Thread(target=playMp3, args=(0,))
th.start()
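Here's a tiny demonstration of that behavior that you can run without any audio or GUI. It's a sketch: fake_play is a hypothetical stand-in for playMp3 that sleeps instead of calling an audio player.

```python
import threading
import time

events = []

def fake_play(idx):
    # Stand-in for playMp3: sleeps instead of calling an audio player
    time.sleep(0.2)
    events.append(f'finished {idx}')

th = threading.Thread(target=fake_play, args=(0,))
th.start()
events.append('main continues')  # runs right away, before "playback" ends
th.join()                        # wait only so the demo can inspect events

print(events)
```

The main code records its event first even though the thread was started earlier, which is exactly why the GUI stays responsive while the mp3 plays.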

For each of our images, we create a thumbnail and add it to our images list.

for idx in range(0, 8):
    img = Image.open(f'./img/word_{idx}.png')
    img.thumbnail((500, 300))
    images.append(
        ImageTk.PhotoImage(img)
    )

We take our "start" image and display it.

image_label = Label(frame, image=start)
image_label.pack()

Now we define our buttons and display them. The command argument names the function to call when the button is pressed. For example, when btn1 is pressed, the previous function is called. The last line, root.mainloop(), starts TkInter's event loop, which keeps the window open and waiting on user input.

btn1 = Button(root, text="< Back", highlightbackground='black', fg='gold', font=('arial 15 bold'), relief=GROOVE, command=previous)
btn1.pack(side=LEFT, padx=60, pady=5)

btn2 = Button(root, text="Next >", width=8, highlightbackground='black', fg='gold', font=('arial 15 bold'), relief=GROOVE, command=next)
btn2.pack(side=LEFT, padx=60, pady=5)

btn3 = Button(root, text="Exit", width=8, highlightbackground='black', fg='gold', font=('arial 15 bold'), relief=GROOVE, command=root.destroy)
btn3.pack(side=LEFT, padx=60, pady=5)

root.mainloop()

Here we define our previous function. I try never to use global variables in my code but this simplifies the code for tutorial purposes so I'm allowing it here. We decrement the i variable by 1 and if we get to a negative number we set it to 7. Since 7 is the index of the last image/mp3 in our book, we are effectively going to the last page. We set the image_label variable to use the current image (based on "i" as the index) and display it. Finally, we create a thread to play the current mp3 based on the "i" index.

def previous():
    global i
    i = i - 1

    if i < 0:
        i = 7

    image_label.config(image=images[i])

    th = threading.Thread(target=playMp3, args=(i,))
    th.start()

Our next function is similar to previous except that we are incrementing the i variable and setting i to 0 if it goes beyond the value of 7.

def next():
    global i
    i = i + 1

    if i > 7:
        i = 0

    image_label.config(image=images[i])

    th = threading.Thread(target=playMp3, args=(i,))
    th.start()
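If the global variable bothers you, one alternative is to keep the page index inside a small object and let the modulo operator do the wrap-around. This is a sketch of that idea, not the code the program uses; PageTurner and turn are hypothetical names.

```python
class PageTurner:
    """Track the current page and wrap around at either end."""

    def __init__(self, page_count):
        self.page_count = page_count
        self.index = 0

    def turn(self, step):
        # Modulo wraps 8 back to 0 and -1 around to 7 when page_count is 8
        self.index = (self.index + step) % self.page_count
        return self.index

pages = PageTurner(8)
print(pages.turn(-1))  # 7: back from the title card lands on the last page
print(pages.turn(1))   # 0: forward from the last page returns to the title
```

The previous and next callbacks could then call pages.turn(-1) and pages.turn(1) and use the returned index to pick the image and mp3.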

Time to play the actual mp3 file. Which program can play sound files from the command line depends on your operating system. On Mac you can use afplay, which is built in, and on Linux you can use mpg321, which you may need to install first. I don't have access to Windows so you'll have to research that one yourself. (If you run Windows and figure it out, please post in the comments below.) We use os.system() to run a program on our computer and pass it the file path of our mp3 file. Pretty cool, huh?

def playMp3(idx):
    # only works for mac
    os.system(f'afplay ./mp3/jabberwocky_{idx}.mp3')

    # for linux
    #os.system(f'mpg321 ./mp3/jabberwocky_{idx}.mp3')
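Rather than commenting lines in and out, the script could pick the player based on the operating system it's running on. The following is a sketch under that assumption: player_command and play are hypothetical names, only the two players mentioned in this tutorial are covered, and anything else is left unhandled.

```python
import platform
import subprocess

def player_command(path, system=None):
    """Return the command list for playing path, or None if unknown."""
    system = system or platform.system()
    if system == 'Darwin':  # macOS
        return ['afplay', path]
    if system == 'Linux':
        return ['mpg321', path]
    return None  # other systems are left open, as in the tutorial

def play(path):
    cmd = player_command(path)
    if cmd is None:
        print(f'No known player for {platform.system()}')
        return
    subprocess.run(cmd)  # list form avoids building a shell string

print(player_command('./mp3/jabberwocky_0.mp3', system='Darwin'))
```

Using subprocess.run with a list is a design choice rather than a requirement: it does the same job as os.system here but doesn't pass the file path through the shell.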

So What Did We Learn?

  1. Creating text-based images with PIL
  2. Generating text to speech mp3 files with gtts
  3. Basic GUI app creation with TkInter

Give Me Some Feedback AKA I Need Constant Validation

Please let me know what you think about this tutorial. I love comments!

Support This Site By Buying Me A Coffee!

If you find this tutorial helpful, please consider buying me a coffee. Thanks!