FE 'tkGooies' Utilities

'AUDIOtools' group

tkSpeakText_with_espeak

(an 'espeak' Front End)

(FE = Freedom Environment)

The GUI of the 'tkSpeakText' tkGooie, which is a 'front end' for the 'espeak' program.

This 'tkGooie' utility speaks the text that the user enters in the green scrollable text area.


INTRODUCTION to the
'tkSpeakText_with_espeak' Tcl-Tk script

For about a year before this (circa 2015), I had been planning to make a Tk GUI 'front end' for a text-speaking utility, using a text-speaking program that is available on Linux.

On a vacation in January 2016, I noticed that I had the 'espeak' program on the little Acer netbook that I take on vacations --- and, after looking at the 'man page' for 'espeak', I started programming the Tk GUI for that command.


THE GOALS

I noted that 'espeak' had provisions for parameters such as 'amplitude', 'pitch', and 'speed' (the '-a', '-p', and '-s' options).

There were other possible options besides audio output to computer speakers, such as

  • write a '.wav' audio file

  • write a 'phoneme' file

My goals for the Tcl-Tk script for this GUI were largely determined by the 'espeak' options.

I decided to implement most of the 'espeak' options via the GUI.
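One way to organize the mapping from GUI settings to 'espeak' options is a small proc that returns an argument list. What follows is a minimal sketch, not the code from the script. The proc name 'espeak_args' and its parameters are hypothetical. The flags are standard 'espeak' options:

- '-a' sets amplitude.
- '-p' sets pitch.
- '-s' sets speed in words per minute.
- '-v' selects the voice.
- '--punct' speaks the names of punctuation characters.
- '-w' writes a '.wav' file.
- '-q' suppresses audio.
- '-x' writes phoneme mnemonics.
- '--phonout=' sends those phoneme mnemonics to a file.
- '--stdin' reads the text from standard input.

See 'man espeak' for the value ranges in your version.

```tcl
# Minimal sketch (hypothetical proc, not taken from the original script):
# build an 'espeak' argument list from the GUI settings.
#   mode: speak | wav | phoneme
proc espeak_args {mode amplitude pitch speed voice punct {outFile ""}} {
    # Voice-quality options controlled by the scales and the voice setting.
    set argList [list -a $amplitude -p $pitch -s $speed -v $voice]
    if {$punct} {
        # Checkbutton 'Speak punctuation chars'.
        lappend argList --punct
    }
    switch -- $mode {
        speak   {}
        wav     { lappend argList -w $outFile }
        phoneme { lappend argList -q -x --phonout=$outFile }
        default { return -code error "unknown mode: $mode" }
    }
    # The text itself will be fed to espeak on standard input.
    lappend argList --stdin
    return $argList
}

puts [espeak_args wav 100 50 175 english-us 0 /tmp/test.wav]
# prints: -a 100 -p 50 -s 175 -v english-us -w /tmp/test.wav --stdin
```

In a GUI, the list could be spliced into an 'exec' call with the '{*}' syntax of Tcl 8.5 and later. For example: 'exec -ignorestderr espeak {*}[espeak_args speak 100 50 175 english-us 0] << $text'. Here '<<' passes the value of 'text' to the command's standard input, so text that begins with a '-' is not mistaken for an option.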


THE GUI LAYOUT

I made a 'text-sketch' for the GUI for this 'speak-text' utility.



CONVENTIONS for the GUI 'text-sketch' below:

   * SQUARE-BRACKETS indicate a comment not to be included on the GUI.
   * BRACES indicate a Tk 'button' widget.
   * UNDERSCORES indicate a Tk 'entry' widget.
   * A COLON indicates that the text before the colon is on a 'label' widget.
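   * CAPITAL-O (standing alone) indicates a Tk 'radiobutton' widget.
   * CAPITAL-X indicates a Tk 'checkbutton' widget.
   * A line with arrow-heads and a CAPITAL-O on it (<----O---->) indicates a Tk 'scale' widget; the O marks the slider position.
   * 'A' and 'V' at the right edge of a box, and '<---->' along its bottom edge, indicate vertical and horizontal scrollbars.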

                --------------------------------------------------------------------------
                tkSpeakText --- using 'espeak'
                [window title]
                --------------------------------------------------------------------------

  .fRbuttons    {Exit} {Help} {Speak} {WriteWAVfile} {WritePhonemeFile}

  .fRcontrols1  Amplitude: <--------O---------------->    Pitch: <---------O--------->

  .fRcontrols2  Speed (words/min): <---------O-------->    X Speak punctuation chars

  .fRmsg        [ ..........  Messages go here  .........................................]

  .fRtext       |------------------------------------------------------------------------A
                |                                                                        |
                |     [This scrollable text area contains the text-to-be-spoken.]        |
                |                                                                        |
                |                                                                        |
                |                                                                        |
                |                                                                        |
                |                                                                        |
                |<--------------------------------------------------------------------->V


GUI Components

From the GUI 'sketch' above, it is seen that the GUI consists of about

  • 5 button widgets
  • 4 label widgets
  • 3 scale widgets
  • 1 text widget with scrollbars
  • 1 checkbutton widget
  • 0 radiobutton widgets
  • 0 listbox widgets
  • 0 canvas widgets

All but the 'label' widgets provide operating parameters/options in this utility.

Hence there are about 5 + 3 + 1 + 1 = 10 options (buttons, scales, text area, and checkbutton) on this utility.


I should point out here that I was not especially interested in coming up with a 'beautiful utility'.

I just wanted a utility that would make speaking of text, with 'espeak', as simple as entering/pasting text in a text-widget and clicking on a 'Speak' button.

I am certainly interested in making pretty GUIs, as some of my other pages have indicated.

But at this time, I am satisfied to implement the 'functionality', and let the 'beauty' go for a later date (when I have more beauty tools/code at hand).


SCREENSHOT OF THE GUI

On the basis of the GUI-layout sketch above, I ended up with the GUI seen in the following image.

When the GUI first comes up, the user can simply click on the 'Speak' button to have the initial text string "This is a test" spoken through the computer speakers.

    NOTE:
    I found in testing that it takes about 4 seconds, after the text is spoken, for the 'espeak' command to 'recover' and be ready to respond to another click on the 'Speak' button.

    This situation is reflected by a 'WAIT for ready msg' text string that is displayed in the message line of the GUI after a click on the 'Speak' button.
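A possible alternative to a fixed waiting period is to start 'espeak' through a Tcl command pipeline. A callback then runs when the process closes its output, which normally happens when it exits. This is not what the script does; whether it removes the delay seen in testing would need to be checked.

Below is a minimal sketch. It assumes Tcl 8.6, which is needed for the half-close 'close $chan write'. The proc names 'run_async' and 'run_async_readable' are hypothetical.

```tcl
# Hypothetical sketch: run a command (such as espeak) without blocking the
# event loop, feed it text on stdin, and call back when it finishes.
# Requires Tcl 8.6 for the half-close 'close $chan write'.
proc run_async {cmdList text doneCallback} {
    # '2>@1' merges the command's stderr into the pipe we read from.
    set chan [open |[concat $cmdList [list 2>@1]] r+]
    puts -nonewline $chan $text
    flush $chan
    # Closing only our write side gives the command end-of-file on stdin.
    close $chan write
    fconfigure $chan -blocking 0
    fileevent $chan readable [list run_async_readable $chan $doneCallback]
}

proc run_async_readable {chan doneCallback} {
    # Discard any output the command produced.
    read $chan
    if {[eof $chan]} {
        # Output closed: switch to blocking so close waits for the process.
        fconfigure $chan -blocking 1
        # close raises an error on a non-zero exit status; ignore it here.
        catch {close $chan}
        uplevel #0 $doneCallback
    }
}
```

A call could look like 'run_async [list espeak -v english-us --stdin] $text {set ::MSGstatus Ready}'. Here 'MSGstatus' is a hypothetical variable shown in the message line.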

The user can experiment with the 'pitch' and 'speed' options to get different voice qualities.

PLAYBACK of the '.wav' and phoneme files:

When either of these two files is written, it is immediately 'played'.

    The 'phoneme' text file is 'played' by showing it in a text-file viewer.

The 'players' are set via two variables in the Tk script: PLAYERwav and PLAYERtxt.

The following 'set' statements show some examples of how the two 'players' can be changed.



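 # Audio player for the '.wav' file (only one 'set' should be uncommented):
 # set PLAYERwav "/usr/bin/audacity"
 # set PLAYERwav "/usr/bin/ffplay"
   set PLAYERwav "/usr/bin/totem"

 # Viewer for the 'phoneme' text file (only one 'set' should be uncommented):
 # set PLAYERtxt "/usr/bin/gedit"
 # set PLAYERtxt "/usr/bin/kedit"
   set PLAYERtxt "$env(HOME)/apps/bin/xpg"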


This Tk script can be edited to change the audio-file and text 'players' that are used.

The 'set' statements are near the bottom of the script.
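If the script is moved between machines, the player paths may not exist. Below is a minimal sketch of one way to handle that; it is not part of the original script, and 'pick_player' and 'play_file' are hypothetical names. It picks the first executable program from a list of candidates. It then launches that program in the background with 'exec ... &', so the GUI is not blocked while the file plays.

```tcl
# Hypothetical helper: return the first candidate that is an executable file,
# or an empty string if none is found.
proc pick_player {candidates} {
    foreach prog $candidates {
        if {[file isfile $prog] && [file executable $prog]} {
            return $prog
        }
    }
    return ""
}

# Hypothetical helper: start the player in the background ('&'), so 'exec'
# returns immediately instead of waiting for the player to exit.
proc play_file {player fileName} {
    if {$player eq ""} {
        return -code error "no player program found"
    }
    exec $player $fileName &
}

set PLAYERwav [pick_player {/usr/bin/totem /usr/bin/ffplay /usr/bin/audacity}]
set txtCandidates [list /usr/bin/gedit /usr/bin/kedit [file join $env(HOME) apps bin xpg]]
set PLAYERtxt [pick_player $txtCandidates]
```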

VOICE setting:

The 'espeak' command allows for different 'voices' reflecting different languages.

The following 'set' statement, near the bottom of the script, sets the voice.

set VARvoice "english-us"

Other voices can be set --- such as czech, german, greek, finnish, french, slovak, spanish, swedish, etc.

The command 'espeak --voices' can be used to show the available voices.
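To offer the voices in the GUI, for example in a listbox, the output of 'espeak --voices' could be parsed. Below is a minimal sketch with a hypothetical proc name.

It assumes the column layout printed by 1.4x versions of 'espeak': a header line, then the columns Pty, Language, Age/Gender, VoiceName, File, and Other Languages. The layout may differ in other versions. The sample text is illustrative, not captured output.

```tcl
# Hypothetical helper: extract the VoiceName column from 'espeak --voices'.
# Assumed columns: Pty Language Age/Gender VoiceName File [Other Languages]
proc parse_voice_names {voicesText} {
    set names {}
    # Skip the header line.
    foreach line [lrange [split $voicesText \n] 1 end] {
        set fields [regexp -all -inline {\S+} $line]
        if {[llength $fields] < 3} continue
        # Drop Pty and Language; then drop the Age/Gender field if present
        # (an optional age followed by M or F, or a '-').
        set rest [lrange $fields 2 end]
        if {[regexp {^[0-9]*[MF-]$} [lindex $rest 0]]} {
            set rest [lrange $rest 1 end]
        }
        lappend names [lindex $rest 0]
    }
    return $names
}

# Illustrative sample in the assumed format (not captured output).
set sample {Pty Language Age/Gender VoiceName       File        Other Languages
 5  af             M  afrikaans            af
 2  en-us          M  english-us           en/en-us     (en-r 5)(en 3)
 5  fr             M  french               fr}

puts [parse_voice_names $sample]   ;# prints: afrikaans english-us french
```

On a real system the input would come from a call like 'parse_voice_names [exec -ignorestderr espeak --voices]'.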


DESCRIPTION OF THE CODE

Below, I provide the Tk script code for this 'speak-text' utility.

I follow my usual 'canonical' structure for Tk code for this Tk script:



  0) Set general window and widget parms (win-name, win-position,
     win-color-scheme, fonts-for-widgets, widget-geometry-parms,
     text-array-for-labels-etc, win-size-control).

  1a) Define ALL frames (and sub-frames, if any).
  1b) Pack   ALL frames and sub-frames.

  2) Define and pack all widgets in the frames, frame by frame.
              Within each frame, define ALL the widgets.
              Then pack the widgets.

  3) Define keyboard and mouse/touchpad/touch-sensitive-screen action
     BINDINGS, if needed.

  4) Define PROCS, if needed.

  5) Additional GUI initialization (typically with one or more of
     the procs), if needed.


This Tk coding structure is discussed in more detail on the page 'A Canonical Structure for Tk Code --- and variations'.

This structure makes it easy for me to find code sections --- while generating and testing a Tk script, and when looking for code snippets to include in other scripts (code re-use).

I call your attention to step-zero.

One thing that I started doing in 2013 is using a text-array for the text in labels, buttons, and other widgets in the GUI.

This can make it easier for people to internationalize my scripts.

I will be using a text-array like this in most of my scripts in the future.
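Here is a minimal sketch of the text-array idea. The array name 'aRtext', its keys, and the helper 'override_texts' are hypothetical, not necessarily the names used in the script. The English defaults are set in one place, and a translation only has to override the same keys. A widget would then take its text from the array, as in the hypothetical 'button .fRbuttons.buttSPEAK -text $aRtext(buttonSPEAK)'.

```tcl
# Default (English) texts for widgets, kept in one array.
array set aRtext {
    buttonEXIT   Exit
    buttonHELP   Help
    buttonSPEAK  Speak
    labelPITCH   Pitch:
}

# Hypothetical helper: override texts from a key/value list, rejecting
# unknown keys so that typos in a translation are caught early.
proc override_texts {arrName overrides} {
    upvar 1 $arrName arr
    foreach {key value} $overrides {
        if {![info exists arr($key)]} {
            return -code error "unknown text key: $key"
        }
        set arr($key) $value
    }
}

# Example: a French translation of the same keys.
override_texts aRtext {
    buttonEXIT   Quitter
    buttonHELP   Aide
    buttonSPEAK  Parler
    labelPITCH   Hauteur:
}
```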


Experimenting with the GUI

As in all my scripts that use the 'pack' geometry manager (which is all of my 100-plus scripts, so far), I provide the four main 'pack' parameters --- '-side', '-anchor', '-fill', '-expand' --- on all of the 'pack' commands for the frames and widgets.

That helps me when I am initially testing the behavior of a GUI (the various widgets within it) as I resize the main window.

I think that I have used a pretty nice choice of the 'pack' parameters.

The label and button and scale widgets stay fixed in size and relative-location if the window is re-sized --- while the text-area expands/contracts horizontally and/or vertically whenever the window is re-sized.

You can experiment with the '-side', '-anchor', '-fill', and '-expand' parameters on the 'pack' commands for the various frames and widgets --- to get the widget behavior that you want.


Additional experimentation:

You might want to change the fonts used for the various GUI widgets.

For example, you could change '-weight' from 'normal' to 'bold' --- or change '-slant' from 'roman' to 'italic' --- or change the font sizes.

Or change font families.

In fact, you may NEED to change the font families, because the families I used may not be available on your computer --- and the default font that the 'wish' interpreter chooses may not be very pleasing.

I use variables to set geometry parameters of widgets --- parameters such as border-widths and padding.

And I have included the '-relief' parameter on the definitions of frames and widgets.

Feel free to experiment with those 'appearance' parameters as well.

If you find the gray palette of the GUI is not to your liking, you can change the value of the RGB parameter supplied to the 'tk_setPalette' command near the top of the code.


Some features in the code

There are plenty of comments in the code, to describe what most of the code-sections are doing.

You can look at the top of the PROCS section of the code to see a list of the procs used in this script, along with brief descriptions of how the procs are called and what they do.

Below is a quick overview of the procs.



 - 'speak_text'              called by 'Speak' button

 - 'write_wav_file'          called by 'WriteWAVfile' button

 - 'write_phoneme_file'      called by 'WritePhonemeFile' button

 - 'clear_text'              called by 'ClearText' button

 - 'advise_user'             called by various procs

 - 'popup_msgVarWithScroll'  called by the 'Help' button


The Tcl 'after' command is used in the 'speak_text', 'write_wav_file', and 'write_phoneme_file' procs to allow some time for the 'Speak' button to become functional again after the 'espeak' command is invoked.
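The fixed-delay 'after' pattern can be sketched in plain Tcl as follows. This is a minimal sketch, not the script's code. The variable names 'MSGstatus' and 'READYafterID', the 'Ready' text, and the proc name are hypothetical. In a GUI, a label could display 'MSGstatus' through its '-textvariable' option.

```tcl
set MSGstatus ""
set READYafterID ""

# Show the 'wait' message now, and schedule the 'ready' message for later.
proc show_wait_then_ready {delayMs} {
    global MSGstatus READYafterID
    # If an earlier 'ready' is still pending, cancel it so that only the
    # most recent click determines when 'ready' appears.
    if {$READYafterID ne ""} {
        after cancel $READYafterID
    }
    set MSGstatus "WAIT for ready msg"
    set READYafterID [after $delayMs {
        set ::MSGstatus "Ready"
        set ::READYafterID ""
    }]
}

# With the roughly 4-second recovery time found in testing,
# a call could be:  show_wait_then_ready 4000
```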


Comments in the Code

It is my hope that the copious comments in the code will help Tcl-Tk coding 'newbies' get started in making GUIs like this.

Without the comments, potential young Tclers might be tempted to return to their iPhones and iPads and iPods to watch 'Weather Gone Wild' videos, and other 'Gone Wild' videos.


CODE for the 'espeak' Front End GUI

Here is a link to the code for the Tk script:

'tkSpeakText_with_espeak.tk'

With your web browser, you can right-click on this link and, in the menu that pops up, select an item like 'Save Link Target As ...' to save this file to your local computer.

If the saved file name ends in '.txt', rename the file to remove the '.txt' suffix.

Make sure that you have execute permission set on the file --- in order to execute the script.
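The rename and the permission change can also be done from Tcl itself, or with the shell commands 'mv' and 'chmod'. Below is a minimal sketch for Unix-like systems; the proc name 'prepare_script' is hypothetical.

```tcl
# Hypothetical helper: strip a trailing '.txt' from a downloaded script's
# name and give its owner execute permission. Returns the final file name.
proc prepare_script {downloaded} {
    set target $downloaded
    if {[file extension $downloaded] eq ".txt"} {
        set target [file rootname $downloaded]
        file rename -- $downloaded $target
    }
    # Symbolic mode, as with chmod (Unix only).
    file attributes $target -permissions u+x
    return $target
}
```

A call would be, for example, 'prepare_script tkSpeakText_with_espeak.tk.txt'. Alternatively, the script can be run without execute permission by passing it to the interpreter, as in 'wish tkSpeakText_with_espeak.tk'.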


A SOURCE OF 'espeak':

I did my testing of this GUI front-end with version 1.41.01 (circa 2009aug25) of 'espeak'.

If someone is using a Debian-based Linux system such as Linux Mint or Ubuntu --- or Debian itself --- various versions of a binary package of the 'espeak' program can be found at the snapshot.debian.org web site --- in particular, on the espeak binary packages page.

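For Intel or AMD 32-bit machines, you can download a '.deb' file such as 'espeak_1.41.01-1_i386.deb'. For 64-bit Intel or AMD (x86-64) machines, choose a file with the 'amd64' suffix, such as 'espeak_1.41.01-1_amd64.deb'. Avoid 'ia64', which is the Debian architecture name for Itanium processors. A much more recent version may also be used.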

Once you have downloaded the '.deb' file into a download directory on your computer, to do the install, you can right-click and choose to run Gdebi --- if you are using a GUI file manager on Linux Mint or Ubuntu or Debian.


SOME POSSIBLE ENHANCEMENTS

If I were to use this utility often to make 'artificially-spoken-text' audio files, I would probably not want to use '.wav' files because they are rather disk-space-hungry files.

It is possible to use the 'lame' command to make a lossy-compressed '.mp3' file from a '.wav' file.

The '.mp3' file may be about 10 percent of the size of the original '.wav' file.

Messages from an example run:



$ lame userid_espeak.wav userid_espeak.mp3

 LAME 3.98.2 32bits (http://www.mp3dev.org/)
 CPU features: MMX (ASM used), 3DNow! (ASM used), SSE, SSE2
 Using polyphase lowpass filter, transition band:  8269 Hz -  8535 Hz
 Encoding userid_espeak.wav to userid_espeak.mp3
 Encoding as 22.05 kHz single-ch MPEG-2 Layer III (11x)  32 kbps qval=3
     Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA 
     47/47    (100%)|    0:00/    0:00|    0:00/    0:00|   24.555x|    0:00 
 -------------------------------------------------------------------------------
    kbps       mono %     long switch short %
    32.0      100.0        34.0  21.3  44.7
 ReplayGain: +5.0dB


This run made a 4.8-kilobyte '.mp3' file from the 48.7-kilobyte '.wav' file that resulted from submitting the text "This is a test" to the 'espeak' command.
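The 'about 10 percent' figure is roughly what the bit rates predict. Below is a worked check under stated assumptions. It assumes 'espeak' writes 16-bit mono PCM at 22050 samples per second. The 22.05 kHz mono input matches the 'lame' messages above, but the 16-bit sample size is an assumption. It also assumes 'lame' encodes at the constant 32 kbps it reports. The proc name is illustrative.

```tcl
# Uncompressed PCM bit rate: samples/s * bits/sample * channels.
proc pcm_bits_per_second {rate bitsPerSample channels} {
    return [expr {$rate * $bitsPerSample * $channels}]
}

set wavBps [pcm_bits_per_second 22050 16 1]   ;# 352800 bits per second
set mp3Bps 32000                              ;# 32 kbps, from the lame messages

# Expected size ratio of .mp3 to .wav (ignoring headers).
set ratio [expr {double($mp3Bps) / $wavBps}]
puts [format "predicted ratio: %.3f" $ratio]                 ;# prints 0.091
puts [format "observed ratio:  %.3f" [expr {4.8 / 48.7}]]    ;# prints 0.099

# Spoken duration implied by the 48.7-kilobyte .wav file
# (taking 1 kilobyte as 1000 bytes).
puts [format "duration: %.2f s" [expr {48700 * 8.0 / $wavBps}]]   ;# prints 1.10 s
```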

If I were making lots of these 'spoken-text' audio files and I wanted small files, then I might build a 'lame' option into this GUI utility.

Alternatively, one could use 'ffmpeg' to create a '.mp3' file from a '.wav' file with a command like:

ffmpeg   -i audio.wav   -acodec libmp3lame   audio.mp3

as seen at linuxconfig.org.

If '.mp3' is not a desirable option, then one could try the Vorbis audio format in an Ogg container format with a command like:

ffmpeg   -i audio.wav   -acodec libvorbis   audio.ogg
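If a 'lame' or 'ffmpeg' option were built into the GUI, the conversion command could be assembled as a list and run with 'exec'. Below is a minimal sketch. The proc name and the tool keywords ('lame', 'ffmpeg-mp3', 'ffmpeg-ogg') are hypothetical; the command forms are the ones shown above.

```tcl
# Hypothetical helper: build a conversion command for a '.wav' file,
# writing the output next to the input with a new suffix.
proc wav_convert_cmd {tool wavFile} {
    set root [file rootname $wavFile]
    switch -- $tool {
        lame       { return [list lame $wavFile $root.mp3] }
        ffmpeg-mp3 { return [list ffmpeg -i $wavFile -acodec libmp3lame $root.mp3] }
        ffmpeg-ogg { return [list ffmpeg -i $wavFile -acodec libvorbis $root.ogg] }
        default    { return -code error "unknown tool: $tool" }
    }
}

puts [wav_convert_cmd lame userid_espeak.wav]
# prints: lame userid_espeak.wav userid_espeak.mp3
```

Both tools print progress messages on standard error. A call such as 'exec -ignorestderr {*}[wav_convert_cmd lame $wavFile]' keeps 'exec' from treating those messages as an error. Note also that 'ffmpeg' asks before overwriting an existing output file; its '-y' option answers yes.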

Other than this possible wav-conversion enhancement, I have no other ideas for enhancements at this time --- other than possible improvements and additions to the help-text that is presented via the 'Help' button of the GUI.


IN CONCLUSION

As I have said on several other code-donation pages on this site and on the Tclers' wiki at wiki.tcl.tk ...

There's a lot to like about a utility that is 'free freedom' --- that is, no-cost and open-source so that you can modify/enhance/fix it without having to wait for someone else to do it for you (which may be never).

A BIG THANK YOU to John Ousterhout for starting Tcl-Tk, and a BIG THANK YOU to the Tcl-Tk developers and maintainers who have kept the simply MAH-velous 'wish' interpreter going.

Bottom of this page for
tkSpeakText - an 'espeak' FrontEnd
--- a utility in the FE 'tkGooies' system,
in the 'AUDIOtools' group
--- and in the 'TEXTtools' group.


Page history:

This FE web page was created 2016 Feb 04.

Page was changed 2019 Feb 27.
(Added css and javascript to try to handle text-size for smartphones, esp. in portrait orientation.)

Page was changed 2019 Jun 25.
(Specified image widths in percents to size the images according to width of the browser window.)


NOTE:
This code has not been posted on a page at the Tcler's Wiki --- wiki.tcl-lang.org --- formerly wiki.tcl.tk. If I donate the code on a page there, I intend to put a link here.