Creating a Text-to-Speech API part 1

December 23, 2015 7:08 pm

To create a Text-to-Speech (speech synthesis) API, we will use eSpeak. eSpeak is a speech synthesizer that supports many languages and is available for Linux, Android, OS X, Windows and Solaris. It can also be run as a command-line program, which is very useful to developers. Moreover, some Linux distributions, such as Ubuntu, come with eSpeak preinstalled and ready to use from the terminal.

We are going to take advantage of eSpeak to build an API: a page that accepts several GET parameters and streams an audio file in response to each HTTP request. The audio files are kept on the server only for a very brief period, so running the API in the real world would not consume much storage.
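To make the request shape concrete, here is a minimal sketch of how a client might build a URL for such an API. The parameter names text, speed and lang are the ones used later in this article; the endpoint http://localhost/tts.php and the helper build_tts_url() are hypothetical placeholders, not part of the original code.

```php
<?php
// Hypothetical endpoint; replace with wherever the API file is actually served.
$endpoint = 'http://localhost/tts.php';

// Build a request URL from the text and the optional speed and language.
// build_tts_url() is an illustrative helper, not part of the article's API.
function build_tts_url($endpoint, $text, $speed = null, $lang = null)
{
    $params = ['text' => $text];
    if ($speed !== null) {
        $params['speed'] = $speed;
    }
    if ($lang !== null) {
        $params['lang'] = $lang;
    }
    // http_build_query() URL-encodes every value; spaces become "+".
    return $endpoint . '?' . http_build_query($params, '', '&');
}

echo build_tts_url($endpoint, 'Hello world', 150, 'en'), PHP_EOL;
// prints http://localhost/tts.php?text=Hello+world&speed=150&lang=en
```

Only text is mandatory in this sketch, which matches the way the API treats its parameters.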

Installing eSpeak

To test whether eSpeak is ready for use, open your command line or terminal and type espeak hello. If you get an error, you will have to install eSpeak. If you are using Windows, you most likely do not have eSpeak on your machine. Navigate to the eSpeak Download page and download a copy for your machine. Once it is installed, it is best to add eSpeak to your PATH environment variable so that you can call it as espeak <COMMANDS> instead of typing the full path to the command-line utility, such as C:\Program Files (x86)\eSpeak\command_line\espeak.exe <COMMANDS>.
To add eSpeak to PATH on Windows 8.1, search for edit environment in the Search bar and add the path to the folder where espeak.exe is located (on Windows, the folder is called command_line).

Searching and opening the environment variables

Adding eSpeak to the PATH environment variable
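
Since the API will call eSpeak from PHP, it can be handy to confirm from PHP itself that the command is reachable through PATH. The following is a rough sketch, assuming that exec() is enabled on the server and that espeak --version exits with status 0 when eSpeak is installed; espeak_available() is a hypothetical helper name.

```php
<?php
// Returns true when the espeak command can be started from PHP.
// Assumes exec() is not disabled and that `espeak --version` exits with 0.
function espeak_available()
{
    $output = [];
    $exitCode = 1;
    // 2>&1 folds any error text into $output so nothing leaks into the response.
    exec('espeak --version 2>&1', $output, $exitCode);
    return $exitCode === 0;
}

echo espeak_available() ? 'eSpeak found' : 'eSpeak not found on PATH', PHP_EOL;
```

If this reports that eSpeak is missing although it works in your own terminal, the web server may be running with a different PATH than your user account.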

Converting the API parameters to what we need for eSpeak

We create a single PHP file to hold our API and check whether the user has provided a text argument. The text argument is required, since otherwise we would have nothing to convert to speech. Next, we check whether the user has provided a speed argument and, if so, whether its value is a number greater than 0 and no greater than 500. If it is, we convert the speed argument to the syntax that command-line eSpeak understands, something like -s 150, and we escape the value to prevent OS command injection.
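
The speed check can be isolated into a small function so that it is easy to test on its own. This is only a sketch that mirrors the rules described above; the function name build_speed_flag() and the idea of passing the query array as a parameter are assumptions made for illustration.

```php
<?php
// Turn an optional "speed" query value into an eSpeak -s flag.
// Returns an empty string when speed is missing or outside the accepted range.
function build_speed_flag(array $query)
{
    if (!isset($query['speed']) || !is_numeric($query['speed'])) {
        return '';
    }
    $speed = $query['speed'];
    // Accept only values greater than 0 and no greater than 500.
    if ($speed <= 0 || $speed > 500) {
        return '';
    }
    // escapeshellarg() quotes the value so the shell treats it as plain data.
    return '-s ' . escapeshellarg((string) $speed);
}

var_dump(build_speed_flag(['speed' => '150']));     // a flag such as -s '150'
var_dump(build_speed_flag(['speed' => '501']));     // string(0) ""
var_dump(build_speed_flag(['speed' => '1; ls']));   // string(0) ""
```

Note that is_numeric() also accepts strings such as "1e2", so a stricter integer check may be preferable if only whole numbers should be allowed.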

Afterwards, if the user has selected a language for the speech, we convert it to the syntax eSpeak understands. Note that you can also check that the selected language is one eSpeak supports before adding it. We ensure that the language input contains only alphabetic characters and dashes (nothing else is needed to select a language) and escape it.
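
The optional check against supported languages could look roughly like this. The list in $supported is a small illustrative sample, not the full set of eSpeak voices, and build_lang_flag() is a hypothetical helper; in practice the list might be filled from the output of espeak --voices.

```php
<?php
// Illustrative allow-list only; eSpeak ships many more voices than these.
$supported = ['en', 'en-us', 'de', 'fr', 'es'];

// Turn an optional "lang" query value into an eSpeak -v flag.
// Returns '' when no language was given and null when the value is rejected.
function build_lang_flag(array $query, array $supported)
{
    if (!isset($query['lang']) || $query['lang'] === '') {
        return '';
    }
    $lang = $query['lang'];
    // Letters and dashes only; \z anchors at the very end of the string.
    if (!is_string($lang) || !preg_match('/^[A-Za-z-]+\z/', $lang)) {
        return null;
    }
    if (!in_array(strtolower($lang), $supported, true)) {
        return null;
    }
    return '-v' . escapeshellarg($lang);
}

var_dump(build_lang_flag(['lang' => 'en'], $supported));     // a flag such as -v'en'
var_dump(build_lang_flag(['lang' => 'en;ls'], $supported));  // NULL
```

Returning null for a rejected value lets the caller answer with an error message instead of silently falling back to the default voice.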

Finally, if the user has provided text for the speech synthesis, we set the $text variable to the decoded and escaped value of that text.
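
One detail worth knowing when handling the text: PHP already URL-decodes the values in $_GET, so calling urldecode() on them decodes a second time. The sketch below shows the effect and a variant that only escapes the value; build_text_arg() is a hypothetical helper and not the code used in this article.

```php
<?php
// Escape the required "text" value so it reaches eSpeak as one shell argument.
// Returns an empty string when text is missing, empty, or not a plain string.
function build_text_arg(array $query)
{
    if (!isset($query['text']) || !is_string($query['text']) || $query['text'] === '') {
        return '';
    }
    return escapeshellarg($query['text']);
}

// For a request like ?text=a%2Bb, $_GET['text'] already holds "a+b".
$alreadyDecoded = 'a+b';
// Decoding again turns the plus sign into a space.
var_dump(urldecode($alreadyDecoded)); // string(3) "a b"
```

Whether the second decode matters depends on the input; it mostly shows up with literal plus and percent signs in the text.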

if (isset($_GET['text'])) {
    // Speed: accept only a number greater than 0 and at most 500, then escape it.
    $speed = (isset($_GET['speed']) && is_numeric($_GET['speed']) && $_GET['speed'] > 0 && $_GET['speed'] <= 500) ? "-s " . escapeshellarg($_GET['speed']) : "";
    // Language: letters and dashes only; reject anything else before escaping.
    $lang = (isset($_GET['lang'])) ? urldecode($_GET['lang']) : "";
    if ($lang && !preg_match("/^[A-Za-z\-]*$/", $lang)) { echo "Incorrect language parameter!"; exit; }
    $lang = ($lang) ? "-v" . escapeshellarg($lang) : "";
    // Text: escape the decoded text so it is passed to eSpeak as a single argument.
    $text = (isset($_GET['text']) && strlen($_GET['text'])) ? escapeshellarg(urldecode($_GET['text'])) : "";

if (isset($_GET['text'])) {

$speed = (isset($_GET['speed']) && is_numeric($_GET['speed']) && $_GET['speed'] > 0 && $_GET['speed'] <= 500) ? "-s " . escapeshellarg($_GET['speed']) : "";