Creating a Text-to-Speech API part 2
After all parameters are set up, we create a variable that holds the name of the audio file, which is randomly generated. We then launch eSpeak on the command line with every parameter the user supplied. A parameter the user has not specified is simply left blank.
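    // Unique file name for the generated audio (empty prefix, extra entropy)
    $fileInfo = "sounds/" . uniqid("", true) . ".wav";
    // Run eSpeak; exec() returns the last line the command printed
    $result = exec("espeak $file -w $fileInfo $speed $lang $text");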
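Since the command string is assembled from request parameters, it is worth making sure every value is escaped before it reaches the shell. Part 1 may already do this; the following is only a minimal sketch of one way to do it, assuming the raw values arrive in an array. The helper name `buildEspeakCommand` and the array keys are hypothetical and not part of the original script. It relies on eSpeak's `-w`, `-s`, `-v` and `-f` options and on PHP's `escapeshellarg()`.

```php
<?php
// Hypothetical helper: builds the eSpeak command line from raw request values.
// Assumes $params may contain the keys 'text', 'file', 'speed' and 'lang'.
function buildEspeakCommand(array $params, string $wavPath): string
{
    $parts = ['espeak', '-w', escapeshellarg($wavPath)];

    if (isset($params['speed'])) {
        // Cast to int so that only a number can reach the shell.
        $parts[] = '-s ' . (int) $params['speed'];
    }
    if (isset($params['lang'])) {
        $parts[] = '-v ' . escapeshellarg($params['lang']);
    }
    if (isset($params['file'])) {
        $parts[] = '-f ' . escapeshellarg($params['file']);
    } elseif (isset($params['text'])) {
        $parts[] = escapeshellarg($params['text']);
    }

    return implode(' ', $parts);
}

// The text below contains shell metacharacters on purpose.
$command = buildEspeakCommand(
    ['text' => 'Hello; rm -rf /', 'speed' => '120'],
    'sounds/example.wav'
);
echo $command, PHP_EOL;
```

The resulting string could then be passed to `exec()` in place of the interpolated one.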
Next, we check whether eSpeak printed anything to the command line (it prints text when the command fails) or whether the expected file was not created. If either is true, we stream a generic error file to the user instead. We also set a $noDel variable to indicate that this file should not be deleted once streaming is done.
    if ($result || !file_exists($fileInfo)) {
        // Fall back to the shared error recording and keep it on disk
        $fileInfo = "sounds/error.wav";
        $noDel = true;
    }
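`exec()` can also report the command's exit status through its third argument, which gives a second failure signal besides printed text. The sketch below is one possible variant, not the tutorial's own code: the helper name `pickAudioFile` is hypothetical, and the exit codes are simulated so that the example runs without eSpeak installed.

```php
<?php
// Hypothetical helper: decides which file to stream once the command has run.
// Returns [path, deleteAfterwards].
function pickAudioFile(int $exitCode, string $wavPath, string $errorWav): array
{
    $ok = $exitCode === 0 && is_file($wavPath) && filesize($wavPath) > 0;
    return $ok ? [$wavPath, true] : [$errorWav, false];
}

// In the real script the exit code would come from:
//     exec($command, $outputLines, $exitCode);
// Here a temporary file stands in for the generated audio.
$wav = tempnam(sys_get_temp_dir(), 'tts');
file_put_contents($wav, 'placeholder bytes');

// Simulated success: the generated file is streamed and may be deleted.
[$path, $delete] = pickAudioFile(0, $wav, 'sounds/error.wav');
var_dump($path === $wav, $delete);

// Simulated failure: the shared error file is streamed and kept.
[$path2, $delete2] = pickAudioFile(1, $wav, 'sounds/error.wav');
var_dump($path2, $delete2);

unlink($wav);
```

Returning the "delete afterwards" flag next to the path plays the same role as the $noDel variable.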
Afterwards, we add the headers that prevent browsers from caching the response.
    // Do not cache the response
    header("Cache-Control: no-cache, no-store, must-revalidate"); // HTTP 1.1
    header("Pragma: no-cache"); // HTTP 1.0
    header("Expires: 0");
Now we stream the file by sending the required headers and then reading the file:
    // Stream the WAV file
    header("Content-type: application/octet-stream");
    header('Content-disposition: attachment; filename="' . $fileInfo . '"');
    header("Content-transfer-encoding: binary");
    header("Content-length: " . filesize($fileInfo));
    // Open the file in binary mode and send its contents to the client
    $stream = fopen($fileInfo, "rb");
    fpassthru($stream);
    fclose($stream);
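To see the streaming headers in one place, they can be collected in a small function. This is a sketch under assumptions: the helper name `wavHeaders` is hypothetical, and using `basename()` for the download name is my own choice, made so that the `sounds/` directory does not appear in the file name offered to the client.

```php
<?php
// Hypothetical helper: returns the header lines used to stream a file.
function wavHeaders(string $path): array
{
    return [
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . basename($path) . '"',
        'Content-Transfer-Encoding: binary',
        'Content-Length: ' . filesize($path),
    ];
}

// Demo with a temporary 44-byte file standing in for a generated WAV.
$demo = tempnam(sys_get_temp_dir(), 'wav');
file_put_contents($demo, str_repeat('x', 44));
$headers = wavHeaders($demo);
print_r($headers);
unlink($demo);
```

In the API script, each returned line would be sent with `header($line)` before the file contents are passed through.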
Once streaming is done, we check whether the $noDel variable has been set; if it has not, we remove the saved audio file from our server:
    if (!isset($noDel)) {
        unlink($fileInfo);
    }
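Streaming and cleanup can also be tied together so that the two steps cannot drift apart. The following is a minimal sketch using `try`/`finally`; the function name `streamAndCleanUp` and its `$keep` flag (which plays the role of $noDel) are hypothetical. Output buffering is used here only so that the demo can inspect what was sent.

```php
<?php
// Hypothetical helper: streams a file to the output and deletes it afterwards,
// unless $keep is true (as it would be for the shared error.wav).
function streamAndCleanUp(string $path, bool $keep = false): void
{
    $stream = fopen($path, 'rb');
    if ($stream === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        fpassthru($stream);
    } finally {
        // Runs whether or not streaming raised an exception.
        fclose($stream);
        if (!$keep) {
            unlink($path);
        }
    }
}

// Demo: a temporary file stands in for the generated audio.
$tmp = tempnam(sys_get_temp_dir(), 'tts');
file_put_contents($tmp, 'fake audio');

ob_start();                  // capture the streamed bytes instead of printing them
streamAndCleanUp($tmp);
$sent = ob_get_clean();

var_dump($sent, file_exists($tmp));
```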
Finally, if the user has not provided a text or file parameter, we assume they want to read the API's documentation, so we display a static web page with information about the API:
    } else {
    ?>
    <!doctype html>
    <html lang="en">
    <head>
      <meta charset="UTF-8">
      <title>Text-to-Speech API</title>
      <style>
        .big { font-size: 1.5em; }
        dl dd { padding-left: 15px; }
        dl dt { color: #555; font-size: 1.2em; }
      </style>
    </head>
    <body>
      <h1>Text-to-Speech API</h1>
      <p>This TTS API relies on eSpeak to stream audio on demand. The audio files are not kept on the server after streaming, which keeps storage costs low.</p>
      <p class="big">Documentation:</p>
      <dl>
        <dt>text (required)</dt>
        <dd><b>The text GET parameter is required unless you use the file GET parameter</b>. It holds the text that will be converted to speech and streamed.</dd>
        <dt>file (required)</dt>
        <dd>The file GET parameter holds the path to the text file that you want to convert to speech. <b>This parameter is only required if you do not add the text GET parameter</b>.</dd>
        <dt>speed (optional)</dt>
        <dd>The speed GET parameter sets the speed of the spoken text. It accepts a <b>positive number from 1 to 500</b>.</dd>
        <dt>lang (optional)</dt>
        <dd>This GET parameter defines the language of the text that you want to convert. Its value is the <b>two-character code</b> of the language, as listed below: <em><!-- Supported languages follow --></em>
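The rules stated on the documentation page could also be checked in code before eSpeak is launched. The sketch below is an illustration rather than part of the original script: the function name `validateParams` is hypothetical, it assumes that speed must be a whole number, and it assumes that a language code is two lowercase letters.

```php
<?php
// Hypothetical validator mirroring the rules on the documentation page.
// Returns a list of problems; an empty list means the request looks valid.
function validateParams(array $query): array
{
    $errors = [];

    if (!isset($query['text']) && !isset($query['file'])) {
        $errors[] = 'Either "text" or "file" is required.';
    }
    if (isset($query['speed'])) {
        // Assumption: only whole numbers from 1 to 500 are accepted.
        $speed = filter_var($query['speed'], FILTER_VALIDATE_INT, [
            'options' => ['min_range' => 1, 'max_range' => 500],
        ]);
        if ($speed === false) {
            $errors[] = '"speed" must be a whole number from 1 to 500.';
        }
    }
    if (isset($query['lang'])) {
        // Assumption: a language code is exactly two lowercase letters.
        $lang = $query['lang'];
        if (!is_string($lang) || !preg_match('/^[a-z]{2}\z/', $lang)) {
            $errors[] = '"lang" must be a two-character language code.';
        }
    }

    return $errors;
}

$good = validateParams(['text' => 'Hello', 'speed' => '120', 'lang' => 'bg']);
$bad  = validateParams(['speed' => '900', 'lang' => 'dqdqfqw']);
print_r($good);
print_r($bad);
```

Note that this only checks the shape of the lang value; whether eSpeak actually supports a given code is a separate question.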
How would a client use our Text-to-Speech API?
There are many ways to use such an API. For example, a client could play different audio clips on their website, depending on what their own users are doing, using the HTML5 Audio API.
To illustrate, I have created several requests to our API. The client loads an audio clip by passing the desired parameters to our API and plays it to their users. The audio variable speaks some arbitrary text; audio2 plays our API's error audio file, since it requests an unsupported language; audio3 reads a text file aloud; audio4 speaks some text quite slowly; and audio5 speaks some text in Bulgarian.
    <script>
    var audio;
    window.onload = function() {
      // The rand parameter makes each URL unique so that the browser does not reuse a cached response
      audio = new Audio("index.php?text=" + encodeURIComponent("Hello folks! It's already December.") + "&rand=" + Math.random() * 99999);
      audio.autoplay = true;
      audio.addEventListener("ended", function() {
        // error: unsupported language
        var audio2 = new Audio("index.php?text=Matrix&lang=dqdqfqw&rand=" + Math.random() * 99999);
        audio2.autoplay = true;
        audio2.addEventListener("ended", function() {
          // read from file
          var audio3 = new Audio("index.php?file=sayMe.txt&rand=" + Math.random() * 99999);
          audio3.autoplay = true;
          audio3.addEventListener("ended", function() {
            // slow speech
            var audio4 = new Audio("index.php?text=" + encodeURIComponent("Chuck Norris can take a screenshot of his blue screen.") + "&speed=80");
            audio4.autoplay = true;
            audio4.addEventListener("ended", function() {
              // Bulgarian
              var audio5 = new Audio("index.php?text=" + encodeURIComponent("Добре дошли, добри гости, коледари") + "&lang=bg&rand=" + Math.random() * 99999);
              audio5.autoplay = true;
            });
          });
        });
      });
    };
    </script>
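A client that generates its pages on the server could build the same request URLs in PHP. The following is a minimal sketch: the helper name `ttsUrl` is hypothetical, and `index.php` is assumed to be the API's location, as in the JavaScript example. `http_build_query()` takes care of percent-encoding, including non-Latin text.

```php
<?php
// Hypothetical helper: builds a request URL for the API described above.
// $endpoint is an assumed location of the script; adjust it to the real deployment.
function ttsUrl(string $endpoint, array $params): string
{
    // A random "rand" value plays the same cache-busting role as in the JavaScript example.
    $params['rand'] = random_int(0, 99999);
    return $endpoint . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$url = ttsUrl('index.php', ['text' => 'Добре дошли', 'lang' => 'bg']);
echo $url, PHP_EOL;
```

The resulting URL could then be placed in the src attribute of an audio element on the generated page.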