- Speaking and Hearing: Speech Synthesis and Speech Recognition
- Getting Up and Running with Java Speech Technology
- Politics of Speech-Enabled Software
- Conclusion
Getting Up and Running with Java Speech Technology
The setup for speech software is a little more fiddly than usual for Java programs. To run the Java examples, you’ll need the following:
- A speech engine (I used FreeTTS version 1.2.1)
- Java™ 2 SDK, Standard Edition, v1.4 or higher (I used version 1.5)
- A pair of headphones or PC speakers
I had to do a few things to get the code to run:
- Download a copy of FreeTTS.
- Unzip the freetts-1.2.1 distribution into the required disk location.
- Copy the file speech.properties from the top-level freetts-1.2.1 folder into the lib folder of your Java Runtime Environment (JRE). On my PC, the folder is as follows:
C:\Program Files\Java\jre1.5.0_06\lib
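The reason for that copy step is that the JSAPI engine lookup (javax.speech.Central) searches for speech.properties in the user's home directory and in the JRE's lib folder. The small sketch below is my own illustrative helper, not part of the FreeTTS distribution; it simply prints the two locations the lookup checks on the current machine, which is handy when the engine mysteriously fails to load:

```java
import java.io.File;

public class SpeechPropsCheck {
    public static void main(String[] args) {
        // javax.speech.Central looks for speech.properties in these two places:
        File jreLib = new File(System.getProperty("java.home"),
                "lib" + File.separator + "speech.properties");
        File userHome = new File(System.getProperty("user.home"),
                "speech.properties");

        System.out.println(jreLib + " exists: " + jreLib.exists());
        System.out.println(userHome + " exists: " + userHome.exists());
    }
}
```

If neither line reports true, Central will find no engines and synthesizer allocation will fail.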
You should now be ready to run the FreeTTS Java examples. Let’s take a look at some simple speech synthesis. I always think that it’s a good practice to create a complete standalone example from scratch, rather than just modifying an example. To do this, I used the facilities included with FreeTTS, as described in the next section.
Payment Program with Speech Synthesis
To begin with, I’ll put my money where my mouth is (after criticizing that parking system earlier) and create a simple payment program. This is a modified version of one of the FreeTTS examples. Follow these steps to create, build, and run this program:
- Open the file demo.xml in the top-level FreeTTS folder.
- Modify demo.xml by pasting the contents of Listing 2 (below) between the
targets compile_jsapi_mixedvoices and compile_jsapi_player.
- Under the folder FreeTTS\Code\freetts-1.2.1\demo\JSAPI, create a new folder called PaymentMachine.
- Copy the download files Manifest and PaymentMachine.java into the PaymentMachine folder.
- To build the code, run the following command from any FreeTTS folder:
ant -find demo.xml
- Run the following command from the freetts-1.2.1 folder:
java -jar bin/PaymentMachine.jar
Listing 2 shows the modifications needed in demo.xml.
Listing 2 A new target for the payment program.
<target name="compile_jsapi_paymentmachine"
        depends="compile_demo_util"
        if="jsapi_jar.isPresent">
    <mkdir dir="${demo_classes_dir}/JSAPI/PaymentMachine"/>
    <javac debug="true"
           source="1.4"
           deprecation="true"
           srcdir="${src_dir}/demo/JSAPI/PaymentMachine"
           destdir="${demo_classes_dir}/JSAPI/PaymentMachine">
        <classpath>
            <path refid="libs"/>
        </classpath>
    </javac>
    <mkdir dir="${bin_dir}"/>
    <jar destfile="${bin_dir}/PaymentMachine.jar"
         manifest="${src_dir}/demo/JSAPI/PaymentMachine/Manifest"
         basedir="${demo_classes_dir}/JSAPI/PaymentMachine"
         includes="*.class"
         filesonly="true"
         compress="true"/>
</target>
At this point, you should see program output similar to that in Listing 3.
Listing 3 Payment program output prompts.
C:\FreeTTS\Code\freetts-1.2.1>java -jar bin/PaymentMachine.jar
** Payment Machine - JSAPI Demonstration program **
Allocating synthesizers...Loading voices...Ready for payment.
Initial Greeting - requesting ticket
Validating ticket
Handle receipt request
Cleaning up and exiting.
If you run the program and watch the output from Listing 3, you’ll see that the output prompts follow the audio-driven flow of the program. This is pretty important in speech-enabled software—you don’t want to see a message concerning receipt handling when the customer ticket has only just been presented! This important requirement is implemented using the code in Listing 4.
Listing 4 Synchronizing speech and program prompts.
synthesizer1.speakPlainText("Welcome to speech-enabled payment.", null);
System.out.println("Initial Greeting - requesting ticket");
synthesizer1.speakPlainText("Please insert your ticket", null);
synthesizer1.speakPlainText("Thank you.", null);
synthesizer1.waitEngineState(Synthesizer.QUEUE_EMPTY);
synthesizer1.speakPlainText("I'm checking your ticket now. Please wait.", null);
synthesizer1.waitEngineState(Synthesizer.QUEUE_EMPTY);
System.out.println("Validating ticket");
Did you notice the calls to waitEngineState() in Listing 4? These calls are needed in order to synchronize the synthesis with the program output. If I omit these calls, then the output prompts appear out of sequence while the text is being spoken. Clearly, if this is my requirement for all program output, then it would be better to have a separate method call that incorporates the calls to both speakPlainText() and waitEngineState(). This method call would avoid the need for peppering the code with calls to waitEngineState(). But you get the idea.
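To sketch that idea, here is one way such a helper could look. The names SpeakingEngine, PromptSync, and sayAndWait are hypothetical choices of mine; the interface mirrors just the two Synthesizer calls used in Listing 4, so the ordering guarantee is visible even without the JSAPI jars on the classpath. In the real program you would call speakPlainText() and waitEngineState(Synthesizer.QUEUE_EMPTY) on synthesizer1 directly.

```java
// Hypothetical sketch: a helper that always pairs speaking with a wait,
// so console prompts cannot run ahead of the audio.
interface SpeakingEngine {
    // Mirrors javax.speech.synthesis.Synthesizer.speakPlainText(String, SpeakableListener)
    void speakPlainText(String text, Object listener);

    // Mirrors javax.speech.Engine.waitEngineState(long)
    void waitEngineState(long state) throws InterruptedException;
}

public class PromptSync {
    // Stand-in for Synthesizer.QUEUE_EMPTY in this sketch.
    static final long QUEUE_EMPTY = 1L;

    /** Speak the text, then block until the output queue drains. */
    static void sayAndWait(SpeakingEngine engine, String text)
            throws InterruptedException {
        engine.speakPlainText(text, null);
        engine.waitEngineState(QUEUE_EMPTY);
    }

    public static void main(String[] args) throws InterruptedException {
        // A console-only stub so the sketch runs without a speech engine.
        SpeakingEngine stub = new SpeakingEngine() {
            public void speakPlainText(String text, Object listener) {
                System.out.println("[speaking] " + text);
            }
            public void waitEngineState(long state) {
                System.out.println("[queue drained]");
            }
        };
        sayAndWait(stub, "Please insert your ticket");
        System.out.println("Initial Greeting - requesting ticket");
    }
}
```

With a helper like this, each prompt in the payment program becomes a single call, and the synchronization can never be forgotten.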
As you can see, it’s not too difficult to produce some speech-enabled software.
Where Speech Is Different
One of the things I like about speech code is that it adds an interesting new dimension to software. In the code in Listing 5 (an excerpt from my payment program), I include information about how much time the customer has available to leave the car park.
Listing 5 Edgy information that can help customers.
synthesizer1.speakPlainText("To avoid further charges, you now have 15 minutes to leave the car park.", null);
In printed form, this message might be interpreted as rude, or it might be printed on the back of a ticket in microscopic writing and hence never get read by the customer. But in a speech-enabled program, such a message serves to inform the customer in a useful way—I know that it’s always a mystery to me how quickly I have to hurry along after paying for parking. (Five minutes? Ten?)
In other words, certain items of information are really quite well-suited to the medium of synthesized speech. If the message was printed on the screen of the payment machine, the customer might not see it.
Clearly, this type of issue is a matter of opinion/design—and possibly even culture—but it indicates that speech-enabled software is different from voiceless apps. Such software opens up new and intriguing information channels.
Tinny Voices
If you listened to the speech synthesized in my example program, you might have noticed a certain tinny quality to the voice. This can be changed easily. If you run the FreeTTS MixedVoices example (with the command java -jar bin/MixedVoices.jar), you can listen to a somewhat more mellifluous voice.
The next example uses another of the FreeTTS samples. This one provides a simple example of using the Java Speech API to speak the time, using the high-quality FreeTTS cluster unit selection voice. If you managed to get the first two examples working, then the following command is all that's required for this one. (As in the previous examples, run this command from the top-level freetts-1.2.1 folder.)
java -jar bin/JTime.jar
When you run this program, you should see the console output shown in Listing 6.
Listing 6 JSAPI Time Demo.
All time Mode JSAPI Synthesizers and Voices:
    FreeTTS en_US time synthesizer (mode=time, locale=en_US):
        alan
Using voice: alan
Enter time (HH:MM): 7:13
Bad time format. The format should be HH:MM
Enter time (HH:MM): 07:13
Enter time (HH:MM): 19:20
Enter time (HH:MM):
When you enter a time at one of the prompts (such as 07:13, as shown in Listing 6), the program responds by speaking that time in a distinctly non-tinny voice. To exit the program, just press Enter without typing a time.
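The strict format check visible in Listing 6 can be reproduced in a few lines of standard Java. The class and regular expression below are my own illustration (the demo's actual parsing code is not shown here): a two-digit-hour, two-digit-minute pattern explains why 7:13 is rejected while 07:13 and 19:20 are accepted.

```java
import java.util.regex.Pattern;

public class TimeFormat {
    // Two-digit hour 00-23, a colon, then two-digit minute 00-59.
    private static final Pattern HH_MM =
            Pattern.compile("([01]\\d|2[0-3]):[0-5]\\d");

    static boolean isValid(String time) {
        return HH_MM.matcher(time).matches();
    }

    public static void main(String[] args) {
        for (String t : new String[] {"7:13", "07:13", "19:20", "24:00"}) {
            System.out.println(t + " -> " + (isValid(t)
                    ? "ok"
                    : "Bad time format. The format should be HH:MM"));
        }
    }
}
```

The single-digit hour in 7:13 fails the pattern, matching the demo's rejection of that input.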