Class: 16 Date: 960226 Topic: Practical exercise Speech Processing 1
The goal of this exercise is to explore the Ultimedia Services (UMS) speech processing tools that are available on the IBM AIX workstations. The software and the documentation for these tools can be found in the directory:
/usr/lpp/UMS
Furthermore, you will take a look at some speech synthesis demos that are available on the web.
So far I have only succeeded in getting the Ultimedia Services software working on four machines in H327:
gripe: both speech recognition and speech synthesis are available
lenngren: both speech recognition and speech synthesis are available
nordqvist: only speech synthesis is available
ekman: only speech synthesis is available. The speaker is bad.
For some unknown reason the software does not work on gardell, karlfeldt, lindgren and lugn. The other two machines (lang and trenter) have no audio device /dev/paud0.
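If you are unsure whether the machine in front of you has the audio device or the UMS installation, a quick check from the shell can save time. The following is only a minimal sketch: the helper name check_path is made up for this example, and finding both paths does not guarantee that the UMS software will actually work on that machine.

```shell
#!/bin/sh
# check_path is a hypothetical helper, not part of UMS.
# It reports whether the given path exists on this machine.
check_path() {
    if [ -e "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_path /dev/paud0     # audio device named in the text
check_path /usr/lpp/UMS   # UMS directory named in the text
```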
In this part of the exercise you will test the speech recognition capabilities of the Ultimedia Services software. For this part you need a microphone and a machine that is capable of speech recognition. If you are unable to get either of these, please continue with another part of this exercise and do the speech recognition part later.
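To see from the shell what the machine you are logged in on should be able to do, the list of machines given earlier can be written down as a small lookup. This is only a sketch: the function name ums_capability is made up for this example, and the table merely repeats what is reported in this handout, so it will be out of date as soon as the installation changes.

```shell
#!/bin/sh
# ums_capability is a hypothetical helper, not part of UMS.
# It maps a host name to the speech features this handout reports for it.
ums_capability() {
    case "${1%%.*}" in                 # drop any domain suffix first
        gripe|lenngren)  echo "recognition and synthesis" ;;
        nordqvist|ekman) echo "synthesis only" ;;
        *)               echo "none reported" ;;
    esac
}

# uname -n prints the name of the current machine.
ums_capability "$(uname -n)"
```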
At the back of the computer case under the table you will find four round sockets. Plug the microphone into the socket that is closest to the fan (the top one). Now start the program:
run_ums DtNavigate
Two windows should appear on the screen: the Speechbar window and the Navigator window. The Speechbar window controls the microphone. The Navigator window shows you the speech commands that you can use to control your window system. It also has a facility for adding new commands.
Now turn the microphone on by choosing Recognizer from the menubar in the Speechbar window. The background color of the microphone image will change from grey to white. You can now test the speech recognition capabilities of the system by reading some commands from the Global list in the Navigator window. The program will show each command that it has recognized in the Speechbar window. There are 21 commands listed in the Global list. How many does the program recognize from your voice?
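To compare your result with that of other students it helps to express it as a percentage. The following is a minimal sketch using integer shell arithmetic; the function name recognition_rate and the tally of 17 recognized commands are invented for this example.

```shell
#!/bin/sh
# recognition_rate is a hypothetical helper.
# It prints the percentage of recognized commands, rounded down.
recognition_rate() {
    recognized=$1
    total=$2
    echo $(( recognized * 100 / total ))
}

# Invented tally: 17 of the 21 commands in the Global list were recognized.
echo "recognized $(recognition_rate 17 21)% of the commands"
```

Because the shell only does integer division, the result is rounded down: 17 out of 21 is reported as 80%.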
If you want to know more about speech recognition using UMS you can read the help texts that are available from the Navigator window or read the file:
/usr/lpp/UMS/speech/speech_reco/README.NAVIGATION
In this part of the exercise you will test the speech synthesis capabilities of the Ultimedia Services software. For this part you need a machine that is capable of speech synthesis. If you are unable to get such a machine, please continue with another part of this exercise and do the speech synthesis part later.
Start by entering the command:
speak
Three windows will be opened:
Now enter a few English words in the tnt window and press the Speak button to have them spoken aloud. Test different settings for volume, pitch and tempo to see which ones you like best. Do you like the performance of this speech synthesis program? How does it perform for Swedish sentences?
If you want to know more about speech synthesis using UMS you can read the help text that is available from the tnt window or read the file:
/usr/lpp/UMS/Demos/tts/README
For this part you need a machine that is capable of speech output. If you are unable to get such a machine, you will have to wait until one becomes free or work together with someone who has access to one of these machines.
Some web sites offer synthetic speech. The only one I know for Swedish is:
http://www.speech.kth.se/info/software.html
but the speech signals that have been stored there are slow and have a very low pitch. A nice site is:
http://www.research.att.com/cgi-bin/voices.form/
You can enter an arbitrary word, and the system will generate a spoken version of it. The software ignores the Swedish vowels. If you want to make the program pronounce a Swedish word, you will have to enter it in phonemes.
Sites with the same capabilities are:
http://wwwtios.cs.utwente.nl/say/
http://www.fb9-ti.uni-duisburg.de/demos/speech.html
http://www.centigram.com/centigram/TruVoice/index.html
If you are interested in more, check out the Frequently Asked Questions list of the newsgroup comp.speech. Speech synthesis sites can be found in Q5.4.