Examples
A sample application, VoicePad, has been
added to the examples.userinterface package to demonstrate a fairly simple
Swing-based application.
Several new examples have been added
to the examples.synthesis and examples.recognition packages
to demonstrate the IO features of the com.cloudgarden.audio package.
Also, new example packages, examples.applet and examples.remote,
have been added. Please consult these examples for usage of the classes.
Also, because the javax.sound.sampled package can sometimes hang a system
(mainly when a TargetDataLine and a SourceDataLine are opened at the same
time), it may save time to use the code from some of the examples as a
starting point for new code.
Running the Examples
The examples can be run:
-
using the batch files supplied with the
implementation, of the form TestXXX.bat - e.g. TestHelloWorld.bat,
or
-
from a console window by typing (assuming
Java 2 is being used) "java -cp .;cgjsapi.jar examples.synthesis.HelloWorld"
etc. from the directory that contains the "examples" folder.
-
by running the TestExamples.bat script
- this starts up a Java app which lets you run 12 of the test demos with
output going to a scrolling Java window - useful for those poor souls working
on Windows 98 or Me boxes who want to see more than 50 lines of text.
-
from the Java installer supplied with
the implementation - see the "Using the Java Installer" part of the Installation
section. After it extracts the jar file it shows twelve buttons which run
twelve examples and show the output inside a scrolling Java frame.
-
The Applet examples can be run in a browser,
provided the JSAPI files are installed correctly (see the applets
page).
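The console invocation listed above can also be assembled from Java. The following is only a minimal sketch: ExampleLauncher is a hypothetical helper that is not part of the shipped examples, and it assumes a JDK recent enough to support generics. It builds the same "java -cp" command, taking the classpath separator from File.pathSeparator instead of hard-coding ";".

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher helper; not part of the shipped examples. */
final class ExampleLauncher {

    /** Builds the "java -cp <classpath> <mainClass>" command for one example class. */
    static List<String> command(String mainClass) {
        // File.pathSeparator is ";" on Windows, matching the command in the text.
        String classpath = "." + File.pathSeparator + "cgjsapi.jar";
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // Default to the class named in the text; any other example class can be passed in.
        String mainClass = args.length > 0 ? args[0] : "examples.synthesis.HelloWorld";
        // Only prints the command; it does not start the example.
        System.out.println(String.join(" ", command(mainClass)));
    }
}
```

To actually start an example, the returned list could be handed to java.lang.ProcessBuilder, with the working directory set to the directory that contains the "examples" folder.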
Most of the examples use the com.cloudgarden.speech.userinterface.SpeechEngineChooser
dialog to allow you to select which of the installed engines to use for
the demo - note that this is a Swing component, so it is recommended that
they be run using Sun's JDK 1.3. The examples that synthesize speech will
require that you select a "Voice" node under a "Synthesizer" node and hit
the "Set Voice" button on the dialog before proceeding, and the recognition
examples will require that you pick a "Profile" node under a "Recognizer"
node in the same way. The "WhatTimeIsIt" example requires you to pick both.
Description of the Examples
Some of the
examples are described here - all of the examples are described briefly
in the javadoc.
-
examples.userinterface
-
examples.userinterface.VoicePad
- A sample Swing-based application which uses the Mouth component and a
JEditorPane to display a text document and highlight words currently being
spoken. When the "Speak" button is pressed it will start speaking either
the text currently selected, or will start speaking from the current cursor
position if no text is selected. Also allows synthesized speech to be saved
to an MPEG file.
-
examples.userinterface.TestInterfaces
- Displays a GUI with buttons which pop up the Recognition and Synthesis
engines' native user interfaces.
-
examples.applet
-
BrowserApplet
- open the browser.html file in IE or Netscape for a simple voice-enabled
browsing experience.
-
SpeechApplet
- a simple TTS applet.
-
examples.audio
-
Contains examples
demonstrating the com.cloudgarden.audio package.
-
ConversionTest
- demonstrates the AudioFormatConverter's capabilities, converting from
8 kHz to 16 kHz with and without linear interpolation.
-
FileToFileToLine
- Demonstrates copying an mp3 file to a gsm file, then playing the gsm
file to an AudioLineSink.
-
FileToFiles
- Demonstrates reading from one file and writing to two different ones
with different formats. Uses the AudioMediaURLSource, AudioSplitter and
AudioFormatConverter classes.
-
WAVtoMPEG
- Demonstrates reading from a wave file and writing to an mp3 file.
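As a rough illustration of the difference that ConversionTest demonstrates, the sketch below doubles the sample rate of 16-bit samples (8 kHz to 16 kHz) in two ways: by repeating each sample, and by inserting the midpoint between neighbouring samples (linear interpolation). This is plain Java written for this guide; UpsampleSketch and its methods are hypothetical names, and nothing here is taken from the AudioFormatConverter implementation.

```java
import java.util.Arrays;

/** Illustrative only; this is not the AudioFormatConverter implementation. */
final class UpsampleSketch {

    /** Doubles the sample rate by repeating each sample (no interpolation). */
    static short[] doubleByRepeat(short[] in) {
        short[] out = new short[in.length * 2];
        for (int i = 0; i < in.length; i++) {
            out[2 * i] = in[i];
            out[2 * i + 1] = in[i];
        }
        return out;
    }

    /** Doubles the sample rate, filling each gap with the midpoint of its two neighbours. */
    static short[] doubleByLinear(short[] in) {
        short[] out = new short[in.length * 2];
        for (int i = 0; i < in.length; i++) {
            // At the end of the buffer there is no next sample, so hold the last one.
            int next = (i + 1 < in.length) ? in[i + 1] : in[i];
            out[2 * i] = in[i];
            out[2 * i + 1] = (short) ((in[i] + next) / 2);
        }
        return out;
    }

    public static void main(String[] args) {
        short[] eightKHz = {0, 100, 200, 100};
        System.out.println(Arrays.toString(doubleByRepeat(eightKHz)));
        // prints [0, 0, 100, 100, 200, 200, 100, 100]
        System.out.println(Arrays.toString(doubleByLinear(eightKHz)));
        // prints [0, 50, 100, 150, 200, 150, 100, 100]
    }
}
```

Repeating samples produces a stair-step waveform, while the interpolated version ramps between the original samples.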
-
examples.recognition
-
examples.recognition.DictationTest
- Tests recognition of phrases from the full vocabulary - after it has
recognized three phrases it will stop.
-
examples.recognition.WhatTimeIsIt
- Tests recognition of 'what time is it' and 'what date is it' and responds
with a spoken answer.
-
examples.recognition.LoadJSGFFromURL
- Tests recognition of phrases loaded from the JSGF file ../examples/grammars/helloWorld.gram
- example phrases are 'pause and finish and stop', 'I'd like five bananas',
'I'd like twenty five bananas and twelve oranges', 'hello my name is <your
name here>', 'open the bottle please' - after it has recognized three phrases
it will stop.
-
examples.recognition.DictationFromFile
- Reads audio data from a WAV file and sends it to a Recognizer.
-
examples.recognition.DictationFromMPEGFile
- Reads audio data from an MP3 file and sends it to a Recognizer.
-
examples.recognition.ParserTest
-
Demonstrates use of the RuleGrammar.parse method
-
examples.recognition.GramCommitTest
- Tests activating rules and grammars in response to spoken commands. One
grammar contains the words "alpha", "bravo", "charlie" and "delta", and the
other contains "one", "two", "three" and "four". Only one rule in each grammar
is active at a time, and only one of the grammars is active at a time. Switch
between grammars with the command "switch" and activate the next rule in each
grammar with the command "next".
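The switching behaviour described for GramCommitTest can be modelled in a few lines of plain Java. The sketch below does not use JSAPI and is not the example's code. GrammarSwitchSketch is a hypothetical class, and it rests on three assumptions: each rule holds exactly one word, "next" advances the active rule in both grammars (as "in each grammar" suggests), and the rules wrap around after the last one.

```java
/** Plain-Java model of the behaviour described above; it does not use JSAPI. */
final class GrammarSwitchSketch {
    private static final String[][] GRAMMARS = {
        {"alpha", "bravo", "charlie", "delta"},
        {"one", "two", "three", "four"}
    };

    private int activeGrammar = 0;           // only one grammar is active at a time
    private final int[] activeRule = {0, 0}; // one active rule per grammar

    /** Applies "switch" or "next"; any other input leaves the state unchanged. */
    void onCommand(String command) {
        if ("switch".equals(command)) {
            activeGrammar = 1 - activeGrammar;
        } else if ("next".equals(command)) {
            // Assumption: "next" advances the rule in both grammars and wraps around.
            for (int g = 0; g < activeRule.length; g++) {
                activeRule[g] = (activeRule[g] + 1) % GRAMMARS[g].length;
            }
        }
    }

    /** The single word that could currently be recognized. */
    String activeWord() {
        return GRAMMARS[activeGrammar][activeRule[activeGrammar]];
    }

    public static void main(String[] args) {
        GrammarSwitchSketch s = new GrammarSwitchSketch();
        for (String c : new String[] {"next", "switch", "next", "switch"}) {
            s.onCommand(c);
            System.out.println(c + " -> " + s.activeWord());
        }
    }
}
```

In the real example the same state changes would be applied to RuleGrammar objects and committed to the Recognizer; the model only shows which word is recognizable after each command.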
-
examples.remote
-
FileToLineRemote
- Demonstrates network transmission of audio data from an AudioFileSource
to an AudioLineSink in compressed (GSM) and uncompressed (RAW) formats,
from server to client and from client to server. For ease of demonstration
it incorporates both server and client, and both run on the localhost.
-
The package also contains several examples demonstrating the
transfer of speech audio data between a server machine running a Recognizer/Synthesizer
and a client machine without speech engines. Start
up the Server on the server machine and start the Client on the client
machine with a command-line argument of the server machine's network name
(or no argument if Client and Server are on the same machine).
-
RemoteAudioServer and RemoteAudioClient
- Don't use speech engines - they just transfer audio data from a client machine
to a server machine, where it is played to a SourceDataLine.
-
RemoteDictationServer and RemoteDictationClient
- Sends audio data from the client to the server machine running a Recognizer
with a DictationGrammar.
-
RemoteSynthesisServer and RemoteSynthesisClient
- Sends synthesized speech to a client machine.
-
WhatTimeIsItServer and WhatTimeIsItClient
- The audio data for "what time is it" is sent from client to server, and
synthesized speech is sent from server to client.
-
examples.rtp
-
CaptureAndPlay
- This class opens two RTP streams - one receives audio data from the URL
"rtp://<localhost>:12346/audio" and plays it to the local output device,
and the other sends audio data (captured from the local audio capture device)
to the URL "rtp://<localhost>:12344/audio". This class should generally
be started before running any of the other classes in this package, since
the other classes either send data to the URL "rtp://<localhost>:12346/audio"
or receive data from the URL "rtp://<localhost>:12344/audio", or do
both.
-
DictationFromRTP
- allocates a recognizer, listening on an RTP port, and sends the contents
of a WAV file to the RTP port the recognizer is listening on.
-
FileToRTP
- Plays a file to an RTP stream (doesn't use JSAPI - it's just an RTP demo).
To hear the output stream, start up the CaptureAndPlay class, then start this
class.
-
RTPCommandVoiceServer
- Creates a Recognizer which listens to an incoming RTP stream and replies
with synthesized speech on a separate RTP stream - to send the RTP stream
to the recognizer and hear its response, start up the CaptureAndPlay class,
then this class, then say "what date/time is it", or "goodbye computer".
-
RTPDictationVoiceServer
- Creates a Recognizer which listens to an incoming RTP stream and replies
with synthesized speech on a separate RTP stream - to send the RTP stream
to the recognizer and hear its response, start up the CaptureAndPlay class,
then speak into the microphone - the recognizer should respond with what
it thinks you said.
-
SpeakToRTP
- Uses a synthesizer to send audio data to an RTP URL. To hear the output,
run the CaptureAndPlay class, then run this example.
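The port layout described for CaptureAndPlay can be written down as a small helper. The sketch below only formats and parses the URLs with java.net.URI and opens no RTP session. RtpUrls and its constant names are hypothetical and do not come from the examples.rtp package; the two port numbers are the ones given in the CaptureAndPlay description.

```java
import java.net.URI;

/** Hypothetical helper that only formats the URLs; it opens no RTP session. */
final class RtpUrls {
    static final int PLAYBACK_PORT = 12346; // CaptureAndPlay receives here and plays the audio
    static final int CAPTURE_PORT = 12344;  // CaptureAndPlay sends captured audio here

    /** Builds a URL of the form rtp://host:port/audio. */
    static URI audioUrl(String host, int port) {
        return URI.create("rtp://" + host + ":" + port + "/audio");
    }

    public static void main(String[] args) {
        // Assumption: the host defaults to "localhost" when no argument is given.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println("CaptureAndPlay plays from: " + audioUrl(host, PLAYBACK_PORT));
        System.out.println("CaptureAndPlay sends to:   " + audioUrl(host, CAPTURE_PORT));
    }
}
```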
-
examples.synthesis
-
examples.ListEngineTest
-
Lists all the available Synthesizers
(for all languages) and their Voices, says "hello" with each of the Voices,
then lists all the Recognizers and SpeakerProfiles.
-
examples.vocab.VocabTest
-
Demonstrates changing the pronunciation
of "time" to "aw r" (hour) for both speech synthesis and recognition.
-
examples.vocab.Pronunciations
-
Demonstrates adding a word (with its
pronunciation) to the VocabManager, specifying pronunciations with JSML tags
in the speech string, and some oddities of the SAPI4 engines.
For speech dictation to work well, though,
make sure you have a good microphone (built-in ones on laptops
generally pick up far too much background noise) and spend ten minutes or
so training the speech engine to your voice. After you install Microsoft's
Speech SDK 5 there should be a "Speech" entry in the Windows control panel
which lets you do this, or you can use the "User-training" button in the
examples.userinterface.TestInterfaces
example above.