
The Speech Interface

I was part of the team that designed the speech portion of the system, which meant organizing the available functionality into an easy-to-use, efficient interface.

Some design challenges with speech recognition systems include:

  • Users not knowing what to say
  • Users not knowing when to speak
  • The system misrecognizing what the user said
  • The system rejecting what was said altogether
  • Users easily becoming “lost” because of a lack of orientation and navigation cues

We developed the VUIGUI (voice user interface / graphical user interface) as a graphical way of representing the speech interaction. It serves two critical functions:

  • It shows what the system “thinks” it heard.
  • It shows what the user is allowed to say.

The VUI/GUI

Showing the user what the system heard allows the user to correct the system. It also subtly trains the user to enunciate the words and phrases that Juggler finds difficult to recognize. This feedback is similar to human conversation: when we speak to someone, we know the listener understood because they nod, say “yep”, grunt, and so on. Displaying what the user is allowed to say also provides a bootstrapping technique that teaches the user which functions the system offers.
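
The two functions can be illustrated with a minimal sketch. This is not the original Juggler code: the class name `VuiGuiPanel`, the confidence threshold, and the example commands are all hypothetical, chosen only to show how a panel might echo what was heard while always listing what can be said.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VuiGuiPanel:
    """Hypothetical model of the on-screen speech panel (illustration only)."""
    allowed: List[str]            # phrases the user may say in this state
    heard: Optional[str] = None   # last phrase the system accepted
    status: str = "listening"     # "listening", "accepted", or "rejected"

    def on_recognition(self, text: str, confidence: float,
                       threshold: float = 0.5) -> None:
        """Record a recognizer result; the threshold value is an assumption."""
        phrase = text.strip().lower()
        if confidence < threshold or phrase not in self.allowed:
            # Rejected: show nothing as heard, so the user knows to repeat.
            self.heard = None
            self.status = "rejected"
        else:
            self.heard = phrase
            self.status = "accepted"

    def render(self) -> str:
        """Return the two lines of feedback the panel would display."""
        if self.heard:
            heard_line = f'I heard: "{self.heard}"'
        else:
            heard_line = "I heard: (nothing recognized)"
        return "\n".join([heard_line, "You can say: " + ", ".join(self.allowed)])


panel = VuiGuiPanel(allowed=["play", "stop", "next track"])
panel.on_recognition("Play", confidence=0.9)
print(panel.render())
```

Because the list of allowed phrases is shown even after a rejection, the user always has a cue for what to try next, which addresses the "not knowing what to say" and "getting lost" problems listed earlier.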
