Recognizing caller speech

Within the context of interactive voice response (IVR) systems and applications, the term speech recognition, sometimes also called automatic speech recognition or advanced speech recognition (ASR), refers to the ability of an IVR system to recognize spoken responses from a caller and either convert the responses to text or use the results to initiate some system action. Speech recognition on the Avaya IR system requires the Speech Proxy package (AVsproxy).

On Avaya IR systems today, WholeWord speech recognition uses standard WholeWord grammars to recognize numerical and yes-or-no responses.

WholeWord speech recognition is sufficient for many applications. It is limited, however, both in the number of words or phrases it can recognize and in its inability to take grammatical sentence structure into account. While this speech recognition technology can recognize specific words or phrases, even when the caller adds extraneous words or phrases, it cannot recognize what part the recognized speech plays in the overall statement. In other words, WholeWord speech recognition is designed to recognize specific words or phrases, but not to interpret what it recognizes.
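The word-spotting behavior described above can be illustrated with a minimal sketch. It operates on already-transcribed text rather than audio, and it is not the Avaya IR WholeWord implementation; the names `VOCABULARY` and `spot_words` are hypothetical and exist only for this example.

```python
import re

# Hypothetical fixed vocabulary (an assumption for illustration):
# each spoken word maps to the canonical value reported to the application.
VOCABULARY = {
    "yes": "yes", "no": "no",
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def spot_words(utterance):
    """Return the canonical values of vocabulary words found in the
    utterance, in spoken order. Words outside the vocabulary are ignored,
    and no meaning is assigned to the words that are found."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return [VOCABULARY[token] for token in tokens if token in VOCABULARY]
```

With this sketch, `spot_words("Um, yes please")` returns `["yes"]` because the extraneous words are skipped. `spot_words("Yes, that is the one I want")` returns `["yes", "1"]`: the word "one" is spotted, but nothing indicates that the caller did not mean it as a digit.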

Natural language speech recognition (NLSR) takes the speech recognition process several steps further by providing a more natural conversational interface with IVR systems. Not only can NLSR be used to recognize particular words and phrases, it can also interpret and assign meaning to the speech it recognizes.

For example, under the more basic form of speech recognition, a caller can respond only to specific prompts, such as "Say 'one' if you want information about..." or "Say 'yes' if this is correct." NLSR enables you to write applications that ask the caller more open-ended questions, such as a banking application that presents the caller with a list of options and asks "What would you like to do?" When the caller responds "I'd like to know the balance of my checking account, please," the system can recognize the kind of information the caller wants (the balance in a checking account) and can automatically direct the call to a new prompt that asks for the caller's checking account number. This technology provides a more natural way of interacting with callers because it responds more like a human agent.

NLSR is also able to take into account grammatical structures. This allows it, for instance, to recognize and deal appropriately with differences in statements like the following caller responses:

"I would like to fly from Chicago to LAX."

"I need to get from LAX to Chicago."
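The difference between these two responses is the role each place name plays, not the words themselves. The following sketch illustrates the idea on transcribed text, assuming a single hypothetical "from X to Y" pattern; real NLSR grammars are far richer, and the names `ROUTE_PATTERN` and `interpret_route` are assumptions made for this example.

```python
import re

# Hypothetical pattern (an assumption for illustration): the place after
# "from" is the origin and the place after "to" is the destination.
ROUTE_PATTERN = re.compile(
    r"\bfrom\s+(?P<origin>[A-Za-z ]+?)\s+to\s+(?P<destination>[A-Za-z ]+?)\s*[.?!]?$"
)

def interpret_route(utterance):
    """Return a dict with 'origin' and 'destination' taken from the
    sentence structure, or None if the pattern is not present."""
    match = ROUTE_PATTERN.search(utterance.strip())
    if match is None:
        return None
    return {
        "origin": match.group("origin").strip(),
        "destination": match.group("destination").strip(),
    }
```

For the first response above, `interpret_route` returns an origin of `Chicago` and a destination of `LAX`. For the second, the same two place names are assigned the opposite roles.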

NLSR is also capable of understanding natural numbers ("seventy-six" instead of "seven six"), natural dates ("July 26th" instead of "zero seven two six") and natural currency ("25 dollars" instead of "two five zero zero").
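The contrast between natural numbers and digit-by-digit input can be sketched as follows. This is a simplified illustration limited to the values 0 through 99 on transcribed text; the function names and word tables are hypothetical and do not represent how a recognition server is implemented.

```python
# Hypothetical word tables (assumptions for illustration).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}

def parse_digit_string(phrase):
    """Digit-by-digit input: 'seven six' -> '76'."""
    digits = []
    for word in phrase.lower().split():
        if word not in UNITS or UNITS[word] > 9:
            raise ValueError(f"not a single digit: {word!r}")
        digits.append(str(UNITS[word]))
    return "".join(digits)

def parse_natural_number(phrase):
    """Natural number from 0 to 99: 'seventy-six' -> 76."""
    words = phrase.lower().replace("-", " ").split()
    if len(words) == 1:
        word = words[0]
        if word in UNITS:
            return UNITS[word]
        if word in TENS:
            return TENS[word]
    elif len(words) == 2:
        tens, unit = words
        # A tens word may be followed only by a unit from one to nine.
        if tens in TENS and unit in UNITS and 1 <= UNITS[unit] <= 9:
            return TENS[tens] + UNITS[unit]
    raise ValueError(f"not a natural number from 0 to 99: {phrase!r}")
```

In this sketch, `parse_digit_string("seven six")` returns the string `"76"` and `parse_natural_number("seventy-six")` returns the integer `76`. The natural-number parser rejects "seven six" because that phrase is not how the number is normally spoken.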

Because NLSR is relatively complex, it requires larger vocabularies and grammars. For this reason, the speech recognition itself is performed by a stand-alone recognition server. The Avaya IR system communicates with the recognition server using a proxy interface to support the NLSR feature.

© 2006 Avaya Inc. All Rights Reserved.