Effective VUI Design + SRGS
Posted by: Adnan Farooq Hashmi, Monday, April 11, 2005
- (1) System: Welcome to the FoodTalks.NET Food Ordering service. Please say your Customer Account Number.
- (2) Caller: The account number is FT1234.
- (3) System: Please say or key in your 5-digit code number.
- (4) Caller: 78699
- (5) System: Thank you. Would you like to place a new order, or would you prefer a repeat of your previous order?
- (6) Caller: I think I will go for a new order.
- (7) System: Very well. Please say the meal ID on the FoodTalks.NET menu card that you would like to order.
- (8) Caller: Ummmm... Meal ID 786.
- (9) System: I understood Meal ID 786. Is that correct?
- (10) Caller: Yes, that is correct.
- (11) System: Would you like anything else?
- (12) Caller: No thanks, that will do for now.
- (13) System: The cost of your meal has been deducted from your account. Your meal will be delivered to your address in 15 minutes. Thank you for calling FoodTalks.NET. We look forward to serving you again. Goodbye.
The above is a typical interaction between a caller and a speech-enabled, voice-only (telephony) application for FoodTalks.NET, a fictitious restaurant. For clarity, the utterances have been numbered. The caller may hardly realize that she is talking to a machine and not a real-life human operator. Although the example also illustrates good VUI (Voice User Interface) design practice, what I want to demonstrate here is the use of the Speech Recognition Grammar Specification (SRGS). Let's look at the call closely by examining the utterances of both the System and the Caller, paying particular attention to the key words (tokens) in the Caller's utterances.
When the call is connected, the music lets the caller know that she is hearing a recorded message, so there is no need for the message to state explicitly that she is talking to an automated system. Also, instead of asking "What is your Customer Account Number?", which would leave the caller unsure whether to dial the Account Number or say it, the system clearly asks the caller to SAY the Customer Account Number (Utterance 1).
The key words in the Caller utterances above, such as FT1234 in Utterance 2, represent 'tokens'. A token is the part of an utterance that is of use to the system; the rest of the uttered words are ignored. For example, in Utterance 2, instead of 'The account number is FT1234', the caller could have said 'My account number is FT1234', 'My customer account number is FT1234', 'The customer account number is FT1234', 'Customer account number is FT1234' or simply 'FT1234'. In each case, only the Customer Account Number, i.e. FT1234, is used by the application to validate the caller. However, for the system to process any of these utterances, a grammar has to be defined in advance so that the system can extract the token from an utterance and generate a value from the processed voice input. The grammar is defined using the Speech Recognition Grammar Specification, or SRGS (which I will blog about in detail later). In short, SRGS is an XML specification that allows speech application developers to write grammar rules in a grammar (*.grxml) file. One grammar file can contain multiple rules. In the browser, SALT's listen element then uses the grammar file (through its grammar child element) to recognize a spoken utterance by the user. For example, the grammar to process Utterance 2 would be something like this:
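<rule id="AccountNumber" scope="public">
  <!-- Optional lead-in phrase: matched once or not at all -->
  <item repeat="0-1">
    <one-of>
      <item>The Customer Account Number is</item>
      <item>The Account Number is</item>
      <item>My Customer Account Number is</item>
      <item>My Account Number is</item>
      <item>Customer Account Number is</item>
      <item>Account Number is</item>
    </one-of>
  </item>
  <!-- The token: a 4-digit number defined by the Digit4 rule in Library.grxml -->
  <ruleref uri="Library.grxml#Digit4" type="application/srgs+xml"/>
</rule>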
The repeat="0-1" attribute indicates that the item is optional: it may occur once or not at all. The ruleref tag references another rule, Digit4, which matches a 4-digit number and is defined in a separate grammar file (Library.grxml).
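Library.grxml itself is not shown in this post. As a minimal sketch, assuming US English and digits spoken one at a time, such a library file could hold the Digit4 rule alongside a hypothetical Digit5 rule for the 5-digit code number in Utterance 4. The file layout, the private Digit helper rule and the Digit5 rule are my assumptions for illustration, not the actual library.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical Library.grxml: a sketch, not the actual file referenced above -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="Digit4">

  <!-- Helper rule: a single spoken digit ("oh" is a common way of saying zero) -->
  <rule id="Digit" scope="private">
    <one-of>
      <item>zero</item>
      <item>oh</item>
      <item>one</item>
      <item>two</item>
      <item>three</item>
      <item>four</item>
      <item>five</item>
      <item>six</item>
      <item>seven</item>
      <item>eight</item>
      <item>nine</item>
    </one-of>
  </rule>

  <!-- Exactly four digits; public so other grammars can reference Library.grxml#Digit4 -->
  <rule id="Digit4" scope="public">
    <item repeat="4">
      <ruleref uri="#Digit"/>
    </item>
  </rule>

  <!-- Hypothetical: exactly five digits, e.g. for the code number in Utterance 4 -->
  <rule id="Digit5" scope="public">
    <item repeat="5">
      <ruleref uri="#Digit"/>
    </item>
  </rule>
</grammar>
```

The scope values matter here: in SRGS, only public rules can be referenced from another grammar file (as AccountNumber does with Digit4), while the private Digit rule can only be referenced inside Library.grxml itself.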
OK, that's it for me. My meal (Meal ID 786) should be here soon. :)