Tuesday, May 02, 2006

Developing Speech Applications - Part II

ASP.NET speech applications developed using the Microsoft Speech Application SDK (SASDK) resemble typical ASP.NET applications in their programming and execution model. Hence, it is quite easy for web developers to use their existing skill set to develop speech applications from scratch, or to reuse their existing ASP.NET code to speech-enable a previously developed project.

In Part I, I briefly touched upon the types of speech applications, i.e. voice-only and multi-modal, that you can develop using the SASDK. The diagram below gives you some idea of how ASP.NET speech apps differ from traditional ASP.NET web apps.

Application Types
The only difference between a typical web app, a voice-only speech app, and a multi-modal speech app is the top layer that the user interacts with. In a typical ASP.NET web application (left-most in the figure), the GUI comprises one or more web forms containing text boxes, combo boxes, images, plain text, etc. In a multi-modal speech application (right-most), the user still interacts with the application through the GUI (web forms displayed in a browser), but can also use voice commands to make selections in combo boxes, speak text into a text box, and so on. In both of these scenarios, the web form loads into the user's browser in response to a request sent by the client to the web server.

A voice-only speech app (middle of the figure) differs from the other two in that it has no GUI for the user to interact with, nor does it load into a web browser on the user's PC. The user interacts with the app over a telephone, using her voice or touch-tone (DTMF) keys, while the speech application runs on a remote server. The output is also voice, which the user hears on her phone.

It is important to note that in the voice-only scenario the speech app's pages are in fact still loaded into a web browser, but one that runs on the server, and those pages are composed of SALT (Speech Application Language Tags) markup instead of HTML tags. Also, while the ASP.NET portion of the application is hosted in IIS, the task of receiving the call and relaying voice commands and responses between the telephone line and the app rests with Microsoft Speech Server (MSS).

To start off, let's look at a typical ASP.NET application first. The figure below provides an overview of how your everyday ASP.NET web app works.

Typical ASP.NET Application
The steps shown in the diagram above are:

  1. Request sent from a web form loaded into the client's web browser.

  2. Request processed.

  3. Query sent to database.

  4. Query results returned to the application.

  5. Response generated and sent to the requesting web browser.



Keeping the above in mind, let's take a look at our FlightEnquiry.NET voice-only ASP.NET speech application.

Voice Only ASP.NET Speech Application
At a high level, the application works like this:

  1. The user calls the FlightEnquiry.NET application from her phone. The app answers the call, plays a welcome prompt, and asks the user for the flight she wants to enquire about.

  2. The caller speaks the flight number.

  3. [...]

  4. [...]

  5. [...]

  6. [...]

  7. [...]

  8. [...]

  9. [...]

  10. [...]

  11. [...]

  12. The application tells the caller the status of the flight and the call is disconnected.



I left out steps 3 to 11 because they are hidden from the caller and mainly involve processing done by the application. Let's now look at the technical details of the FlightEnquiry.NET speech demo.

Step 1:
When the phone call is received, Microsoft Speech Server answers it and plays a welcome prompt. This prompt can be a pre-recorded voice or speech synthesized by a Text-to-Speech (TTS) engine.
The FlightEnquiry.NET application plays the prompt, "Welcome to FlightEnquiry Dot Net. Please say the Airline Name and Flight Number."


Step 2:
The caller speaks the airline name and flight number she wants to enquire about.
Depending on how flexible you have built your app, the caller can say the airline name and flight number in a number of ways: Pakistan International Airlines PK 347, Pakistan International Airlines 347, PIA 347, PK 347, PIA Flight 347, PIA Flight PK 347, etc.


Step 3:
The spoken input is validated against a grammar rule. All grammar rules live in a separate grammar file (*.grxml in SASDK projects) that contains the SRGS XML for the grammar. To let the caller answer in several possible ways, the grammar should cover as many as possible of the phrasings a caller is likely to use in response to a played prompt. The result of the grammar validation is plain XML within an SML (Semantic Markup Language) tag.
The resultant SML for the FlightEnquiry.NET app looks like this:

<SML>
<FlightNumber>PK347</FlightNumber>
</SML>
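
To make the idea concrete, here is a rough Python sketch of what the grammar does for us. It is not SRGS and not SASDK code: the real matching is done by the speech recognition engine against the grammar file. The function name recognize_to_sml is my own invention for illustration, and the sketch assumes the caller's speech has already been turned into text with the flight number written as digits.

```python
def recognize_to_sml(utterance):
    """Return an SML string for a recognized flight enquiry, or None if no rule matches."""
    words = utterance.lower().split()
    airline_given = False

    # Optional airline name, either in full or abbreviated.
    if words[:3] == ["pakistan", "international", "airlines"]:
        words, airline_given = words[3:], True
    elif words[:1] == ["pia"]:
        words, airline_given = words[1:], True

    # Optional filler word "flight".
    if words[:1] == ["flight"]:
        words = words[1:]

    # Optional carrier code, which also identifies the airline.
    if words[:1] == ["pk"]:
        words, airline_given = words[1:], True

    # Exactly one token of digits must remain, and the airline must be known.
    if airline_given and len(words) == 1 and words[0].isdecimal():
        return f"<SML><FlightNumber>PK{words[0]}</FlightNumber></SML>"
    return None


for phrase in ["Pakistan International Airlines PK 347", "PIA 347", "PK 347", "PIA Flight PK 347"]:
    print(phrase, "->", recognize_to_sml(phrase))
```

Every phrasing listed in Step 2 ends up as the same SML, which is the whole point of the grammar: the rest of the application only ever sees the normalized flight number.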



Step 4:
Values are extracted from the generated SML using an XPath query that you specify during application development. Each extracted value is stored in an object called a semantic item, and each semantic item holds only one value. A semantic item is simply a server control whose value can be shared between the browser and the ASP.NET application.
The value "PK347" we got from the generated SML is stored in a semantic item called "siFlightNumber".
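
The extraction itself is easy to picture. Below is a small Python sketch, again only an illustration and not the SASDK API: extract_semantic_items and the bindings dictionary are hypothetical names, and Python's ElementTree supports only a subset of XPath, so the path is written relative to the SML root element.

```python
import xml.etree.ElementTree as ET

# The SML produced in Step 3.
SML = "<SML>\n<FlightNumber>PK347</FlightNumber>\n</SML>"


def extract_semantic_items(sml_text, bindings):
    """Fill semantic items from SML.

    bindings maps a semantic item name to an XPath expression
    evaluated relative to the <SML> root element.
    """
    root = ET.fromstring(sml_text)
    items = {}
    for name, xpath in bindings.items():
        node = root.find(xpath)
        # Each semantic item holds a single value (or nothing if the path is absent).
        items[name] = node.text.strip() if node is not None and node.text else None
    return items


items = extract_semantic_items(SML, {"siFlightNumber": "./FlightNumber"})
print(items)  # {'siFlightNumber': 'PK347'}
```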


Step 5:
Once the value is stored in the semantic item, it is submitted to the server. The value of the semantic item can be accessed inside the browser using JavaScript through its value property, i.e. siFlightNumber.value. However, on the server, the semantic item's value is accessed using the Text property, i.e. siFlightNumber.Text.


Step 6:
As with any typical ASP.NET web application, the request is processed.


Step 7:
The database is queried for the flight's status.


Step 8:
Query results are returned to the application.
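
Steps 6 to 8 are ordinary server-side work, the same as in any web app. As a minimal sketch, assuming a hypothetical flights table with flight_number and status columns (the demo's real schema is not shown in this post), the lookup could look like this in Python with an in-memory SQLite database:

```python
import sqlite3


def open_demo_database():
    """Create an in-memory database with one sample row (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (flight_number TEXT PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO flights VALUES (?, ?)", ("PK347", "on schedule"))
    conn.commit()
    return conn


def lookup_status(conn, flight_number):
    """Return the status for a flight number, or None if the flight is unknown."""
    # A parameterized query keeps the recognized value out of the SQL text itself.
    row = conn.execute(
        "SELECT status FROM flights WHERE flight_number = ?", (flight_number,)
    ).fetchone()
    return row[0] if row else None


conn = open_demo_database()
print(lookup_status(conn, "PK347"))  # on schedule
```

Returning None for an unknown flight gives the application a clear signal to play a "flight not found" style prompt instead of a status.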


Step 9:
The flight status is now returned to the client, again as a semantic item.


Step 10:
To play the flight's status back to the caller, client-side JavaScript written at development time formats the status it receives from the server as a semantic item. This JavaScript resides in a prompt function (*.pf) file, which contains both XML and JavaScript, and converts the status into a form that can be spoken back to the caller.


Step 11:
The prompt function returns a string that is either spoken by the TTS engine or assembled from pre-recorded prompts and played to the caller, e.g. "Pakistan International Airlines Flight P K three four seven from Karachi to Lahore departing six thirty and arriving eight forty is on schedule. Thank you for calling Flight Enquiry Dot Net."
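
The formatting logic inside a prompt function is plain string handling. The real thing is JavaScript inside a *.pf file; the Python sketch below only illustrates the idea, with hypothetical function names, and leaves out the route and the departure and arrival times to stay short.

```python
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_flight_number(flight_number):
    """Turn 'PK347' into 'P K three four seven' so it is read out character by character."""
    spoken = []
    for ch in flight_number:
        if ch in DIGIT_WORDS:
            spoken.append(DIGIT_WORDS[ch])
        elif ch.isalpha():
            spoken.append(ch.upper())
    return " ".join(spoken)


def build_status_prompt(airline, flight_number, status):
    """Build the sentence handed to the TTS engine or the prompt database."""
    return f"{airline} Flight {spell_flight_number(flight_number)} is {status}."


print(build_status_prompt("Pakistan International Airlines", "PK347", "on schedule"))
# Pakistan International Airlines Flight P K three four seven is on schedule.
```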


Step 12:
The caller hears the TTS/prompt voice and the call ends.


Whew! Hopefully, I have been able to explain how my FlightEnquiry.NET demo works in this post. I will cover the steps for speech-enabling the FlightEnquiry.NET application in my next post, the last of this series on developing speech applications using the Microsoft SASDK. Comments and feedback are always welcome.

7 comments:

aamir said...

Great work.
Can you tell me the difference between a voice-only application and an IVR (interactive voice response) system?
Also, can you recommend some books on developing speech applications using .NET?

Adnan Farooq Hashmi said...

@ Aamir
Not much difference between an IVR and Voice-only Speech application. IVRs have traditionally relied on DTMF (Dual-Tone Multi-Frequency) or touch tone to receive input from the caller. With Voice-only Speech applications, you have the flexibility to get input either as DTMF or Voice commands spoken by the caller.

There is a book being written on Microsoft Speech Server (check one of my previous posts). You can also check out http://www.gotspeech.net to learn more. The book mentioned on GotSpeech.NET web site is "Building Intelligent .NET Applications".

Zeeshan Hasan said...

My question is how reliable speech recognition is (pros and cons). If it is good, then why has our mouse not been replaced by systems like these? The ones I tested took me hours to configure.

Adnan Farooq Hashmi said...

@Zeeshan

At one point, systems did require a lot of training before they could recognize speech accurately, and even that training did not prepare them for other dialects or accents.

Today's speech recognition engines, like the one from Microsoft, come with a lot of training data, so the user does not have to train them again.

The speech recognition capability built into the upcoming Windows Vista is even better in that it maintains a speech profile for each user and learns more about the user's way of speaking as time goes by. From what I have seen in Windows Vista, it won't be long before we are able to do a lot with voice commands in addition to using the mouse and keyboard.
