Portability. Finally, for the technology to achieve widespread use, a system must be portable among computers. It must not rely on any special processing hardware other than the chips that digitize the analog voice signal, which are typically built into any computer to which a microphone can be attached. Some systems rely on proprietary digital signal processing boards both to digitize the analog input and to analyze the speech, converting it into digital components. This approach ties the application to a specific set of hardware.

CURRENT APPLICATIONS

The progress made in speech recognition technology means that developers of multimedia applications should seriously consider speech recognition in their human interface design. Two major classifications of applications exist:

1.  Microphone applications. The speaker talks into a microphone attached to a workstation and obtains spoken or visual responses (or both).
2.  Telephony applications. The speaker uses a telephone handset or equivalent to speak and hear responses.

Microphone-Input Applications for the PC

Microphone applications have the potential to give impaired persons access to information facilities and to make people in many professions more productive. Because these applications are usually based on the older, speaker-dependent technology, many more of them have been implemented than applications using speaker-independent recognition technology. Some examples of workstation or PC-based applications follow.

Kurzweil Applied Intelligence introduced its VoiceMED line of products for patient reporting almost 10 years ago. In these applications, the speaker (typically a physician who had trained the system) spoke into a microphone connected to a personal computer. The system translated the voice to written words and incorporated the specialized vocabulary of the physician. Extensions of the VoiceMED system were products such as VoicePATH (for pathology), VoiceEM (for emergency medicine), VoiceDIALYSIS (for kidney dialysis reporting), and VoiceCATH (for invasive cardiology). These systems grew out of Kurzweil's earlier work in voice-controlled typing systems.

West Publishing and Kolvox Communications have introduced LawTalk, a large-vocabulary speech recognition front end to the WestLaw online legal research system. The system includes Dragon Systems' DragonDictate technology, which uses a microphone connected to the PC. The PC-based application translates the speaker's query and passes it to the online database. Queries may be stated either as a formula-like Boolean expression or in natural language. The speech interface is further combined with a WordPerfect speech interface that lets users transfer downloaded information into documents using spoken commands.

Syracuse Language Systems, Inc.'s TriplePlay Plus! software uses speech recognition to teach foreign languages: it listens to a student's voice, evaluates the pronunciation, and replies in the language being studied. The system can be used to learn French, German, English, or Spanish.

In the brokerage field, R.W. Pressprich Co., Inc., has implemented voice recognition to replace keyboard entry for bond traders.

Under the banner of its VoiceType Dictation systems, IBM offers dictation systems not only for its PCs but also, through a PCMCIA digital signal adapter, for laptop and notebook computers. VoiceType Dictation is available in English, German, Italian, French, and Spanish. Other vendors have entered the market with similar products. One of the more powerful features of speech recognition technology is that, because words are made up of phonetic tokens or subwords, the recognition technology itself does not have to change to support a new language; only the vocabularies need to be changed.
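To illustrate the subword point, the sketch below assumes a front end has already turned the speaker's audio into a string of phoneme symbols, and it matches that string against a word list spelled in phonemes. The phoneme symbols, the two tiny lexicons, and the names `LEXICONS`, `edit_distance`, and `recognize_word` are hypothetical, and the edit-distance matching is a simple stand-in for the statistical acoustic models that real dictation products use. The only point shown is that the matching code is the same for every language, while the vocabulary passed to it changes.

```python
from typing import Dict, List, Sequence

# Hypothetical per-language vocabularies: each word is spelled as a sequence
# of phoneme symbols. The symbols are simplified for illustration only.
LEXICONS: Dict[str, Dict[str, List[str]]] = {
    "english": {
        "yes": ["y", "eh", "s"],
        "no": ["n", "ow"],
        "stop": ["s", "t", "aa", "p"],
    },
    "spanish": {
        "si": ["s", "i"],
        "no": ["n", "o"],
        "alto": ["a", "l", "t", "o"],
    },
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute pa -> pb
        prev = curr
    return prev[-1]


def recognize_word(phonemes: Sequence[str], lexicon: Dict[str, List[str]]) -> str:
    """Return the vocabulary word whose phoneme spelling is closest to the input.

    This function is identical for every language; only the lexicon changes.
    """
    return min(lexicon, key=lambda word: edit_distance(phonemes, lexicon[word]))
```

For example, `recognize_word(["s", "t", "a", "p"], LEXICONS["english"])` picks "stop", which is one substitution away. Supporting Spanish means passing `LEXICONS["spanish"]` instead, with no change to the matching code.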

Telephony Applications

Telephony applications provide information access to large numbers of people and could replace many of the telephone service personnel who now provide information, take orders by phone, or otherwise serve as a human interface to callers. Telephony applications are often successful because they provide an acceptable interface for callers who do not have, or prefer not to use, touchtone input to a voice response system. In fact, many new applications use existing interactive voice response (IVR) units to answer the call and to provide the voice response once the caller's speech is understood. Examples of such applications follow.

AT&T and most of the regional telephone companies in the U.S. are in the process of providing speaker-independent recognition interfaces for callers. The most profitable target for these applications is the work of the hundreds of directory assistance operators who provide telephone numbers to 411 callers. However, speech recognition technology is also being used in other applications, such as third-party billing and collect-call processing.

Even speaker-dependent systems have a place in telephone carrier applications. Sprint is testing a FONCARD application in which the caller verbally enters the private access code and the system verifies both the number and the caller’s voice as valid for that code. Ameritech is testing a system that allows individual speakers at the same phone number to create speaker-dependent personal dialing directories.

Other industries may even be ahead of the telephone companies. One example is the customer service directory system implemented by Union Electric in St. Louis, Missouri. This system lets customers with rotary phones, or those who prefer not to use touchtone input, reach service groups such as installation and billing, where they can speak to a live operator or leave a voicemail message. The IVR script carefully guides the caller's responses and accepts either touchtone or spoken input.
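A script of this kind is essentially a menu in which each choice can be made either with a touchtone digit or with a spoken word. The sketch below assumes the speech recognizer hands back its result as a text string; the digits, keywords, retry limit, and the names `MENU`, `select_group`, and `handle_call` are hypothetical and are not Union Electric's actual script.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MenuOption:
    group: str                 # service group named in the text
    digit: str                 # hypothetical touchtone key
    keywords: Tuple[str, ...]  # hypothetical spoken keywords


# Hypothetical menu: each group is reachable by a digit or by a spoken keyword.
MENU = (
    MenuOption("installation", "1", ("installation", "install", "new service")),
    MenuOption("billing", "2", ("billing", "bill", "payment")),
)


def select_group(dtmf: Optional[str] = None, speech: Optional[str] = None) -> Optional[str]:
    """Map a touchtone digit or a recognized utterance to a service group.

    `speech` is assumed to be the text returned by a speech recognizer.
    Returns None when nothing matches, so the script can prompt again.
    """
    for option in MENU:
        if dtmf is not None and dtmf == option.digit:
            return option.group
        if speech is not None and any(k in speech.lower() for k in option.keywords):
            return option.group
    return None


def handle_call(attempts: List[Tuple[Optional[str], Optional[str]]],
                operator_available: Dict[str, bool],
                max_attempts: int = 3) -> Optional[Tuple[str, str]]:
    """Prompt up to `max_attempts` times; route to an operator or to voicemail.

    Each attempt is a (dtmf, speech) pair; either element may be None.
    """
    for dtmf, speech in attempts[:max_attempts]:
        group = select_group(dtmf, speech)
        if group is not None:
            destination = "operator" if operator_available.get(group, False) else "voicemail"
            return group, destination
    return None  # no valid choice; a real script would decide what happens next
```

For instance, a caller who first mumbles and then presses 2 while no billing operator is free ends up at `("billing", "voicemail")`. Treating the digit and the keyword as two entries on the same menu option is what lets one script serve rotary and touchtone callers alike.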


Copyright © CRC Press LLC