Human Factors and Voice Interactive Systems (eBook)
XXVI, 469 pages
Springer US (publisher)
978-0-387-68439-0 (ISBN)
The second edition of Human Factors and Voice Interactive Systems, in addition to updating chapters from the first edition, adds in-depth information on current topics of major interest to speech application developers. These topics include use of speech technologies in automobiles, speech in mobile phones, natural language dialogue issues in speech application design, and the human factors design, testing, and evaluation of interactive voice response (IVR) applications.
Human Factors and Voice Interactive Systems, Second Edition provides in-depth information on current topics of major interest to speech application developers, and updates material from chapters that appeared in the previous edition. The first nine chapters of the book cover issues related to interactive voice response systems, including both mobile and multimodal device user interfaces as well as classic automated telephone systems. The remaining chapters cover special topics, including synthetic speech and the design of speech applications to enhance accessibility for people with disabilities and the ever-growing population of older adults. Human Factors and Voice Interactive Systems, Second Edition is a collection of applied research and scholarly synthesis contributions by seasoned professionals in the field that highlight continuing efforts to study human interaction with speech technologies.
PREFACE 7
REFERENCES 13
ACKNOWLEDGEMENTS 15
Chapter 1 IVR USABILITY ENGINEERING USING GUIDELINES AND ANALYSES OF END-TO-END CALLS 26
1. IVR DESIGN PRINCIPLES AND GUIDELINES 27
1.1 A Taxonomy of Limitations of Speech User Interfaces 28
1.2 Towards Best Practices for IVR Design 35
1.3 Best Practices for IVR Design? 43
2. DATA-DRIVEN IVR USABILITY ENGINEERING BASED ON END-TO-END CALLS 44
2.1 The Flaws of Standard IVR Reports 45
2.2 Capturing End-to-End Data from Calls 45
2.3 Evaluating IVR Usability based on End-to-End Calls 48
2.4 Evaluating IVR Cost-effectiveness 54
3. SUMMARY AND CONCLUSIONS 62
ACKNOWLEDGEMENTS 64
REFERENCES 64
Chapter 2 User Interface Design for Natural Language Systems: From Research to Reality 67
1. INTRODUCTION 67
1.1 What is Natural Language? 67
1.2 What Are the Steps to Building a Natural Language Application? 70
1.3 When Does it Make Sense to Use Natural Language? 74
1.4 The Call Routing Task 78
1.5 Design Process 78
1.6 Analysis of Human-to-Human Dialogues 79
2. ANTHROPOMORPHISM AND USER EXPECTATIONS 79
2.1 Anthropomorphism Experiment 80
3. ISSUES FOR NATURAL DIALOGUE DESIGN 84
3.1 Initial Greeting 84
3.2 Confirmations 84
3.3 Disambiguating an Utterance 85
3.4 Reprompts 85
3.5 Turn-taking 86
3.6 When to Bail Out 86
4. ESTABLISHING USER EXPECTATIONS IN THE INITIAL GREETING 86
4.1 Initial Greeting Experiment 87
5. IDENTIFYING RECOGNITION ERRORS THROUGH CONFIRMATIONS 90
5.1 Confirming Digit Strings in Spoken Dialogue Systems 91
5.2 Confirmation of Topic in a Spoken Natural Dialogue System 93
6. REPAIRING RECOGNITION ERRORS WITH REPROMPTS 96
6.1 Reprompt Experiment 97
7. TURN-TAKING IN HUMAN-MACHINE DIALOGUES 100
7.1 Caller Tolerance of System Delay 101
8. SUMMARY 103
REFERENCES 103
Chapter 3 LINGUISTICS AND PSYCHOLINGUISTICS IN IVR DESIGN 105
1. INTRODUCTION 106
1.1 Speech Sounds 106
1.2 Grammar 107
2. ASR GRAMMARS AND LANGUAGE UNDERSTANDING 110
2.1 Morphology 111
2.2 Syntax 112
2.3 Semantics 117
2.4 Putting it All Together 118
2.5 ASR Grammars 119
2.6 Natural Language Understanding Models 121
3. DIALOG DESIGN 126
3.1 Putting it All Together 129
4. CONSEQUENCES OF STRUCTURAL SIMPLIFICATION 132
4.1 Semantic Specificity 135
4.2 Syntactic Specificity 136
CONCLUSION 137
REFERENCES 137
Chapter 4 DESIGNING THE VOICE USER INTERFACE FOR AUTOMATED DIRECTORY ASSISTANCE 140
1. THE BUSINESS OF DA 140
1.1 The Introduction of Automation 141
1.2 Early Attempts to Use Speech Recognition 142
2. ISSUES IN THE DESIGN OF VUI FOR DA 144
2.1 Addressing Database Inadequacies 145
2.2 Pronunciation of Names 146
2.3 The First Question 147
2.4 Finding the Locality 147
2.5 Confirming the Locality 148
2.6 Determining the Listing Type 149
2.7 Handling Business Requests 150
2.8 Handling Residential Listings 154
2.9 General Dialogue Design Issues 156
3. FINAL THOUGHTS 157
REFERENCES 157
Chapter 5 SPOKEN LANGUAGE INTERFACES FOR EMBEDDED APPLICATIONS 158
1. INTRODUCTION 158
2. SPOKEN LANGUAGE INTERFACES DEVELOPMENT 160
2.1 Overview. Current Trends 160
2.2 Embedded Speech Applications 162
3. EMBEDDED SPEECH TECHNOLOGIES 164
3.1 Technical Constraints and Implementation Methods 164
3.2 Embedded Speech Recognition 166
3.3 Embedded Speech Synthesis 172
4. A CASE STUDY: AN EMBEDDED TTS SYSTEM IMPLEMENTATION 176
4.1 A Simplified TTS System Architecture 176
4.2 Implementation Issues 178
5. THE FUTURE OF EMBEDDED SPEECH INTERFACES 181
REFERENCES 183
Chapter 6 SPEECH GENERATION IN MOBILE PHONES 185
1. INTRODUCTION 185
2. SPEAKING TELEPHONE? WHAT IS IT GOOD FOR? 187
3. SPEECH GENERATION TECHNOLOGIES IN MOBILE PHONES 188
3.1 Synthesis Technologies 189
3.2 Topic-Related Text Preprocessing 192
4. HOW TO PORT SPEECH SYNTHESIS ON A PHONE PLATFORM 200
5. LIMITATIONS AND POSSIBILITIES OFFERED BY PHONE RESOURCES 203
6. IMPLEMENTATIONS 205
6.1 The Mobile Phone as a Speaking Aid 205
6.2 An SMS-Reading Mobile Phone Application 208
ACKNOWLEDGEMENTS 212
REFERENCES 212
Chapter 7 VOICE MESSAGING USER INTERFACE 214
1. INTRODUCTION 214
2. THE TOUCH-TONE VOICE MAIL USER INTERFACE 217
2.1 Common Elements of Touch-tone Transactions 218
2.2 Call Answering 224
2.3 The Subscriber Interface 227
2.4 Retrieving and Manipulating Messages 227
2.5 Sending Messages 230
2.6 Voice Messaging User Interface Standards 232
2.7 Alternative Approaches to Traditional Touch-tone Design 235
3. AUTOMATIC SPEECH RECOGNITION AND VOICE MAIL 236
4. UNIFIED MESSAGING AND MULTIMEDIA MAIL 240
4.1 Fax Messaging 241
4.2 Viewing Voice Mail 242
4.3 Listening to E-mail 244
4.4 Putting it All Together 245
4.5 Mixed Media 246
REFERENCES 247
Chapter 8 SILENCE LOCATIONS AND DURATIONS IN DIALOG MANAGEMENT 251
1. INTRODUCTION 251
2. PROMPTS AND RESPONSES IN DIALOG MANAGEMENT 253
2.1 Dialog Management 253
2.2 Word Selection 254
2.3 Word Lists 254
2.4 Turn-Taking Cues 256
3. TIME AS AN INDEPENDENT VARIABLE – DIALOG MODEL 256
3.1 Definition of Terms 257
3.2 Examples of Usage 258
4. USER BEHAVIOR 258
4.1 Transactional Analysis 258
4.2 Verbal Communication 259
4.3 Directed Dialogs 259
5. MEASUREMENTS 260
5.1 Barge-In 261
6. USABILITY TESTING AND RESULTS 262
6.1 Test Results – United States (early prototype) 264
6.2 Test Results – United States (tuned, early prototype) 265
6.3 Test Results – United Kingdom 266
6.4 Test Results – Italy 267
6.5 Test Results – Denmark 269
7. OBSERVATIONS AND INTERPRETATIONS 270
7.1 Lateral Results 270
7.2 Learning – Longitudinal Results 271
CONCLUSIONS 272
ACKNOWLEDGEMENT 272
REFERENCES 272
Chapter 9 USING NATURAL DIALOGS AS THE BASIS FOR SPEECH INTERFACE DESIGN 274
1. INTRODUCTION 275
1.1 Motivation 275
1.2 Natural Dialog Studies 276
2. NATURAL DIALOG CASE STUDIES 277
2.1 Study #1: SpeechActs Calendar (speech-only, telephone-based) 278
2.2 Study #2: Office Monitor (speech-only, microphone-based) 283
2.3 Study #3: Automated Customer Service Representative (speech input, speech/graphical output, telephone-based) 288
2.4 Study #4: Multimodal Drawing (speech/mouse/keyboard input, speech/graphical output, microphone-based) 297
3. DISCUSSION 305
3.1 Refining Application Requirements and Functionality 305
3.2 Collecting Appropriate Vocabulary 306
3.3 Determining Commonly Used Grammatical Constructs 306
3.4 Discovering Effective Interaction Patterns 306
3.5 Helping with Prompt and Feedback Design 307
3.6 Getting a Feeling for the Tone of the Conversations 307
CONCLUSION 308
ACKNOWLEDGEMENTS 308
REFERENCES 309
Chapter 10 TELEMATICS: ARTIFICIAL PASSENGER AND BEYOND 310
1. INTRODUCTION 310
2. A BRIEF OVERVIEW OF IBM VOICE TECHNOLOGIES 311
2.1 Conversational Interactivity for Telematics 312
2.2 System Architecture 314
2.3 Embedded Speech Recognition 316
2.4 Distributed Speech Recognition 318
3. EVALUATING/PREDICTING THE CONSEQUENCES OF MISRECOGNITIONS 319
4. IMPROVING VOICE AND STATE RECOGNITION PERFORMANCE – NETWORK DATA COLLECTION, LEARNING BY EXAMPLE, ADAPTATION OF LANGUAGE AND ACOUSTIC MODELS FOR SIMILAR USERS 322
5. ARTIFICIAL PASSENGER 327
6. USER MODELING ASPECTS 334
6.1 User Model 335
6.2 The Adaptive Modeling Process 336
6.3 The Control Process 337
6.4 Discussion about Time-Lagged Observables and Indicators in a History 338
7. GESTURE-BASED COMMAND INTERFACE 339
8. SUMMARY 341
ACKNOWLEDGEMENTS 342
REFERENCES 342
Chapter 11 A LANGUAGE TO WRITE LETTER-TO-SOUND RULES FOR ENGLISH AND FRENCH 345
1. INTRODUCTION 345
2. THE HISTORIC EVOLUTION OF ENGLISH AND FRENCH 347
3. THE COMPLEXITY OF THE CONVERSION FOR ENGLISH AND FRENCH 347
4. RULE FORMALISM 352
5. EXAMPLES OF RULES FOR ENGLISH 358
6. EXAMPLES OF RULES FOR FRENCH 363
CONCLUSIONS 371
REFERENCES 372
APPENDICES FOR FRENCH 374
APPENDICES FOR ENGLISH 377
Chapter 12 VIRTUAL SENTENCES OF SPONTANEOUS SPEECH: BOUNDARY EFFECTS OF SYNTACTIC-SEMANTIC-PROSODIC PROPERTIES 379
1. INTRODUCTION 379
2. METHOD AND MATERIAL 382
2.1 Subjects 382
2.2 Speech Material 382
2.3 Procedure 383
3. RESULTS 384
3.1 Identification of Virtual Sentences in the Normal and Filtered Speech Samples 384
3.2 Pauses of the speech sample 386
3.3 Pause Perception 388
3.4 F0 Patterns 390
3.5 Comprehension of the Spontaneous Speech Sample 392
3.6 The Factor of Gender 393
Conclusions 393
ACKNOWLEDGEMENTS 395
REFERENCES 395
Chapter 13 TEXT-TO-SPEECH FORMANT SYNTHESIS FOR FRENCH 398
1. INTRODUCTION 398
2. GRAPHEME-TO-PHONEME CONVERSION 399
2.1 Normalization: From Grapheme to Grapheme 399
2.2 From Grapheme to Phoneme 401
2.3 Exception Dictionary 402
3. PROSODY 402
3.1 Parsing the Text 402
3.2 Intonation 403
3.3 Phoneme Duration 408
4. ACOUSTICS FOR FRENCH CONSONANTS AND VOWELS 415
4.1 Vowels 415
4.2 Fricatives (unvoiced: F, S, Ch; voiced: V, Z, J)
4.3 Plosives (unvoiced: P, T, K; voiced: B, D, G)
4.4 Nasals (M, N, Gn, Ng) 420
4.5 Liquids (L, R) 421
4.6 Semivowels (Y, W, Wu) 422
4.7 Phoneme Transitions (coarticulation effects) 422
4.8 Frame Generation 426
4.9 Conclusions for acoustics 426
5. FROM ACOUSTICS TO SPEECH SIGNAL 427
6. NEXT GENERATION FORMANT SYNTHESIS 429
7. SINGING 431
CONCLUSIONS 431
REFERENCES 432
Chapter 14 ACCESSIBILITY AND SPEECH TECHNOLOGY: ADVANCING TOWARD UNIVERSAL ACCESS 434
1. UNIVERSAL ACCESS VS. ASSISTIVE TECHNOLOGY 434
2. PREDICTED ENHANCEMENTS AND IMPROVEMENTS TO UNDERLYING TECHNOLOGY 436
2.1 Social Network Analysis, Blogs, Wikis, and Social Computing 437
2.2 Intelligent Agents 438
2.3 Learning Objects 439
2.4 Cognitive Aids 440
2.5 Interface Flexibility and Intelligence 440
3. CURRENT ASSISTIVE TECHNOLOGY APPLICATIONS EMPLOYING SPEECH TECHNOLOGY 440
3.1 Applications Employing Automatic Speech Recognition (ASR) 441
3.2 Applications of Synthetic Speech 445
4. HUMAN-COMPUTER INTERACTION: DESIGN AND EVALUATION 447
5. THE ROLE OF TECHNICAL STANDARDS IN ACCESSIBILITY 450
5.1 Standards Related to Software and Information Technology User Interfaces 451
5.2 Speech Application Accessibility Standards 451
5.3 Accessibility Data and Accessibility Guidance for General Products 454
CONCLUSIONS 456
REFERENCES 457
Chapter 15 SYNTHESIZED SPEECH USED FOR THE EVALUATION OF CHILDREN’S HEARING AND SPEECH PERCEPTION 460
1. INTRODUCTION 460
2. THE BACKGROUND THEORY 461
3. THE PRODUCTION OF THE SYNTHESIZED WORD MATERIAL 464
4. PRE-EXPERIMENTS FOR THE APPLICATION OF SYNTHESIZED WORDS FOR HEARING SCREENING 466
5. RESULTS 467
5.1 Clinical Tests 467
5.2 Screening Procedure 470
5.3 Evaluation of Acoustic-phonetic Perception 473
5.4 Children with Specific Needs 474
CONCLUSIONS 475
ACKNOWLEDGEMENTS 476
REFERENCES 476
INDEX 477
Publication date (per publisher) | 3 December 2007 |
---|---|
Series | Signals and Communication Technology |
Additional info | XXVI, 469 p. |
Place of publication | New York |
Language | English |
Subject area | Computer Science ► Software Development ► User Interfaces (HCI) |
Subject area | Engineering ► Electrical Engineering / Power Engineering |
Keywords | Gardner-Bonneau • Human • Human Factors • Interactive • interactive system • Systems • Usability • user interface • Voice • Voice Recognition |
ISBN-10 | 0-387-68439-5 / 0387684395 |
ISBN-13 | 978-0-387-68439-0 / 9780387684390 |
Size: 8.7 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.