Multimodal Signal Processing (eBook)
352 pages
Elsevier Science (publisher)
978-0-08-088869-9 (ISBN)
- Presents state-of-the-art methods for multimodal signal processing, analysis, and modelling
- Contains numerous examples of systems with different modalities combined
- Describes advanced applications in multimodal Human-Computer Interaction (HCI) as well as in computer-based analysis and modelling of multimodal human-human communication scenes.
Multimodal signal processing is an important research and development field that processes signals and combines information from a variety of modalities (speech, vision, language, text). Combining these modalities significantly enhances the understanding, modelling, and performance of human-computer interaction devices and of systems that support human-human communication. The overarching theme of this book is the application of signal processing and statistical machine learning techniques to problems arising in this multidisciplinary field. It describes the capabilities and limitations of current technologies and discusses the technical challenges that must be overcome to develop efficient and user-friendly multimodal interactive systems.
With contributions from leading experts in the field, this book is intended as a reference on multimodal signal processing for signal processing researchers, graduate students, R&D engineers, and computer engineers who are interested in this emerging field.
Preface 14
Chapter 1. Introduction 16
Part I: Signal Processing, Modelling and Related Mathematical Tools 20
Chapter 2. Statistical Machine Learning for HCI 22
2.1 Introduction 22
2.2 Introduction to Statistical Learning 23
2.2.1 Types of Problem 23
2.2.2 Function Space 24
2.2.3 Loss Functions 25
2.2.4 Expected Risk and Empirical Risk 25
2.2.5 Statistical Learning Theory 26
2.3 Support Vector Machines for Binary Classification 28
2.4 Hidden Markov Models for Speech Recognition 31
2.4.1 Speech Recognition 32
2.4.2 Markovian Processes 32
2.4.3 Hidden Markov Models 33
2.4.4 Inference and Learning with HMMs 35
2.4.5 HMMs for Speech Recognition 37
2.5 Conclusion 37
References 38
Chapter 3. Speech Processing 40
3.1 Introduction 41
3.2 Speech Recognition 43
3.2.1 Feature Extraction 43
3.2.2 Acoustic Modelling 45
3.2.3 Language Modelling 48
3.2.4 Decoding 49
3.2.5 Multiple Sensors 50
3.2.6 Confidence Measures 52
3.2.7 Robustness 53
3.3 Speaker Recognition 55
3.3.1 Overview 55
3.3.2 Robustness 58
3.4 Text-to-Speech Synthesis 59
3.4.1 Natural Language Processing for Speech Synthesis 59
3.4.2 Concatenative Synthesis with a Fixed Inventory 61
3.4.3 Unit Selection-Based Synthesis 65
3.4.4 Statistical Parametric Synthesis 68
3.5 Conclusions 71
References 72
Chapter 4. Natural Language and Dialogue Processing 78
4.1 Introduction 78
4.2 Natural Language Understanding 79
4.2.1 Syntactic Parsing 79
4.2.2 Semantic Parsing 83
4.2.3 Contextual Interpretation 85
4.3 Natural Language Generation 86
4.3.1 Document Planning 87
4.3.2 Microplanning 88
4.3.3 Surface Realisation 88
4.4 Dialogue Processing 89
4.4.1 Discourse Modelling 89
4.4.2 Dialogue Management 92
4.4.3 Degrees of Initiative 95
4.4.4 Evaluation 96
4.5 Conclusion 100
References 100
Chapter 5. Image and Video Processing Tools for HCI 108
5.1 Introduction 108
5.2 Face Analysis 109
5.2.1 Face Detection 110
5.2.2 Face Tracking 111
5.2.3 Facial Feature Detection and Tracking 113
5.2.4 Gaze Analysis 115
5.2.5 Face Recognition 116
5.2.6 Facial Expression Recognition 118
5.3 Hand-Gesture Analysis 119
5.4 Head Orientation Analysis and FoA Estimation 121
5.4.1 Head Orientation Analysis 121
5.4.2 Focus of Attention Estimation 122
5.5 Body Gesture Analysis 124
5.6 Conclusions 127
References 127
Chapter 6. Processing of Handwriting and Sketching Dynamics 134
6.1 Introduction 134
6.2 History of Handwriting Modality and the Acquisition of Online Handwriting Signals 136
6.3 Basics in Acquisition, Examples for Sensors 138
6.4 Analysis of Online Handwriting and Sketching Signals 139
6.5 Overview of Recognition Goals in HCI 140
6.6 Sketch Recognition for User Interface Design 143
6.7 Similarity Search in Digital Ink 148
6.8 Summary and Perspectives for Handwriting and Sketching in HCI 153
References 154
Part II: Multimodal Signal Processing and Modelling 158
Chapter 7. Basic Concepts of Multimodal Analysis 160
7.1 Defining Multimodality 160
7.2 Advantages of Multimodal Analysis 163
7.3 Conclusion 166
References 167
Chapter 8. Multimodal Information Fusion 168
8.1 Introduction 168
8.2 Levels of Fusion 171
8.3 Adaptive versus Non-Adaptive Fusion 173
8.4 Other Design Issues 177
8.5 Conclusions 180
References 180
Chapter 9. Modality Integration Methods 186
9.1 Introduction 186
9.2 Multimodal Fusion for AVSR 187
9.2.1 Types of Fusion 187
9.2.2 Multistream HMMs 189
9.2.3 Stream Reliability Estimates 189
9.3 Multimodal Speaker Localisation 193
9.4 Conclusion 196
References 196
Chapter 10. A Multimodal Recognition Framework for Joint Modality Compensation and Fusion 200
10.1 Introduction 201
10.2 Joint Modality Recognition and Applications 203
10.3 A New Joint Modality Recognition Scheme 206
10.3.1 Concept 206
10.3.2 Theoretical Background 206
10.4 Joint Modality Audio-Visual Speech Recognition 209
10.4.1 Signature Extraction Stage 211
10.4.2 Recognition Stage 212
10.5 Joint Modality Recognition in Biometrics 213
10.5.1 Overview 213
10.5.2 Results 214
10.6 Conclusions 218
References 219
Chapter 11. Managing Multimodal Data, Metadata and Annotations: Challenges and Solutions 222
11.1 Introduction 223
11.2 Setting the Stage: Concepts and Projects 223
11.2.1 Metadata versus Annotations 224
11.2.2 Examples of Large Multimodal Collections 225
11.3 Capturing and Recording Multimodal Data 226
11.3.1 Capture Devices 226
11.3.2 Synchronisation 227
11.3.3 Activity Types in Multimodal Corpora 228
11.3.4 Examples of Set-ups and Raw Data 228
11.4 Reference Metadata and Annotations 229
11.4.1 Gathering Metadata: Methods 230
11.4.2 Metadata for the AMI Corpus 231
11.4.3 Reference Annotations: Procedure and Tools 232
11.5 Data Storage and Access 234
11.5.1 Exchange Formats for Metadata and Annotations 234
11.5.2 Data Servers 236
11.5.3 Accessing Annotated Multimodal Data 237
11.6 Conclusions and Perspectives 238
References 239
Part III: Multimodal Human-Computer and Human-to-Human Interaction 244
Chapter 12. Multimodal Input 246
12.1 Introduction 246
12.2 Advantages of Multimodal Input Interfaces 247
12.2.1 State-of-the-Art Multimodal Input Systems 249
12.3 Multimodality, Cognition and Performance 252
12.3.1 Multimodal Perception and Cognition 252
12.3.2 Cognitive Load and Performance 253
12.4 Understanding Multimodal Input Behaviour 254
12.4.1 Theoretical Frameworks 255
12.4.2 Interpretation of Multimodal Input Patterns 258
12.5 Adaptive Multimodal Interfaces 260
12.5.1 Designing Multimodal Interfaces that Manage Users’ Cognitive Load 261
12.5.2 Designing Low-Load Multimodal Interfaces for Education 263
12.6 Conclusions and Future Directions 265
References 266
Chapter 13. Multimodal HCI Output: Facial Motion, Gestures and Synthesised Speech Synchronisation 272
13.1 Introduction 272
13.2 Basic AV Speech Synthesis 273
13.3 The Animation System 275
13.4 Coarticulation 278
13.5 Extended AV Speech Synthesis 279
13.5.1 Data-Driven Approaches 282
13.5.2 Rule-Based Approaches 284
13.6 Embodied Conversational Agents 285
13.7 TTS Timing Issues 287
13.7.1 On-the-Fly Synchronisation 287
13.7.2 A Priori Synchronisation 288
13.8 Conclusion 289
References 289
Chapter 14. Interactive Representations of Multimodal Databases 294
14.1 Introduction 294
14.2 Multimodal Data Representation 295
14.3 Multimodal Data Access 298
14.3.1 Browsing as Extension of the Query Formulation Mechanism 298
14.3.2 Browsing for the Exploration of the Content Space 302
14.3.3 Alternative Representations 307
14.3.4 Evaluation 307
14.3.5 Commercial Impact 308
14.4 Gaining Semantic from User Interaction 309
14.4.1 Multimodal Interactive Retrieval 309
14.4.2 Crowdsourcing 310
14.5 Conclusion and Discussion 313
References 314
Chapter 15. Modelling Interest in Face-to-Face Conversations from Multimodal Nonverbal Behaviour 324
15.1 Introduction 324
15.2 Perspectives on Interest Modelling 326
15.3 Computing Interest from Audio Cues 330
15.4 Computing Interest from Multimodal Cues 333
15.5 Other Concepts Related to Interest 335
15.6 Concluding Remarks 337
References 338
Index 342
Publication date (per publisher) | 11.11.2009 |
---|---|
Language | English |
Subject area | Non-fiction / Guides |
Computer Science ► Software Development ► User Interfaces (HCI) | |
Computer Science ► Theory / Study ► Artificial Intelligence / Robotics | |
Natural Sciences ► Physics / Astronomy ► Electrodynamics | |
Engineering ► Electrical Engineering / Power Engineering | |
Engineering ► Communications Engineering | |
ISBN-10 | 0-08-088869-0 / 0080888690 |
ISBN-13 | 978-0-08-088869-9 / 9780080888699 |
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. When you download the eBook, it is authorised to your personal Adobe ID. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: PDF (Portable Document Format)
Because of its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is only of limited use on small displays (smartphones, eReaders).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfil eBook orders from other countries.