Computing Prosody

Computational Models for Processing Spontaneous Speech

Yoshinori Sagisaka, Nick Campbell, Norio Higuchi (Editors)

Book | Hardcover

401 pages

1996
Springer-Verlag New York Inc.
978-0-387-94804-1 (ISBN)

Unfortunately, this title is out of print;
no new edition is planned


Presents computational analysis and modeling of spontaneous speech. This book gives theoretical background and discusses how spontaneous speech differs from laboratory speech. It focuses on prosody and the structure of the spoken message, the generation and modeling of prosody for speech synthesis, and the use of prosodic information in automatic speech recognition.

This book presents a collection of papers from the Spring 1995 Workshop on Computational Approaches to Processing the Prosody of Spontaneous Speech, hosted by the ATR Interpreting Telecommunications Research Laboratories in Kyoto, Japan. The workshop brought together leading researchers in the fields of speech and signal processing, electrical engineering, psychology, and linguistics, to discuss aspects of spontaneous speech prosody and to suggest approaches to its computational analysis and modelling. The book is divided into four sections. Part I gives an overview and theoretical background to the nature of spontaneous speech, differentiating it from the lab-speech that has been the focus of so many earlier analyses. Part II focuses on the prosodic features of discourse and the structure of the spoken message, and Part III on the generation and modelling of prosody for computer speech synthesis. Part IV discusses how prosodic information can be used in the context of automatic speech recognition. Each section of the book starts with an invited overview paper to situate the chapters in the context of current research.
We feel that this collection of papers offers interesting insights into the scope and nature of the problems concerned with the computational analysis and modelling of real spontaneous speech, and expect that these works will not only form the basis of further developments in each field but also merge to form an integrated computational model of prosody for a better understanding of human processing of the complex interactions of the speech chain.

Preface.- Contributors.
I The Prosody of Spontaneous Speech.- 1 Introduction to Part I.- 1.1 Naturalness and Spontaneous Speech.- References.- 2 A Typology of Spontaneous Speech.- 2.1 Introduction.- 2.2 Some Prosodic Phenomena.- 2.3 Types of Spontaneous Speech Recordings.- References.- 3 Prosody, Models, and Spontaneous Speech.- 3.1 What is Prosody? Its Nature and Function.- 3.2 Prosody in the Production of Spontaneous Speech.- 3.3 Role of Generative Models.- 3.4 A Generative Model for the F0 Contour of an Utterance of Japanese.- 3.5 Units of Prosody of the Spoken Japanese.- 3.6 Prosody of Spontaneous Speech.- References.- 4 On the Analysis of Prosody in Interaction.- 4.1 Introduction.- 4.2 Background Work.- 4.3 Goal and Methodology.- 4.4 Prosody in Language Technology.- 4.5 Analysis of Discourse and Dialogue Structure.- 4.6 Prosodic Analysis.- 4.6.1 Auditory Analysis.- 4.6.2 The Intonation Model.- 4.6.3 Acoustic-phonetic Analysis.- 4.7 Speech Synthesis.- 4.7.1 Model-based Resynthesis.- 4.7.2 Text-to-speech.- 4.8 Tentative Findings.- 4.9 Final Remarks.- References.
II Prosody and the Structure of the Message.- 5 Introduction to Part II.- 5.1 Prosody and the Structure of the Message.- References.- 6 Integrating Prosodic and Discourse Modelling.- 6.1 Introduction.- 6.2 Modelling Attentional State.- 6.3 Accent and Attentional Modelling.- 6.3.1 Principles.- 6.3.2 Algorithms.- 6.4 Related Work.- References.- 7 Prosodic Features of Utterances in Task-Oriented Dialogues.- 7.1 Introduction.- 7.2 Speech Data Collection.- 7.3 Framework for Analysis.- 7.4 Topic Structure and Utterance Pattern.- 7.4.1 Topic Shifting and Utterance Relation.- 7.4.2 Dialogue Structure and Pitch Contour.- 7.4.3 Topic Shifting and Utterance Pattern.- 7.4.4 Topic Shifting and Utterance Duration.- 7.5 Summary and Application.- 7.5.1 Summary of Results.- 7.5.2 Prosodic Parameter Generation.- References.- 8 Variation of Accent Prominence within the Phrase: Models and Spontaneous Speech Data.- 8.1 Introduction.- 8.2 F0 and Variation of Accent Prominence.- 8.2.1 Intrinsic Prominence of Single Accents.- 8.2.2 Relative Prominence of Successive Accents.- 8.2.3 Discussion.- 8.3 Variation of Accent Prominence in Spontaneous Speech.- 8.3.1 Introduction.- 8.3.2 Method.- 8.3.3 Data Analysis.- 8.3.4 Results and Discussion.- 8.3.5 Limitations.- References.- 9 Predicting the Intonation of Discourse Segments from Examples in Dialogue Speech.- 9.1 Introduction.- 9.2 Modelling Discourse Intonation.- 9.3 Analysis with ToBI Labels.- 9.4 Analysis with Tilt Labels.- 9.5 Discussion.- 9.6 Summary.- References.- 10 Effects of Focus on Duration and Vowel Formant Frequency in Japanese.- 10.1 Introduction.- 10.1.1 The Aim of the Study.- 10.1.2 Accent and Focus in Japanese.- 10.2 Experimental Setting.- 10.3 Results of Acoustic Analysis.- 10.3.1 F0 Peaks.- 10.3.2 Utterance Duration.- 10.3.3 Formant Frequencies.- 10.3.4 Target Vowels.- 10.3.5 Context Vowels.- 10.4 Discussion.- 10.4.1 Duration.- 10.4.2 Target Vowels.- 10.4.3 Context Vowels.- References.
III Prosody in Speech Synthesis.- 11 Introduction to Part III.- 11.1 No Future for Comprehensive Models of Intonation?.- 11.2 Learning from Examples.- 11.2.1 The Reference Corpus.- 11.2.2 Labelling the Corpus.- 11.2.3 The Sub-Symbolic Paradigm: Training an Associator.- 11.2.4 The Morphological Paradigm.- References.- 12 Synthesizing Spontaneous Speech.- 12.1 Introduction.- 12.1.1 Synthesizing Speech.- 12.1.2 Natural Speech.- 12.2 Spontaneous Speech.- 12.2.1 Spectral Correlates of Prosodic Variation.- 12.3 Labelling Speech.- 12.3.1 Automated Segmental Labelling.- 12.3.2 Automating Prosodic Labelling.- 12.3.3 Labelling Interactive Speech.- 12.4 Synthesis in CHATR.- 12.5 Summary.- References.- 13 Modelling Prosody in Spontaneous Speech.- 13.1 Introduction.- 13.2 A Prosodic Phonology of German: The Kiel Intonation Model (KIM).- 13.2.1 The Categories of the Model and its General Structure.- 13.2.2 Lexical and Sentence Stress.- 13.2.3 Intonation.- 13.2.4 Prosodic Boundaries.- 13.2.5 Speech Rate.- 13.2.6 Register Change.- 13.2.7 Dysfluencies.- 13.3 A TTS Implementation of the Model as a Prosody Research Tool.- 13.4 The Analysis of Spontaneous Speech.- 13.4.1 PROLAB: A KIM-based Labelling System.- 13.4.2 Transcription Verification and Model Elaboration.- References.- 14 Comparison of F0 Control Rules Derived from Multiple Speech Databases.- 14.1 Introduction.- 14.2 Derivation of F0 Control Rules and Their Comparison.- 14.2.1 Overview of the Rule Derivation Procedure.- 14.2.2 F0 Contour Decomposition.- 14.2.3 Statistical Rule Derivation.- 14.3 Experiments of F0 Control Rule Derivation and Their Comparison.- 14.3.1 Speech Data and Conditions of Parameter Extraction.- 14.3.2 Linguistic Factors For the Control Rules.- 14.4 Results.- 14.4.1 The Accuracy of the F0 Control Rules.- 14.4.2 Comparison of F0 Control Rules Among Multi-Speakers.- 14.4.3 Differences of F0 Control Rules Between Different Speech Rates.- 14.5 Summary.- References.- 15 Segmental Duration and Speech Timing.- 15.1 Introduction.- 15.1.1 Modelling of Speech Timing.- 15.1.2 Goals of this Chapter.- 15.2 Template Based Timing: Path Equivalence.- 15.3 Measuring Subsegmental Effects.- 15.3.1 Trajectories, Time Warps, and Expansion Profiles.- 15.3.2 Preliminary Results.- 15.3.3 Modelling Time Warp Functions.- 15.4 Syllabic Timing vs Segmental Timing.- 15.4.1 The Concept of Syllabic Timing.- 15.4.2 Testing Segmental Independence.- 15.4.3 Testing Syllabic Mediation.- 15.4.4 Syllabic Timing: Conclusions.- 15.5 Timing of Pitch Contours.- 15.5.1 Modelling Segmental Effects on Pitch Contours: Initial Approach.- 15.5.2 Alignment Parameters and Time Warps.- 15.5.3 Modelling Segmental Effects on Pitch Contours: A Complete Model.- 15.5.4 Summary.- References.- 16 Measuring temporal compensation effect in speech perception.- 16.1 Introduction.- 16.1.1 Processing Range in Time Perception of Speech.- 16.1.2 Contextual Effect on Perceptual Salience of Temporal Markers.- 16.2 Experiment 1-Acceptability Rating.- 16.2.1 Method.- 16.2.2 Results and Discussion.- 16.3 Experiment 2-Detection Test.- 16.3.1 Method.- 16.3.2 Results and Discussion.- References.- 17 Prediction of Major Phrase Boundary Location and Pause Insertion Using a Stochastic Context-free Grammar.- 17.1 Introduction.- 17.2 Models for the Prediction of Major Phrase Boundary Locations and Pause Locations.- 17.2.1 Speech Data.- 17.2.2 Learning Major Phrase Boundary Locations and Pause Locations Using a SCFG.- 17.2.3 Computation of Parameters for the Prediction Using a SCFG.- 17.2.4 Prediction Model Using a Neural Network.- 17.3 Experiments.- 17.3.1 Learning the SCFG.- 17.3.2 Accuracy of the Prediction.- References.
IV Prosody in Speech Recognition.- 18 Introduction to Part IV.- 18.1 The Beginnings of Understanding.- 19 A Multi-level Model for Recognition of Intonation Labels.- 19.1 Introduction.- 19.2 Tone Label Model.- 19.2.1 Multi-level Model.- 19.2.2 Acoustic Models.- 19.2.3 Phonotactic Models.- 19.3 Recognition Search.- 19.4 Experiments.- 19.5 Discussion.- References.- 20 Training Prosody-Syntax Recognition Models without Prosodic Labels.- 20.1 Introduction.- 20.2 Speech Data and Analysis.- 20.2.1 Speech Data.- 20.2.2 Acoustic Feature Set.- 20.2.3 Syntactic Feature Set.- 20.3 Prosody-Syntax Models.- 20.3.1 Background.- 20.3.2 Break Index Linear Regression Model.- 20.3.3 CCA Model.- 20.3.4 LDA Model.- 20.4 Results and Analysis.- 20.4.1 Criterion 1: Resolving Syntactic Ambiguities.- 20.4.2 Criterion 2: Correlation of Acoustic and Syntactic Domains.- 20.4.3 Criterion 3: Internal Model Characteristics.- 20.5 Discussion.- References.- 21 Disambiguating Recognition Results by Prosodic Features.- 21.1 Introduction.- 21.2 Outline of the Method.- 21.2.1 Model for the F0 Contour Generation.- 21.2.2 Partial Analysis-by-synthesis.- 21.3 Experiments on the Detection of Recognition Errors.- 21.4 Performance in the Detection of Phrase Boundaries.- References.- 22 Accent Phrase Segmentation by F0 Clustering Using Superpositional Modelling.- 22.1 Introduction.- 22.2 Outline of Prosodic Segmentation System.- 22.3 Training of F0 Templates.- 22.3.1 Modelling of Minor Phrase Patterns.- 22.3.2 Clustering of Minor Phrase Patterns.- 22.4 Prosodic Phrase Segmentation.- 22.4.1 One-Stage DP Matching under a Constraint of the F0 Generation Model.- 22.4.2 N-best Search.- 22.5 Evaluation of Segmentation System.- 22.5.1 Experimental Condition.- 22.5.2 Results.- References.- 23 Prosodic Modules for Speech Recognition and Understanding in VERBMOBIL.- 23.1 What Can Prosody Do for Automatic Speech Recognition and Understanding?.- 23.2 A Few Words About VERBMOBIL.- 23.3 Prosody Module for the VERBMOBIL Research Prototype.- 23.3.1 Work on Read Speech.- 23.3.2 Work on Spontaneous Speech.- 23.4 Interactive Incremental Module.- 23.4.1 F0 Interpolation and Decomposition.- 23.4.2 Detecting Accents and Phrase Boundaries, and Determining Sentence Mode.- 23.4.3 Strategies for Focal Accent Detection.- References.
Author Index.- Citation Index.

Additional info	38 black & white tables, biography
Place of publication	New York, NY
Language	English
Weight	790 g
Binding	hardcover
Subject area	Humanities ► Language and Literary Studies ► Linguistics
	Mathematics / Computer Science ► Computer Science
	Natural Sciences ► Physics / Astronomy
ISBN-10	0-387-94804-X / 038794804X
ISBN-13	978-0-387-94804-1 / 9780387948041
Condition	New