Emotion Recognition using Speech Features - K. Sreenivasa Rao, Shashidhar G. Koolagudi

Emotion Recognition using Speech Features (eBook)

eBook Download: PDF
2012 | 2013
XII, 124 pages
Springer New York (publisher)
978-1-4614-5143-3 (ISBN)
System requirements
53.49 incl. VAT
  • Available for immediate download
'Emotion Recognition Using Speech Features' provides coverage of emotion-specific features present in speech. The authors also discuss suitable models for capturing emotion-specific information and for distinguishing between emotions. The content of this book is important for designing and developing natural and sophisticated speech systems. In this Brief, Drs. Rao and Koolagudi discuss how emotion-specific information is embedded in speech and how emotion-specific knowledge can be acquired using appropriate statistical models. Additionally, the authors describe how to exploit multiple evidences derived from various features and models. The acquired emotion-specific knowledge is useful for synthesizing emotions. Features include discussion of:
  • Global and local prosodic features at syllable, word and phrase levels, helpful for capturing emotion-discriminative information;
  • Exploiting complementary evidences obtained from excitation source, vocal tract system and prosodic features in order to enhance emotion recognition performance;
  • Proposed multi-stage and hybrid models for improving emotion recognition performance.
This Brief is intended for researchers working on speech-based products, such as those at mobile phone manufacturing, automobile and entertainment companies, as well as researchers involved in basic and applied speech processing research.
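For readers who want a concrete starting point: Chapter 4 and Appendices B and C of the book cover MFCC features and GMM classifiers, a combination that can be prototyped in a few lines. The sketch below is a minimal illustration of that generic pipeline, not the book's own implementation; it assumes the librosa and scikit-learn packages are installed, and files_by_emotion is a hypothetical mapping from emotion labels to lists of training WAV paths.

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, sr=16000, n_mfcc=13):
    # Load one utterance and extract frame-level MFCCs (vocal tract features).
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

def train_models(files_by_emotion, n_components=16):
    # Train one GMM per emotion on the pooled frames of that emotion's utterances.
    models = {}
    for emotion, files in files_by_emotion.items():
        frames = np.vstack([mfcc_features(f) for f in files])
        models[emotion] = GaussianMixture(n_components=n_components,
                                          covariance_type="diag",
                                          random_state=0).fit(frames)
    return models

def classify(path, models):
    # Label a test utterance with the emotion whose GMM yields the highest
    # average per-frame log-likelihood.
    frames = mfcc_features(path)
    return max(models, key=lambda emotion: models[emotion].score(frames))

A multi-stage or hybrid system of the kind proposed in the book would combine scores from several such model banks (e.g. excitation source, vocal tract and prosodic streams) rather than relying on a single feature set.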

K. Sreenivasa Rao is at the Indian Institute of Technology, Kharagpur, India.
Shashidhar G. Koolagudi is at the Graphic Era University, Dehradun, India.
"e;Emotion Recognition Using Speech Features"e; provides coverage of emotion-specific features present in speech. The author also discusses suitable models for capturing emotion-specific information for distinguishing different emotions. The content of this book is important for designing and developing natural and sophisticated speech systems. In this Brief, Drs. Rao and Koolagudi lead a discussion of how emotion-specific information is embedded in speech and how to acquire emotion-specific knowledge using appropriate statistical models. Additionally, the authors provide information about exploiting multiple evidences derived from various features and models. The acquired emotion-specific knowledge is useful for synthesizing emotions. Features includes discussion of: Global and local prosodic features at syllable, word and phrase levels, helpful for capturing emotion-discriminative information; Exploiting complementary evidences obtained from excitation sources, vocal tract systems and prosodic features in order to enhance the emotion recognition performance; Proposed multi-stage and hybrid models for improving the emotion recognition performance. This brief is for researchers working in areas related to speech-based products such as mobile phone manufacturing companies, automobile companies, and entertainment products as well as researchers involved in basic and applied speech processing research.

K. Sreenivasa Rao is at the Indian Institute of Technology, Kharagpur, India.Shashidhar G, Koolagudi is at the Graphic Era University, Dehradun, India.

Contents

1 Introduction
  1.1 Emotion: Psychological perspective
  1.2 Emotion: Speech signal perspective
    1.2.1 Speech production mechanism
    1.2.2 Source features
    1.2.3 System features
    1.2.4 Prosodic features
  1.3 Emotional speech databases
  1.4 Applications of speech emotion recognition
  1.5 Issues in speech emotion recognition
  1.6 Objectives and scope of the work
  1.7 Main highlights of research investigations
  1.8 Brief overview of contributions to this book
    1.8.1 Emotion recognition using excitation source information
    1.8.2 Emotion recognition using vocal tract information
    1.8.3 Emotion recognition using prosodic information
  1.9 Organization of the book

2 Speech Emotion Recognition: A Review
  2.1 Introduction
  2.2 Emotional speech corpora: A review
  2.3 Excitation source features: A review
  2.4 Vocal tract system features: A review
  2.5 Prosodic features: A review
  2.6 Classification models
  2.7 Motivation for the present work
  2.8 Summary of the literature and scope for the present work

3 Emotion Recognition using Excitation Source Information
  3.1 Introduction
  3.2 Motivation
  3.3 Emotional speech corpora
    3.3.1 Indian Institute of Technology Kharagpur Simulated Emotional Speech Corpus: IITKGP-SESC
    3.3.2 Berlin Emotional Speech Database: Emo-DB
  3.4 Excitation source features for emotion recognition
    3.4.1 Higher-order relations among LP residual samples
    3.4.2 Phase of LP residual signal
    3.4.3 Parameters of the instants of glottal closure (epoch parameters)
    3.4.4 Dynamics of epoch parameters at syllable level
    3.4.5 Dynamics of epoch parameters at utterance level
    3.4.6 Glottal pulse parameters
  3.5 Classification models
    3.5.1 Auto-associative neural networks
    3.5.2 Support vector machines
  3.6 Results and discussion
  3.7 Summary

4 Emotion Recognition using Vocal Tract Information
  4.1 Introduction
  4.2 Feature extraction
    4.2.1 Linear prediction cepstral coefficients (LPCCs)
    4.2.2 Mel frequency cepstral coefficients (MFCCs)
    4.2.3 Formant features
  4.3 Classifiers
    4.3.1 Gaussian mixture models (GMM)
  4.4 Results and discussion
  4.5 Summary

5 Emotion Recognition using Prosodic Information
  5.1 Introduction
  5.2 Prosodic features: importance in emotion recognition
  5.3 Motivation
  5.4 Extraction of global and local prosodic features
  5.5 Results and discussion
  5.6 Summary

6 Summary and Conclusions
  6.1 Summary of the present work
  6.2 Contributions of the present work
  6.3 Conclusions from the present work
  6.4 Scope for future work

A Linear Prediction Analysis of Speech
  A.1 The Prediction Error Signal
  A.2 Estimation of Linear Prediction Coefficients

B MFCC Features

C Gaussian Mixture Model (GMM)
  C.1 Training the GMMs
    C.1.1 Expectation Maximization (EM) Algorithm
    C.1.2 Maximum a posteriori (MAP) Adaptation
  C.2 Testing

References
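As a pointer for readers skimming the appendices: the prediction error signal of Appendix A refers to the standard linear prediction model of speech, in which each sample is approximated as a weighted combination of past samples. In common textbook notation (a sketch of the standard formulation, not reproduced from the book itself):

\[
\hat{s}(n) = \sum_{k=1}^{p} a_k \, s(n-k), \qquad e(n) = s(n) - \hat{s}(n)
\]

Here s(n) is the speech signal, a_k are the linear prediction coefficients, p is the prediction order, and the residual e(n) carries the excitation source information exploited in Chapter 3.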

Publication date (per publisher): 7 Nov 2012
Series: SpringerBriefs in Speech Technology
Additional information: XII, 124 p., 30 illus., 6 illus. in color
Place of publication: New York
Language: English
Subject areas: Humanities > Linguistics / Literary Studies > Linguistics; Computer Science > Software Development > User Interfaces (HCI); Computer Science > Theory / Studies > Artificial Intelligence / Robotics; Engineering > Electrical Engineering / Energy Technology
Keywords: Emotion-discriminative Information • emotion recognition • Emotion-specific Information • Excitation Source Features • Excitation Source Information • Global/Local Prosodic Features • Speech Processing Emotions • Vocal Tract Information
ISBN-10 1-4614-5143-4 / 1461451434
ISBN-13 978-1-4614-5143-3 / 9781461451433
Format: PDF (watermarked)
Size: 2.0 MB

DRM: digital watermark
This eBook contains a digital watermark and is therefore personalized to you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to specialist books with columns, tables and figures. A PDF can be displayed on almost all devices, but is only of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read with (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: online reading
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax reasons we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.
