Speech Dereverberation (eBook)

eBook Download: PDF
2010
XVIII, 388 pages
Springer London (publisher)
978-1-84996-056-4 (ISBN)

149,79 incl. VAT
  • Download available immediately

Speech Dereverberation gathers together an overview, a mathematical formulation of the problem and the state-of-the-art solutions for dereverberation.

Speech Dereverberation presents current approaches to the problem of reverberation. It provides a review of topics in room acoustics and also describes performance measures for dereverberation. The algorithms are then explained with mathematical analysis and examples that enable the reader to see the strengths and weaknesses of the various techniques, as well as giving an understanding of the questions still to be addressed. Techniques rooted in speech enhancement are included, in addition to a treatment of multichannel blind acoustic system identification and inversion. The TRINICON framework is shown in the context of dereverberation to be a generalization of the signal processing for a range of analysis and enhancement techniques.

Speech Dereverberation is suitable for students at master's and doctoral level, as well as established researchers.



Patrick A. Naylor has a PhD in Speech Signal Processing from Imperial College London, where he is currently Reader and Director of Postgraduate Studies for the Department of Electrical and Electronic Engineering. His research interests include speech and audio signal processing; adaptive signal processing; speech enhancement in telecommunications; hands-free functionality; blind SIMO/MIMO channel estimation and dereverberation; speaker identification and verification; and speech production modelling. He is on the IEEE Technical Committee on Audio and Electroacoustics and is Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing.

Nikolay D. Gaubitch has a PhD in Acoustic Signal Processing from Imperial College London, where he is now Research Associate. In 2001 and 2002 he was awarded the Drapers' Company Undergraduate Prize for outstanding academic achievement. His research interests span various topics in single and multichannel speech and audio processing including dereverberation, blind system identification, acoustic system equalization and speech enhancement. He is a member of the IEEE.



Preface 6
Contents 8
List of Contributors 15
1 Introduction 17
1.1 Background 17
1.2 Effects of Reverberation 18
1.3 Speech Acquisition 19
1.4 System Description 20
1.5 Acoustic Impulse Responses 22
1.6 Literature Overview 24
1.6.1 Beamforming Using Microphone Arrays 24
1.6.2 Speech Enhancement Approaches to Dereverberation 26
1.6.3 Blind System Identification and Inversion 27
1.6.3.1 Blind System Identification 28
1.6.3.2 Inverse Filtering 29
1.7 Outline of the Book 30
References 31
2 Models, Measurement and Evaluation 37
2.1 An Overview of Room Acoustics 37
2.1.1 The Wave Equation 38
2.1.2 Sound Field in a Reverberant Room 39
2.1.3 Reverberation Time 40
2.1.4 The Critical Distance 42
2.1.5 Analysis of Room Acoustics Dependent on Frequency Range 43
2.2 Models of Room Reverberation 45
2.2.1 Intuitive Model 46
2.2.2 Finite Element Models 46
2.2.3 Digital Waveguide Mesh 46
2.2.4 Ray-tracing 47
2.2.5 Source-image Model 47
2.2.6 Statistical Room Acoustics 49
2.3 Subjective Evaluation 51
2.4 Channel-based Objective Measures 52
2.4.1 Normalized Projection Misalignment 53
2.4.2 Direct-to-reverberant Ratio 54
2.4.3 Early-to-total Sound Energy Ratio 54
2.4.4 Early-to-late Reverberation Ratio 55
2.5 Signal-based Objective Measures 55
2.5.1 Log Spectral Distortion 56
2.5.2 Bark Spectral Distortion 56
2.5.3 Reverberation Decay Tail 57
2.5.4 Signal-to-reverberant Ratio 59
2.5.4.1 Relationship Between DRR and SRR 59
2.5.4.2 Level Normalization in SRR 60
2.5.4.3 SRR Computation Example 62
2.5.4.4 SRR Summary 63
2.5.5 Experimental Comparisons 63
2.6 Dereverberation Performance of the Delay-and-sum Beamformer 66
2.6.1 Simulation Results: DSB Performance 67
Experiment 1: Effect of Source-microphone Distance 68
Experiment 2: Effect of Number of Microphones 68
2.7 Summary and Discussion 68
References 70
3 Speech Dereverberation Using Statistical Reverberation Models 73
3.1 Introduction 74
3.2 Review of Dereverberation Methods 76
3.2.1 Reverberation Cancellation 76
3.2.2 Reverberation Suppression 77
3.3 Statistical Reverberation Models 78
3.3.1 Polack’s Statistical Model 78
3.3.2 Generalized Statistical Model 79
3.4 Single-microphone Spectral Enhancement 80
3.4.1 Problem Formulation 81
3.4.2 MMSE Log-spectral Amplitude Estimator 84
3.4.3 a priori SIR Estimator 86
3.5 Multi-microphone Spectral Enhancement 87
3.5.1 Problem Formulation 87
3.5.2 Two Multi-microphone Systems 88
3.5.2.1 MVDR Beamformer and Single-channel MMSE Estimator 88
3.5.2.2 Non-linear Spatial Processor 91
3.5.3 Speech Presence Probability Estimator 91
3.6 Late Reverberant Spectral Variance Estimator 93
3.7 Estimating Model Parameters 97
3.7.1 Reverberation Time 97
3.7.2 Direct-to-reverberant Ratio 98
3.8 Experimental Results 98
3.8.1 Using One Microphone 99
3.8.2 Using Multiple Microphones 102
3.9 Summary and Outlook 104
Acknowledgment 106
References 106
4 Dereverberation Using LPC-based Approaches 111
4.1 Introduction 111
4.2 Linear Predictive Coding of Speech 113
4.3 LPC on Reverberant Speech 115
4.3.1 Effects of Reverberation on the LPC Coefficients 116
4.3.1.1 Single Microphone 116
4.3.1.2 Joint Multichannel Optimization 118
4.3.1.3 LPC at the Output of a Delay-and-sum Beamformer 119
4.3.2 Effects of Reverberation on the Prediction Residual 120
4.3.3 Simulation Examples for LPC on Reverberant Speech 121
4.4 Dereverberation Employing LPC 128
4.4.1 Regional Weighting Function 129
4.4.2 Weighting Function Based on Hilbert Envelopes 129
4.4.3 Wavelet Extrema Clustering 129
4.4.4 Weight Function from Coarse Channel Estimates 129
4.4.5 Kurtosis Maximizing Adaptive Filter 130
4.5 Spatiotemporal Averaging Method for Enhancement of Reverberant Speech 131
4.5.1 Larynx Cycle Segmentation with Multichannel DYPSA 132
4.5.2 Time Delay of Arrival Estimation for Spatial Averaging 133
4.5.3 Voiced/Unvoiced/Silence Detection 134
4.5.4 Weighted Inter-cycle Averaging 135
4.5.5 Dereverberation Results 137
4.6 Summary 140
Appendix A 140
References 142
5 Multi-microphone Speech Dereverberation Using Eigen-decomposition 145
5.1 Introduction 145
5.2 Problem Formulation 149
5.3 Preliminaries 151
5.4 AIR Estimation – Algorithm Derivation 154
5.5 Extensions of the Basic Algorithm 156
5.5.1 Two-microphone Noisy Case 156
5.5.1.1 White Noise Case 157
5.5.1.2 Colored Noise Case 157
5.5.2 Multi-microphone Case (M > 2)
5.5.3 Partial Knowledge of the Null Subspace 158
5.6 AIR Estimation in Subbands 159
5.7 Signal Reconstruction 160
5.8 Experimental Study 162
5.8.1 Full-band Version – Results 163
5.8.2 Subband Version – Results 166
5.9 Limitations of the Proposed Algorithms and Possible Remedies 167
5.9.1 Noise Robustness 168
5.9.2 Computational Complexity and Memory Requirements 168
5.9.3 Common Zeros 168
5.9.4 The Demand for the Entire AIR Compensation 169
5.9.5 Filter-bank Design 169
5.9.6 Gain Ambiguity 169
5.10 Summary and Conclusions 170
References 170
6 Adaptive Blind Multichannel System Identification 173
6.1 Introduction 173
6.2 Problem Formulation 176
6.2.1 Channel Identifiability Conditions 177
6.3 Review of Adaptive Algorithms for Acoustic BSI Employing Cross-relations 178
6.3.1 The Multichannel Least Mean Squares Algorithm 178
6.3.2 The Normalized Multichannel Frequency Domain LMS Algorithm 179
6.3.3 The Improved Proportionate NMCFLMS Algorithm 181
6.4 Effect of Noise on the NMCFLMS Algorithm – The Misconvergence Problem 183
6.5 The Constraint Based ext-NMCFLMS Algorithm 185
6.5.1 Effect of Noise on the Cost Function 186
6.5.2 Penalty Term Using the Direct-path Constraint 188
6.5.3 Delay Estimation 190
6.5.4 Flattening Point Estimation 191
6.6 Simulation Results 194
6.6.1 Experimental Setup 195
6.6.2 Variation of Convergence Rate on β 195
6.6.3 Degradation Due to Direct-path Estimation 196
6.6.4 Comparison of Algorithm Performance Using a WGN Input Signal 198
6.6.5 Comparison of Algorithm Performance Using Speech Input Signals 199
6.7 Conclusions 200
References 201
7 Subband Inversion of Multichannel Acoustic Systems 205
7.1 Introduction 205
7.2 Multichannel Equalization 209
7.3 Equalization with Inexact Impulse Responses 210
7.3.1 Effects of System Mismatch 212
7.3.2 Effects of System Length 213
7.4 Subband Multichannel Equalization 214
7.4.1 Oversampled Filter-banks 215
7.4.2 Subband Decomposition 217
7.4.3 Subband Multichannel Equalization 219
7.5 Computational Complexity 220
7.6 Application to Speech Dereverberation 221
7.7 Simulations and Results 223
7.7.1 Experiment 1: Complex Subband Decomposition 223
7.7.2 Experiment 2: Random Channels 225
7.7.3 Experiment 3: Simulated Room Impulse Responses 227
7.7.4 Experiment 4: Speech Dereverberation 229
7.8 Summary 231
References 231
8 Bayesian Single Channel Blind Dereverberation of Speech from a Moving Talker 235
8.1 Introduction and Overview 235
8.1.1 Model-based Framework 236
8.1.1.1 Online vs. Offline Numerical Methods 237
8.1.1.2 Parametric Estimation and Optimal Filtering Methods 237
8.1.2 Practical Blind Dereverberation Scenarios 238
8.1.2.1 Single-sensor Applications 238
8.1.2.2 Time-varying Acoustic Channels 238
8.1.3 Chapter Organisation 239
8.2 Mathematical Problem Formulation 239
8.2.1 Bayesian Framework for Blind Dereverberation 241
8.2.2 Classification of Blind Dereverberation Formulations 243
8.2.3 Numerical Bayesian Methods 244
8.2.3.1 Markov Chain Monte Carlo 244
8.2.3.2 Sequential Monte Carlo 246
8.2.3.3 General Comments 246
8.2.4 Identifiability 247
8.3 Nature of Room Acoustics 249
8.3.1 Regions of the Audible Spectrum 250
8.3.2 The Room Transfer Function 251
8.3.3 Issues with Modelling Room Transfer Functions 252
Long and Non-minimum Phase AIRs 252
Robustness to Estimation Error and Variation of Inverse of the AIR 252
Subband and Frequency-zooming Solu 252
8.4 Parametric Channel Models 253
8.4.1 Pole-zero and All-zero Models 253
8.4.2 The Common-acoustical Pole and Zero Model 254
8.4.3 The All-pole Model 254
8.4.4 Subband All-pole Modelling 255
8.4.5 The Nature of Time-varying All-pole Models 258
8.4.6 Static Modelling of TVAP Parameters 260
8.4.7 Stochastic Modelling of Acoustic Channels 261
8.5 Noise and System Model 262
8.6 Source Model 264
8.6.1 Speech Production 264
8.6.2 Time-varying AR Modelling of Unvoiced Speech 265
8.6.2.1 Statistical Nature of Speech Parameter Variation 266
8.6.3 Static Block-based Modelling of TVAR Parameters 267
8.6.3.1 Basis Function Representation 268
8.6.3.2 Choice of Basis Functions 269
8.6.3.3 Block-based Time-varying Approach 269
8.6.4 Stochastic Modelling of TVAR Parameters 270
8.7 Bayesian Blind Dereverberation Algorithms 272
8.7.1 Offline Processing Using MCMC 272
8.7.1.1 Likelihood for Source Signal 272
8.7.1.2 Complete Likelihood for Observations 273
8.7.1.3 Prior Distributions of Source, Channel and Error Residual 273
8.7.1.4 Posterior Distribution of the Channel Parameters 274
8.7.1.5 Experimental Results 275
8.7.2 Online Processing Using Sequential Monte Carlo 277
8.7.2.1 Source and Channel Model 277
8.7.2.2 Conditionally Gaussian State Space 278
8.7.2.3 Methodology 279
8.7.2.4 Channel Estimation Using Bayesian Channel Updates 280
8.7.2.5 Experimental Results 281
8.7.3 Comparison of Offline and Online Approaches 283
8.8 Conclusions 284
References 284
9 Inverse Filtering for Speech Dereverberation Without the Use of Room Acoustics Information 287
9.1 Introduction 287
9.2 Inverse Filtering for Speech Dereverberation 288
9.2.1 Speech Capture Model with Multiple Microphones 289
9.2.2 Optimal Inverse Filtering 290
9.2.3 Unsupervised Algorithm to Approximate Optimal Processing 293
9.3 Approaches to Solving the Over-whitening of the Recovered Speech 296
9.3.1 Precise Compensation for Over-whitening of Target Speech 296
9.3.1.1 Principle 296
9.3.1.2 Close to Perfect Dereverberation 298
9.3.1.3 Dereverberation and Coherent Noise Reduction 299
9.3.1.4 Sensitivity to Incoherent N 303
9.3.2 Late Reflection Removal with Multichannel Multistep LP 304
9.3.2.1 Principle 305
9.3.2.2 Speech Dereverberation Performance in Terms of ASR Score 307
9.3.2.3 Speech Dereverberation in a Noisy Environment 309
9.3.2.4 Dereverberation of Multiple Sound Source Signals 311
9.3.3 Joint Estimation of Linear Predictors and Short-time Speech Characteristics 312
9.3.3.1 Background 312
9.3.3.2 Principle 313
9.3.3.3 Algorithms 316
9.3.4 Probabilistic Model Based Speech Dereverberation 318
9.3.4.1 Probabilistic Speech Model 319
9.3.4.2 Likelihood Function for Multichannel LP 320
9.3.4.3 Autocorrelation Codebook-based Speech Dereverberation 322
9.4 Concluding Remarks 324
Appendix A 324
References 325
10 TRINICON for Dereverberation of Speech and Audio Signals 327
10.1 Introduction 327
10.1.1 Generic Tasks for Blind Adaptive MIMO Filtering 328
10.1.2 A Compact Matrix Formulation for MIMO Filtering Problems 331
10.1.3 Overview of this Chapter 333
10.2 Ideal Inversion Solution and the Direct-inverse Approach to Blind Deconvolution 334
10.3 Ideal Solution of Direct Adaptive Filtering Problems and the Identification-and-inversion Approach to Blind Deconvolution 336
10.3.1 Ideal Separation Solution for Two Sources and Two Sensors 338
10.3.2 Relation to MIMO and SIMO System Identification 340
10.3.3 Ideal Separation Solution and Optimum Separation Filter Length for an Arbitrary Number of Sources and Sensors 341
10.3.4 General Scheme for Blind System Identification 343
10.3.5 Application of Blind System Identification to Blind Deconvolution 344
10.4 TRINICON – A General Framework for Adaptive MIMO Signal Processing and Application to Blind Adaptation Problems 346
10.4.1 Matrix Notation for Convolutive Mixtures 347
10.4.2 Optimization Criterion 348
10.4.3 Gradient-based Coefficient Update 350
10.4.3.1 Alternative Formulation of the Gradient-based Coefficient Update 353
10.4.4 Natural Gradient-based Coefficient Update 354
10.4.5 Incorporation of Stochastic Source Models 354
10.4.5.1 Spherically Invariant Random Processes as Signal Model 356
10.4.5.2 Multivariate Gaussians as Signal Model: Second-order Statistics 357
10.4.5.3 Nearly Gaussian Densities as Signal Model 357
10.5 Application of TRINICON to Blind System Identification and the Identification-and-inversion Approach to Blind Deconvolution 361
10.5.1 Generic Gradient-based Algorithm for Direct Adaptive Filtering Problems 361
10.5.1.1 Illustration for Second-order Statistics 362
10.5.2 Realizations for the SIMO Case 363
10.5.2.1 Coefficient Initialization 366
10.5.2.2 Efficient Implementation of the Sylvester Constraint for the Special Case of SIMO Models 367
10.5.3 Efficient Frequency-domain Realizations for the MIMO Case 369
10.6 Application of TRINICON to the Direct-inverse Approach to Blind Deconvolution 372
10.6.1 Multichannel Blind Deconvolution 373
10.6.2 Multichannel Blind Partial Deconvolution 375
10.6.3 Special Cases and Links to Known Algorithms 378
10.6.3.1 SIMO vs. MIMO Mixing Systems 379
10.6.3.2 Efficient Implementation Using the Correlation Method 379
10.6.3.3 Relations to Some Known HOS Approaches 380
10.6.3.4 Relations to Some Known SOS Approaches 381
10.7 Experiments 383
10.7.1 The SIMO Case 384
10.7.2 The MIMO Case 389
10.8 Conclusions 390
Appendix A: Compact Derivation of the Gradient-based Coefficient Update 390
Appendix B: Transformation of the Multivariate Output Signal PDF in (10.39) by Blockwise Sylvester Matrix 392
Appendix C: Polynomial Expansions for Nearly Gaussian Probability Densities 394
Appendix D: Expansion of the Sylvester Constraints in (10.83) 396
References 397
Index 403

Publication date (per publisher): 27.7.2010
Series: Signals and Communication Technology
Additional info: XVIII, 388 p. 126 illus.
Place of publication: London
Language: English
Subject areas: Computer Science, Theory / Studies, Algorithms
Computer Science, Theory / Studies, Artificial Intelligence / Robotics
Technology, Electrical Engineering / Energy Technology
Technology, Communications Engineering
Keywords: acoustics • algorithms • CP4748 • Dereverberation • Room Acoustics • Speech Enhancement • Speech processing • Speech Recognition
ISBN-10: 1-84996-056-9 / 1849960569
ISBN-13: 978-1-84996-056-4 / 9781849960564
PDF (watermark)
Size: 10.3 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read with (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.
