Digital Speech Transmission
John Wiley & Sons Inc (Publisher)
978-0-471-56018-0 (ISBN)
- A new edition of this title is available
The enormous advances in digital signal processing (DSP) technology have contributed to the wide dissemination and success of speech communication devices, be it GSM and UMTS mobile telephones, digital hearing aids, or human-machine interfaces. Digital speech transmission techniques play an important role in these applications, all the more because high-quality speech transmission remains essential in all current and next-generation communication networks.
Enhancement, coding and error concealment techniques improve the transmitted speech signal at all stages of the transmission chain, from the acoustic front-end to the sound reproduction at the receiver. Advanced speech processing algorithms help to mitigate a number of physical and technological limitations such as background noise, bandwidth restrictions, shortage of radio frequencies, and transmission errors.
Digital Speech Transmission provides a single-source, comprehensive guide to the fundamental issues, algorithms, standards, and trends in speech signal processing and speech communication technology. The authors give a solid, accessible overview of
* fundamentals of speech signal processing
* speech coding, including new speech coders for GSM and UMTS
* error concealment by soft decoding
* artificial bandwidth extension of speech signals
* single and multi-channel noise reduction
* acoustic echo cancellation
This text is an invaluable resource for engineers, researchers, academics, and graduate students in the areas of communications, electrical engineering, and information technology.
Peter Vary and Rainer Martin are the authors of Digital Speech Transmission: Enhancement, Coding and Error Concealment, published by Wiley.
Preface xv
1 Introduction 1
2 Models of Speech Production and Hearing 5
2.1 Organs of Speech Production 6
2.2 Characteristics of Speech Signals 8
2.3 Model of Speech Production 10
2.3.1 Acoustic Tube Model of the Vocal Tract 11
2.3.2 Digital All-Pole Model of the Vocal Tract 19
2.4 Anatomy of Hearing 25
2.5 Psychoacoustic Properties of the Auditory Organ 28
2.5.1 Hearing and Loudness 28
2.5.2 Spectral Resolution 30
2.5.3 Masking 32
Bibliography 33
3 Spectral Transformations 35
3.1 Fourier Transform of Continuous Signals 35
3.2 Fourier Transform of Discrete Signals 37
3.3 Linear Shift Invariant Systems 39
3.3.1 Frequency Response of LSI Systems 41
3.4 The z-transform 41
3.4.1 Relation to FT 43
3.4.2 Properties of the ROC 44
3.4.3 Inverse z-transform 44
3.4.4 z-transform Analysis of LSI Systems 46
3.5 The Discrete Fourier Transform 47
3.5.1 Linear and Cyclic Convolution 50
3.5.2 The DFT of Windowed Sequences 52
3.5.3 Spectral Resolution and Zero Padding 55
3.5.4 Fast Computation of the DFT: The FFT 56
3.5.5 Radix-2 Decimation-in-Time FFT 57
3.6 Fast Convolution 61
3.6.1 Fast Convolution of Long Sequences 61
3.6.2 Fast Convolution by Overlap-Add 61
3.6.3 Fast Convolution by Overlap-Save 62
3.7 Cepstral Analysis 65
3.7.1 Complex Cepstrum 65
3.7.2 Real Cepstrum 66
3.7.3 Applications of the Cepstrum 67
Bibliography 70
4 Filter Banks for Spectral Analysis and Synthesis 73
4.1 Spectral Analysis Using Narrowband Filters 73
4.1.1 Short-Term Spectral Analyzer 78
4.1.2 Prototype Filter Design for the Analysis Filter Bank 82
4.1.3 Short-Term Spectral Synthesizer 84
4.1.4 Short-Term Spectral Analysis and Synthesis 86
4.1.5 Prototype Filter Design for the Analysis–Synthesis Filter Bank 88
4.1.6 Filter Bank Interpretation of the DFT 90
4.2 Polyphase Network Filter Banks 93
4.2.1 PPN Analysis Filter Bank 93
4.2.2 PPN Synthesis Filter Bank 101
4.3 Quadrature Mirror Filter Banks 105
4.3.1 Analysis–Synthesis Filter Bank 105
4.3.2 Compensation of Aliasing and Signal Reconstruction 107
4.3.3 Efficient Implementation 111
Bibliography 115
5 Stochastic Signals and Estimation 119
5.1 Basic Concepts 119
5.1.1 Random Events and Probability 119
5.1.2 Conditional Probabilities 121
5.1.3 Random Variables 121
5.1.4 Probability Distributions and Probability Density Functions 122
5.1.5 Conditional PDFs 123
5.2 Expectations and Moments 124
5.2.1 Conditional Expectations and Moments 125
5.2.2 Examples 125
5.2.3 Transformation of a Random Variable 128
5.2.4 Relative Frequencies and Histograms 129
5.3 Bivariate Statistics 130
5.3.1 Marginal Densities 130
5.3.2 Expectations and Moments 130
5.3.3 Uncorrelatedness and Statistical Independence 131
5.3.4 Examples of Bivariate PDFs 132
5.3.5 Functions of Two Random Variables 133
5.4 Probability and Information 135
5.4.1 Entropy 135
5.4.2 Kullback–Leibler Divergence 135
5.4.3 Mutual Information 136
5.5 Multivariate Statistics 136
5.5.1 Multivariate Gaussian Distribution 137
5.5.2 χ²-distribution 137
5.6 Stochastic Processes 138
5.6.1 Stationary Processes 138
5.6.2 Auto-correlation and Auto-covariance Functions 139
5.6.3 Cross-correlation and Cross-covariance Functions 140
5.6.4 Multivariate Stochastic Processes 140
5.7 Estimation of Statistical Quantities by Time Averages 142
5.7.1 Ergodic Processes 142
5.7.2 Short-Time Stationary Processes 143
5.8 Power Spectral Densities 144
5.8.1 White Noise 145
5.9 Estimation of the Power Spectral Density 145
5.9.1 The Periodogram 145
5.9.2 Smoothed Periodograms 147
5.10 Statistical Properties of Speech Signals 147
5.11 Statistical Properties of DFT Coefficients 148
5.11.1 Asymptotic Statistical Properties 149
5.11.2 Signal-plus-Noise Model 150
5.11.3 Statistical Properties of DFT Coefficients for Finite Frame Lengths 152
5.12 Optimal Estimation 154
5.12.1 MMSE Estimation 155
5.12.2 Optimal Linear Estimator 156
5.12.3 The Gaussian Case 157
5.12.4 Joint Detection and Estimation 158
Bibliography 160
6 Linear Prediction 163
6.1 Vocal Tract Models and Short-Term Prediction 164
6.2 Optimal Prediction Coefficients for Stationary Signals 171
6.2.1 Optimum Prediction 171
6.2.2 Spectral Flatness Measure 174
6.3 Predictor Adaptation 177
6.3.1 Block-Oriented Adaptation 177
6.3.2 Sequential Adaptation 188
6.4 Long-Term Prediction 192
Bibliography 198
7 Quantization 201
7.1 Analog Samples and Digital Representation 201
7.2 Uniform Quantization 203
7.3 Non-uniform Quantization 211
7.4 Optimal Quantization 221
7.5 Adaptive Quantization 222
7.6 Vector Quantization 228
7.6.1 Principle 228
7.6.2 The Complexity Problem 230
7.6.3 Lattice Quantization 231
7.6.4 Design of Optimal Vector Code Books 232
7.6.5 Gain–Shape Vector Quantization 236
Bibliography 237
8 Speech Coding 239
8.1 Classification of Speech Coding Algorithms 240
8.2 Model-Based Predictive Coding 243
8.3 Differential Waveform Coding 245
8.3.1 First-Order DPCM 245
8.3.2 Open-Loop and Closed-Loop Prediction 249
8.3.3 Quantization of the Residual Signal 250
8.3.4 Adaptive Differential Pulse Code Modulation 260
8.4 Parametric Coding 262
8.4.1 Vocoder Structures 262
8.4.2 LPC Vocoder 265
8.4.3 Quantization of the Predictor Coefficients 266
8.5 Hybrid Coding 273
8.5.1 Basic Codec Concepts 273
8.5.2 Residual Signal Coding: RELP 282
8.5.3 Analysis by Synthesis: CELP 290
8.5.4 Analysis by Synthesis: MPE, RPE 301
8.6 Adaptive Postfiltering 305
Bibliography 309
9 Error Concealment and Soft Decision Source Decoding 315
9.1 Hard Decision Source Decoding 316
9.2 Conventional Error Concealment 317
9.3 Softbits and L-values 321
9.3.1 Binary Symmetric Channel (BSC) 321
9.3.2 Fading–AWGN Channel 329
9.3.3 Channel with Inner SISO Decoding 335
9.4 Soft Decision (SD) Source Decoding 336
9.4.1 Parameter Estimation 338
9.4.2 The A Posteriori Probabilities 340
9.5 Application to Model Parameters 345
9.5.1 Soft Decision Decoding without Channel Coding 346
9.5.2 Soft Decision Decoding with Channel Coding 348
9.6 Further Improvements 353
Bibliography 355
10 Bandwidth Extension (BWE) of Speech Signals 361
10.1 Narrowband versus Wideband Telephony 362
10.2 Speech Coding with Integrated BWE 366
10.3 BWE without Auxiliary Transmission 369
10.3.1 Basic Approaches and Classification 369
10.3.2 Spectral Envelope Estimation 372
10.3.3 Extension of the Excitation Signal 375
10.3.4 Example BWE Algorithm 377
Bibliography 382
11 Single and Dual Channel Noise Reduction 389
11.1 Introduction 390
11.2 Linear MMSE Estimators 392
11.2.1 Non-causal IIR Wiener filter 392
11.2.2 The FIR Wiener Filter 395
11.3 Speech Enhancement in the DFT Domain 396
11.3.1 The Wiener Filter Revisited 398
11.3.2 Spectral Subtraction 400
11.3.3 Estimation of the A Priori SNR 402
11.3.4 Musical Noise and Countermeasures 403
11.3.5 Aspects of Spectral Analysis/Synthesis 408
11.4 Optimal Non-linear Estimators 411
11.4.1 Maximum Likelihood Estimation 412
11.4.2 Maximum A Posteriori Estimation 414
11.4.3 MMSE Estimation 414
11.4.4 MMSE Estimation of Functions of the Spectral Magnitude 416
11.5 Joint Optimum Detection and Estimation of Speech 419
11.6 Computation of Likelihood Ratios 422
11.7 Estimation of the A Priori Probability of Speech Presence 423
11.7.1 A Hard-Decision Estimator Based on Conditional Probabilities 423
11.7.2 Soft-Decision Estimation 424
11.7.3 Estimation Based on the A Posteriori SNR 424
11.8 VAD and Noise Estimation Techniques 425
11.8.1 Voice Activity Detection 426
11.8.2 Noise Estimation Using a Soft-Decision Detector 432
11.8.3 Noise Power Estimation Based on Minimum Statistics 434
11.9 Dual Channel Systems 443
11.9.1 Noise Cancellation 449
11.9.2 Noise Reduction 452
11.9.3 Implementations of Dual Channel Noise Reduction Systems 453
11.9.4 Combined Single and Dual Channel Noise Reduction 454
Bibliography 456
12 Multi-channel Noise Reduction 467
12.1 Introduction 467
12.2 Sound Waves 468
12.3 Spatial Sampling of Sound Fields 470
12.3.1 The Farfield Model 472
12.3.2 The Uniform Linear Array 474
12.3.3 Phase Ambiguity and Coherence 475
12.3.4 Spatial Correlation Properties of Acoustic Signals 476
12.4 Beamforming 477
12.4.1 Delay-and-Sum Beamforming 477
12.4.2 Filter-and-Sum Beamforming 478
12.5 Performance Measures and Spatial Aliasing 481
12.5.1 Array Gain and Array Sensitivity 481
12.5.2 Directivity Pattern 482
12.5.3 Directivity and Directivity Index 484
12.5.4 Example: Differential Microphones 485
12.6 Design of Fixed Beamformers 488
12.6.1 Minimum Variance Distortionless Response Beamformer 488
12.6.2 MVDR Beamformer with Limited Susceptibility 491
12.7 Multi-channel Wiener Filter and Postfilter 493
12.8 Adaptive Beamformers 495
12.8.1 The Frost Beamformer 495
12.8.2 Generalized Side-Lobe Canceller 498
12.8.3 Generalized Side-lobe Canceller with Adaptive Blocking Matrix 500
12.9 Optimal Non-linear Multi-channel Noise Reduction 501
Bibliography 501
13 Acoustic Echo Control 505
13.1 The Echo Control Problem 505
13.2 Evaluation Criteria 511
13.3 The Wiener Solution 513
13.4 The LMS and NLMS Algorithms 514
13.4.1 Derivation and Basic Properties 514
13.5 Convergence Analysis and Control of the LMS Algorithm 516
13.5.1 Convergence in the Absence of Interference 517
13.5.2 Convergence in the Presence of Interference 520
13.5.3 Filter Order of the Echo Canceller 523
13.5.4 Stepsize Parameter 524
13.6 Geometric Projection Interpretation of the NLMS Algorithm 527
13.7 The Affine Projection Algorithm 529
13.8 Least-Squares and Recursive Least-Squares Algorithms 531
13.8.1 The Weighted Least-Squares Algorithm 532
13.8.2 The RLS Algorithm 533
13.9 Block Processing and Frequency Domain Adaptive Filters 536
13.9.1 Block LMS Algorithm 537
13.9.2 The Exact Block NLMS Algorithm 537
13.9.3 Frequency Domain Adaptive Filter (FDAF) 539
13.9.4 Subband Acoustic Echo Cancellation 549
13.10 Additional Measures for Echo Control 550
13.10.1 Echo Canceller with Center Clipper 550
13.10.2 Echo Canceller with Voice-Controlled Switching 551
13.10.3 Echo Canceller with Adaptive Postfilter in the Time Domain 553
13.10.4 Echo Canceller with Adaptive Postfilter in the Frequency Domain 554
13.10.5 Initialization with Perfect Sequences 555
13.11 Stereophonic Acoustic Echo Control 557
13.11.1 The Non-uniqueness Problem 559
13.11.2 Solutions to the Non-uniqueness Problem 559
Bibliography 561
Appendix A Codec Standards 569
A.1 Evaluation Criteria 570
A.2 ITU-T/G.726: Adaptive Differential Pulse Code Modulation (ADPCM) 572
A.3 ITU-T/G.728: Low-Delay CELP Speech Coder 573
A.4 ITU-T/G.729: Conjugate-Structure Algebraic CELP Codec 576
A.5 ITU-T/G.722: 7 kHz Audio Coding within 64 kbit/s 579
A.6 ETSI-GSM 06.10: Full Rate Speech Transcoding 580
A.7 ETSI-GSM 06.20: Half Rate Speech Transcoding 582
A.8 ETSI-GSM 06.60: Enhanced Full Rate Speech Transcoding 584
A.9 ETSI-GSM 06.90: Adaptive Multi-Rate (AMR) Speech Transcoding 586
A.10 ETSI/3GPP AMR Wideband Speech Transcoding 590
A.11 ETSI/3GPP Extended AMR Wideband Codec, AMR-WB+ 592
A.12 TIA IS-96: Speech Service Option Standard for Wideband Spread-Spectrum Systems 594
A.13 INMARSAT: Improved Multi-Band Excitation Codec (IMBE) 595
Appendix B Speech Quality Assessment 597
B.1 Auditive Speech Quality Measures 597
B.2 Instrumental Speech Quality Measures 602
Bibliography 604
Index 607
| Detail | Value |
|---|---|
| Publication date (per publisher) | 3 March 2006 |
| Place of publication | New York |
| Language | English |
| Dimensions | 172 x 254 mm |
| Weight | 1219 g |
| Subject area | Mathematics / Computer Science ► Computer Science; Engineering ► Electrical Engineering / Power Engineering |
| ISBN-10 | 0-471-56018-9 / 0471560189 |
| ISBN-13 | 978-0-471-56018-0 / 9780471560180 |
| Condition | New |