Audio Source Separation (eBook)

Shoji Makino (Editor)

eBook Download: PDF
2018 | 1st edition
VIII, 389 pages
Springer-Verlag
978-3-319-73031-8 (ISBN)

160.49 incl. VAT
  • Download available immediately

This book provides the first comprehensive overview of the fascinating topic of audio source separation based on non-negative matrix factorization, deep neural networks, and sparse component analysis.

The first section of the book covers single-channel source separation based on non-negative matrix factorization (NMF). After an introduction to the technique, two further chapters describe separation of known sources using non-negative spectrogram factorization and temporal NMF models. In section two, NMF methods are extended to multichannel source separation. Section three introduces deep neural network (DNN) techniques, with chapters on multichannel and single-channel separation, and a further chapter on DNN-based mask estimation for monaural speech separation. In section four, sparse component analysis (SCA) is discussed, with chapters on source separation using audio directional statistics modelling, multi-microphone MMSE-based techniques, and diffusion map methods.
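
As a rough illustration of the single-channel NMF approach summarized above (a sketch of the standard technique, not code taken from the book), the following Python snippet factorizes a mixture magnitude spectrogram with Kullback-Leibler multiplicative updates and reconstructs two sources with Wiener-like soft masks. The random stand-in spectrogram, the number of basis vectors, and the per-source dictionary split are placeholder assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Stand-in magnitude spectrogram (frequency bins x time frames); in practice
# this would be |STFT| of the recorded mixture.
V = np.abs(rng.standard_normal((257, 200)))

K_per_source = [10, 10]            # hypothetical basis split for two sources
K = sum(K_per_source)
W = np.abs(rng.standard_normal((V.shape[0], K))) + 1e-6   # spectral dictionary
H = np.abs(rng.standard_normal((K, V.shape[1]))) + 1e-6   # activations
eps = 1e-12

# Standard multiplicative updates for KL-divergence NMF (Lee-Seung style).
for _ in range(200):
    WH = W @ H + eps
    H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
    WH = W @ H + eps
    W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)

# Per-source reconstructions and Wiener-like soft masks.
groups = np.split(np.arange(K), np.cumsum(K_per_source)[:-1])
estimates = [W[:, g] @ H[g, :] for g in groups]
total = sum(estimates) + eps
separated = [(est / total) * V for est in estimates]   # masked magnitude spectrograms

In a full system the masked magnitudes would be combined with the mixture phase and inverted with an inverse STFT; the book's chapters cover the divergences, constraints, dictionary learning, and multichannel extensions that build on this basic recipe.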

The book brings together leading researchers to provide tutorial-like and in-depth treatments of the major audio source separation topics, with the aim of being a comprehensive, authoritative, and accessible reference on the subject. It is written for graduate students and researchers interested in audio source separation techniques based on NMF, DNN, and SCA.



SHOJI MAKINO (F) received the B.E., M.E., and Ph.D. degrees from Tohoku University, Japan, in 1979, 1981, and 1993, respectively. He joined NTT in 1981 and is now a Professor at the University of Tsukuba. His research interests include adaptive filtering technologies, the realization of acoustic echo cancellation, blind source separation of convolutive mixtures of speech, and acoustic signal processing for speech and audio applications.

Dr. Makino received the IEEE SPS Best Paper Award in 2014, the IEEE MLSP Competition Award in 2007, the ICA Unsupervised Learning Pioneer Award in 2006, the Commendation for Science and Technology of the Japanese Government in 2015, the TELECOM System Technology Award in 2015 and 2004, the Achievement Award of the Institute of Electronics, Information and Communication Engineers (IEICE) in 1997, the Outstanding Technological Development Award of the Acoustical Society of Japan (ASJ) in 1995, the Paper Award of the IEICE in 2005 and 2002, and the Paper Award of the ASJ in 2005 and 2002. He is the author or co-author of more than 200 articles in journals and conference proceedings and holds more than 150 patents. He was a Keynote Speaker at ICA2007 and a Tutorial Speaker at EMBC 2013, Interspeech 2011, and ICASSP 2007.

Dr. Makino's IEEE activities include: Member, SPS Technical Directions Board (2013-14), SPS Awards Board (2006-08), SPS Conference Board (2002-04), IEEE Jack S. Kilby Signal Processing Medal Committee (2015-), and IEEE James L. Flanagan Speech & Audio Processing Award Committee (2008-11); Member and Chair, SPS Audio and Electroacoustics Technical Committee (1993-09 and 2013-14, respectively); SPS Distinguished Lecturer (2009-10); Chair, Circuits and Systems Society Blind Signal Processing Technical Committee (2009-10); and Associate Editor, IEEE Transactions on Speech and Audio Processing (2002-05) and EURASIP Journal on Advances in Signal Processing (2005-12). He was Vice President, Engineering Sciences Society of the IEICE (2007-08), and Chair, Engineering Acoustics Technical Committee of the IEICE (2006-08). He is a Member of the International IWAENC Standing Committee and the International ICA Steering Committee; he served as General Chair of WASPAA2007 and IWAENC2003, Organizing Chair of ICA2003, and Plenary Chair of ICASSP2012.

Dr. Makino is an IEEE Fellow, an IEICE Fellow, a Board member of the ASJ, and a member of EURASIP and ISCA.


Preface 6
Contents 8
1 Single-Channel Audio Source Separation with NMF: Divergences, Constraints and Algorithms 10
1.1 Introduction 10
1.2 Signal Decomposition by NMF 12
1.2.1 NMF by Optimisation 13
1.2.2 Composite Models 15
1.2.3 Majorisation-Minimisation 17
1.3 Advanced Decompositions for Source Separation 20
1.3.1 Pre-specified Dictionaries 20
1.3.2 Penalised NMF 25
1.3.3 User-guided NMF 27
1.4 Conclusions 29
References 30
2 Separation of Known Sources Using Non-negative Spectrogram Factorisation 34
2.1 Introduction 34
2.2 NMF Model for Separation of Known Sounds 35
2.2.1 Estimation Criteria and Algorithms 37
2.3 Sound Dictionary Learning and Adaptation 39
2.3.1 Generative Dictionaries 40
2.3.2 Discriminative Dictionaries 45
2.3.3 Dictionary Adaptation 45
2.4 Semi-supervised Separation 47
2.5 Low-Latency Separation 49
2.5.1 Algorithmic and Processing Latency 50
2.5.2 Use of Coupled Dictionaries for Very Low Latency Separation 51
2.5.3 Factorisation 54
2.6 Conclusions and Discussion 55
References 56
3 Dynamic Non-negative Models for Audio Source Separation 58
3.1 Introduction 58
3.2 The PLCA Models 59
3.3 Convolutional Models 60
3.4 Non-negative Hidden Markov Models 64
3.4.1 Single Source Models 65
3.4.2 Source Separation 67
3.4.3 Illustrative Examples 69
3.5 Dynamic PLCA Using Continuous State-Space Representation 72
3.5.1 Model Definitions 72
3.5.2 Estimation Methods 73
3.5.3 Illustrative Examples 75
3.6 Conclusions 78
References 79
4 An Introduction to Multichannel NMF for Audio Source Separation 81
4.1 Introduction 81
4.2 Local Gaussian Model 83
4.3 Spectral Models 85
4.3.1 NMF Modeling of Each Source 85
4.3.2 Joint NTF Modeling of All Sources 87
4.4 Spatial Models and Constraints 89
4.5 Main Steps and Sources Estimation 91
4.6 Model Estimation Criteria 92
4.6.1 Maximum Likelihood 92
4.6.2 Maximum a Posteriori 92
4.6.3 Other Criteria 93
4.7 Model Estimation Algorithms 93
4.7.1 Variants of EM Algorithm 94
4.7.2 Detailed Presentation of SSEM/MU Algorithm 96
4.7.3 Other Algorithms 99
4.8 Conclusion 99
References 100
5 General Formulation of Multichannel Extensions of NMF Variants 103
5.1 Introduction 103
5.2 Problem Formulation 105
5.2.1 Mixing Systems 105
5.2.2 Likelihood Function 107
5.3 Spectral and Spatial Models 109
5.3.1 Spectral Models 109
5.3.2 Spatial Models 114
5.4 Parameter Estimation and Signal Separation 116
5.4.1 Parameter Estimation 116
5.4.2 Signal Separation 120
5.5 Categorization of State-of-the-art Approaches 121
5.6 Derivations of MNMF and MFHMM Algorithms 123
5.6.1 MNMF Algorithm 123
5.6.2 MFHMM Algorithm 127
5.6.3 Demixing Filter Estimation Algorithm 128
5.7 Conclusion 129
References 130
6 Determined Blind Source Separation with Independent Low-Rank Matrix Analysis 133
6.1 Introduction 134
6.2 Generative Source Models in IVA and NMF Based on Itakura–Saito Divergence 135
6.2.1 Formulation 135
6.2.2 IVA 136
6.2.3 Time-Varying Gaussian IVA 138
6.2.4 Itakura–Saito NMF 139
6.3 Independent Low-Rank Matrix Analysis: A Unification of IVA and Itakura–Saito NMF 142
6.3.1 Motivation and Strategy 142
6.3.2 Derivation of Cost Function 143
6.3.3 Update Rules 144
6.3.4 Summary of Algorithm 146
6.4 Relationship Between Time-Varying Gaussian IVA, ILRMA, and Multichannel NMF 149
6.4.1 Generative Model in MNMF and Spatial Covariance 149
6.4.2 Existing MNMF Models 150
6.4.3 Equivalence Between ILRMA and MNMF with Rank-1 Spatial Model 150
6.5 Experiments on Speech and Music Separation 153
6.5.1 Datasets 153
6.5.2 Experimental Analysis of Optimal Number of Bases for ILRMA 155
6.5.3 Comparison of Separation Performance 156
6.6 Conclusions 159
References 160
7 Deep Neural Network Based Multichannel Audio Source Separation 164
7.1 Introduction 164
7.2 Background 166
7.2.1 Problem Formulation 166
7.2.2 Multichannel Gaussian Model 167
7.2.3 General Iterative EM Framework 168
7.3 DNN-Based Multichannel Source Separation 170
7.3.1 Algorithm 170
7.3.2 Cost Functions 171
7.3.3 Weighted Spatial Parameter Updates 173
7.4 Experimental Evaluation 173
7.4.1 General System Design 174
7.4.2 Application: Speech Enhancement 177
7.4.3 Application: Music Separation 183
7.5 Closing Remarks 188
References 188
8 Efficient Source Separation Using Bitwise Neural Networks 193
8.1 Introduction 193
8.2 A Basic Neural Network for Source Separation 195
8.3 Binary Features for Audio Signals 198
8.3.1 Winner-Take-All Hashing 198
8.3.2 Semantic Hashing 200
8.3.3 Quantization and Dispersion 201
8.4 BNN Feedforward 202
8.4.1 The Feedforward Procedure 202
8.4.2 Linear Separability 203
8.4.3 Efficiency 203
8.5 BNN Training 205
8.5.1 The First Round: Weight Compressed DNN 205
8.5.2 The Second Round: Noisy Feedforward and Sparsity 206
8.6 Experimental Results 208
8.6.1 The Data Set 208
8.6.2 Pre-processing 208
8.6.3 The Setup for the First Round 209
8.6.4 The Setup for the Second Round 209
8.6.5 Discussion 209
8.7 Conclusion 210
References 211
9 DNN Based Mask Estimation for Supervised Speech Separation 213
9.1 Speech Separation Problem 213
9.2 Classifiers and Learning Machines 215
9.2.1 Multilayer Perceptrons 216
9.2.2 Recurrent Neural Networks 217
9.3 Training Targets 219
9.4 Features 223
9.5 Speech Separation Algorithms 225
9.5.1 Speech-Nonspeech Separation 226
9.5.2 Other Separation/Enhancement Tasks 233
9.6 Conclusion 237
References 238
10 Informed Spatial Filtering Based on Constrained Independent Component Analysis 242
10.1 Introduction 243
10.2 Signal Model 246
10.3 Multichannel Linear Filtering for Signal Extraction 249
10.3.1 Linearly Constrained Minimum Variance Filter 250
10.3.2 The Generalized Sidelobe Canceler 251
10.4 Linearly Constrained Minimum Mutual Information-Based Signal Extraction 255
10.4.1 Generic Optimization Criterion 255
10.4.2 Constrained Natural Gradient-Descent for Iterative Optimization Update Rule 258
10.4.3 Definition of the Set of Constraints 260
10.4.4 Geometrical Interpretation of the Constrained Update Rule 261
10.4.5 Realization as Minimum Mutual Information-Based Generalized Sidelobe Canceler 262
10.4.6 Realization of the Blocking Matrix 264
10.4.7 Estimation of the Set of Constraints 266
10.4.8 Special Source Models 267
10.4.9 Links to Some Generic Linear Signal Extraction Methods Based on Second-Order Statistics 270
10.5 Experiments 271
10.5.1 Experimental Setup 271
10.5.2 Estimation of Relative Impulse Responses 273
10.5.3 Signal Enhancement 276
10.6 Conclusion 277
References 278
11 Recent Advances in Multichannel Source Separation and Denoising Based on Source Sparseness 284
11.1 Introduction 284
11.2 Source Separation and Denoising Based on Observation Vector Clustering 286
11.2.1 Mask Estimation 286
11.2.2 Source Signal Estimation 289
11.3 Mask Estimation Based on Modeling Directional Statistics 291
11.3.1 Mask Estimation Based on Complex Watson Mixture Model (cWMM) 291
11.3.2 Mask Estimation Based on Complex Bingham Mixture Model (cBMM) 294
11.3.3 Mask Estimation Based on Complex Gaussian Mixture Model (cGMM) 296
11.4 Experimental Evaluation 297
11.4.1 Source Separation 297
11.4.2 Denoising 298
11.5 Conclusions 300
References 304
12 Multimicrophone MMSE-Based Speech Source Separation 306
12.1 Introduction 306
12.2 Background 308
12.2.1 Generic Propagation Model 308
12.2.2 Spatial Filtering 309
12.2.3 Second-Order Moments and Criteria 310
12.3 Matched Filter 312
12.3.1 Design 312
12.3.2 Performance 313
12.4 Multichannel Wiener Filter 314
12.4.1 Design 314
12.4.2 Performance 315
12.5 Multichannel LCMV 316
12.5.1 Design 316
12.5.2 Performance 317
12.6 Parameters Estimation 318
12.6.1 Multichannel SPP Estimators 318
12.6.2 Covariance Matrix Estimators 322
12.6.3 Procedures for Semi-blind RTF Estimation 323
12.7 Examples 324
12.7.1 Narrowband Signals at an Anechoic Environment 325
12.7.2 Speech Signals at a Reverberant Environment 331
12.8 Summary 333
References 334
13 Musical-Noise-Free Blind Speech Extraction Based on Higher-Order Statistics Analysis 337
13.1 Introduction 337
13.2 Single-Channel Speech Enhancement with Musical-Noise-Free Properties 339
13.2.1 Conventional Non-iterative Spectral Subtraction 339
13.2.2 Iterative Spectral Subtraction 339
13.2.3 Modeling of Input Signal 340
13.2.4 Metric of Musical Noise Generation: Kurtosis Ratio 341
13.2.5 Musical Noise Generation in Non-iterative Spectral Subtraction 343
13.2.6 Musical-Noise-Free Speech Enhancement 346
13.3 Extension to Multichannel Blind Signal Processing 348
13.3.1 Blind Spatial Subtraction Array 348
13.3.2 Iterative Blind Spatial Subtraction Array 348
13.3.3 Accuracy of Wavefront Estimated by Independent Component Analysis After Spectral Subtraction 350
13.4 Improvement Scheme for Poor Noise Estimation 355
13.4.1 Channel Selection in Independent Component Analysis 355
13.4.2 Time-Variant Noise Power Spectral Density Estimator 355
13.5 Experiments in Real World 356
13.5.1 Experimental Conditions 356
13.5.2 Objective Evaluation 356
13.5.3 Subjective Evaluation 361
13.6 Conclusions and Remarks 363
References 365
14 Audio-Visual Source Separation with Alternating Diffusion Maps 369
14.1 Introduction 369
14.2 Problem Formulation 371
14.3 Separation of the Common Source via Alternating Diffusion Maps 372
14.3.1 Alternating Diffusion Maps 372
14.3.2 Separation of the Common Source 374
14.3.3 Online Extension 375
14.3.4 Source Activity Detection 375
14.4 Experimental Results 376
14.4.1 Experimental Setup 376
14.4.2 Activity Detection of the Common Source 377
14.4.3 Discussion—Sound Source Separation 381
14.5 Conclusions 384
References 385
Index 387

Publication date (per publisher): March 1, 2018
Series: Signals and Communication Technology
Additional information: VIII, 385 p., 141 illus., 74 illus. in color
Place of publication: Cham
Language: English
Subject areas: Natural Sciences > Physics / Astronomy; Technology > Electrical Engineering / Power Engineering
Keywords: audio directional statistics modelling • audio source separation methods • deep neural networks (DNN) for source separation • DNN based mask estimation • monaural speech separation • multi-channel source separation • multi-microphone MMSE-based techniques • non-negative matrix factorization (NMF) • non-negative spectrogram factorization • sparse component analysis (SCA)
ISBN-10 3-319-73031-2 / 3319730312
ISBN-13 978-3-319-73031-8 / 9783319730318
PDF (digital watermark)
Size: 15.3 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to its source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is only suitable to a limited extent for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: online reading
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax law reasons we can only sell eBooks within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
