Speech Enhancement - Jacob Benesty, Jingdong Chen, Mads Graesboll Christensen, Jesper Rindom Jensen

Speech Enhancement (eBook)

A Signal Subspace Perspective

Jacob Benesty, Jingdong Chen, Mads Graesboll Christensen, Jesper Rindom Jensen (Authors)

eBook Download: PDF | EPUB

2014 | 1st edition
138 pages
Elsevier Science (Publisher)
978-0-12-800253-7 (ISBN)

First short book treating subspace approaches in a unified way for time and frequency domains, single-channel, multichannel, as well as binaural, speech enhancement.
Bridges the gap between optimal filtering methods and subspace approaches.
Includes original presentation of subspace methods from different perspectives.

Speech enhancement is a classical problem in signal processing, yet still largely unsolved. Two of the conventional approaches for solving this problem are linear filtering, like the classical Wiener filter, and subspace methods. These approaches have traditionally been treated as different classes of methods and have been introduced in somewhat different contexts. Linear filtering methods originate in stochastic processes, while subspace methods have largely been based on developments in numerical linear algebra and matrix approximation theory. This book bridges the gap between these two classes of methods by showing how the ideas behind subspace methods can be incorporated into traditional linear filtering. In the context of subspace methods, the enhancement problem can then be seen as a classical linear filter design problem. This means that various solutions can more easily be compared and their performance bounded and assessed in terms of noise reduction and speech distortion. The book shows how various filter designs can be obtained in this framework, including the maximum SNR, Wiener, LCMV, and MVDR filters, and how these can be applied in various contexts, like in single-channel and multichannel speech enhancement, and in both the time and frequency domains.

Chapter 1

Introduction

Abstract

The presence of background noise is problematic for humans and computers alike, and the problem of dealing with it (called speech enhancement or noise reduction) is an important and long-standing problem in signal processing. The search for new and better methods continues today. Speech enhancement algorithms are important components in many systems where speech plays a part, including telephony, hearing aids, voice over IP, and automatic speech recognizers. Speech enhancement is generally concerned with the problem of enhancing the quality of speech signals. This can, of course, mean many things, but it is often associated with the specific problem of reducing the impact of additive noise, which is also what we are concerned with in the present book.

Keywords

Noise reduction; Speech enhancement; Subspace methods; Reduced-rank signal processing

In verbal communication, background noise, such as the sound of a passing car or an air vent, can degrade the quality of the speech signal, which affects the listener and thus the communication in several ways. Not only may the perceived quality of the speech be harmed, but its intelligibility may also be degraded. Even if only the perceived quality of the speech is affected, this may have a severe impact on the ability of the users to communicate, as exposure to noisy signals may cause listener fatigue. Noise is, however, not only a problem for humans. In speech processing systems, background noise causes additional problems, as such systems often comprise components that are designed under the assumption that only a single, clean speech signal is present at any given time. This is, for example, the case for automatic speech recognizers and speech coders. The assumption is typically made to simplify the design of these components, as the underlying statistical models then do not have to account for all possible noise types. This simplifies the training of such models and, generally speaking, leads to faster algorithms, but it also renders these components vulnerable to noise.

As we have argued, the presence of background noise is problematic for humans and computers alike. The problem of dealing with it, which is called speech enhancement or noise reduction, is an important and long-standing problem in signal processing (see, e.g., [1] and [2] for recent surveys), and the search for new and better methods continues today. Speech enhancement algorithms are important components in many systems where speech plays a part, including telephony, hearing aids, voice over IP, and automatic speech recognizers. Speech enhancement is generally concerned with the problem of enhancing the quality of speech signals. This can, of course, mean many things, but it is often associated with the specific problem of reducing the impact of additive noise, which is also what we are concerned with in the present book. Additive noise occurs naturally in acoustic environments when multiple sources are present, and examples of common noise types are street, car, and babble noise. Moreover, it can also be caused by intrinsic noise in the sensor system, i.e., from the electrical components. To be more precise, the purpose of speech enhancement is to minimize the impact of the background noise while preserving the speech signal. Hence, there are two performance measures by which the efficiency of speech enhancement methods is compared: speech distortion and noise reduction [3]. These two measures are often conflicting: if we want to achieve the highest possible noise reduction, then we must accept speech distortion and, conversely, if we cannot accept any speech distortion, then our ability to perform noise reduction will be hampered. An extreme example of this is the maximum signal-to-noise ratio filter [1], which achieves the highest possible noise reduction but at the cost of severe speech distortion.

The history of noise reduction can be traced back to the work of Wiener [4], i.e., to the very early days of signal processing. Due to the importance of the problem, in particular in speech applications, many different solutions have been proposed over the years, and much time and effort is still devoted to the problem today. The problem is often broken into two sub-problems, namely the problem of finding a function to be applied to the observed signal so as to extract the desired signal, i.e., the speech signal, and the problem of finding the information that this function depends on. If we restrict ourselves to linear filters, then the first sub-problem is the problem of finding the optimal filter, i.e., a filter design problem. If the criterion for optimality is the mean-square error, then the so-called Wiener filter is the solution. This filter requires knowledge of the noise statistics (or the speech statistics), and the second sub-problem is then that of finding those statistics, often in the form of the noise correlation matrix or its power spectral density. In the past decade, most work seems to have focused on the second sub-problem, e.g., [5–9], under difficult conditions in which the noise is nonstationary. This book is, however, concerned with the first sub-problem, that of determining the function that should be applied to the observed signal. This problem has also seen some important new contributions regarding optimal filtering in the past few years, including [3,10,11].
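
As a rough illustration of the filter design sub-problem described above, the following Python sketch builds a time-domain Wiener filter matrix from second-order statistics. It is not taken from the book: the vector model y = x + v with uncorrelated speech and noise, the closed form H = I - R_v R_y^{-1}, the function names, and the synthetic correlation matrices are all assumptions made for the example.

```python
import numpy as np

def wiener_filter_matrix(R_y, R_v):
    """Linear MMSE filter for y = x + v (x and v assumed uncorrelated).

    Uses H = I - R_v R_y^{-1}, so only the noisy-signal and noise
    correlation matrices are needed.
    """
    L = R_y.shape[0]
    # solve(R_y, R_v) = R_y^{-1} R_v; its transpose is R_v R_y^{-1}
    # because both matrices are symmetric.
    return np.eye(L) - np.linalg.solve(R_y, R_v).T

def mean_square_error(H, R_x, R_v):
    """E||H y - x||^2 for y = x + v with uncorrelated x and v."""
    E = H - np.eye(R_x.shape[0])
    return np.trace(E @ R_x @ E.T) + np.trace(H @ R_v @ H.T)

# Synthetic statistics (assumed for illustration only).
L = 8
lags = np.abs(np.subtract.outer(np.arange(L), np.arange(L)))
R_x = 0.9 ** lags            # strongly correlated "speech-like" signal
R_v = 0.5 * np.eye(L)        # white noise with variance 0.5
R_y = R_x + R_v              # correlation matrix of the observed signal

H_w = wiener_filter_matrix(R_y, R_v)
mse_wiener = mean_square_error(H_w, R_x, R_v)
mse_identity = mean_square_error(np.eye(L), R_x, R_v)  # no filtering

print("MSE with Wiener filter:", mse_wiener)
print("MSE without filtering: ", mse_identity)
```

In this sketch the statistics are given. In practice, R_v would have to be estimated, which is the second sub-problem mentioned above.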

In the literature, one can find many (seemingly) different attempts at solving the problem of speech enhancement. At the time a new method is published, it is often not clear how exactly it relates to existing methods, either because it is not clear exactly what problem is being solved or because the problems are stated in different ways whose relation is difficult to ascertain. In fact, it appears that the retrospective process of relating methods may take decades, if it ever occurs. When listing existing classes of methods for speech enhancement, spectral subtraction, (optimal) linear filtering, statistical model-based approaches, and subspace methods are typically mentioned. Indeed, these are also the names of the chapters in the book [2]. The focus in the present book is on the class of methods generally known as optimal filtering, of which the classical Wiener filter is a special case. However, in this book, we will show how speech enhancement using the principles of subspace-based methods can be cast as an optimal filtering problem. As such, the present book unifies what have previously been considered two competing principles of speech enhancement in one framework. As a consequence, it is possible both to combine the benefits of the subspace methods and optimal filtering methods and to analyze and compare the performance of the various approaches analytically.

1.1 History and Applications of Subspace Methods

The development of the subspace-based methods for speech enhancement took quite a different route from that of the more traditional speech enhancement methods based on the theory of stochastic processes (e.g., linear filtering methods), and it can, therefore, be quite difficult to understand the similarities and differences between the methodologies. In that connection, the curious reader might wonder what exactly the distinguishing characteristics of subspace-based enhancement methods are. Subspace-based methods are a class of methods that take their starting point in linear algebra, i.e., they are based on the notions of subspaces and the properties of vectors and matrices. Simply put, they are based on the idea of decomposing the correlation matrix of the observed signal using an eigenvalue-type decomposition and then, from this, finding a basis for the part of the space that contains the desired signal (called the signal subspace) and a basis for the part that contains only noise (called the noise subspace).
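
The decomposition just described can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions that are not taken from the book: the desired signal has a known rank P, the noise is white with known variance, and the correlation matrix is known exactly rather than estimated. The function name split_subspaces and the sinusoidal test signal are hypothetical.

```python
import numpy as np

def split_subspaces(R_y, rank):
    """Eigendecompose R_y and split the eigenvectors into two bases.

    Returns the eigenvalues in descending order, a basis for the
    signal subspace (first `rank` eigenvectors), and a basis for the
    noise subspace (remaining eigenvectors).
    """
    eigvals, eigvecs = np.linalg.eigh(R_y)   # eigh: symmetric input
    order = np.argsort(eigvals)[::-1]        # sort largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs[:, :rank], eigvecs[:, rank:]

# Assumed toy model: a rank-2 desired signal (one real sinusoid)
# observed in white noise.
L, P, noise_var = 8, 2, 0.1
n = np.arange(L)
A = np.column_stack([np.cos(0.7 * n), np.sin(0.7 * n)])
R_x = A @ A.T                        # rank-P correlation matrix
R_y = R_x + noise_var * np.eye(L)    # observed-signal correlation

eigvals, U_signal, U_noise = split_subspaces(R_y, P)

print("eigenvalues:", np.round(eigvals, 3))
print("signal subspace dimension:", U_signal.shape[1])
print("noise subspace dimension: ", U_noise.shape[1])
```

With white noise, the L - P smallest eigenvalues all equal the noise variance, and the corresponding eigenvectors are orthogonal to the desired signal. Colored noise would require an additional step that this sketch does not cover.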

Subspace methods have a rich history in signal processing, not only for speech enhancement. In fact, much of the early work focused on problems such as parameter estimation, model order estimation, low-rank approximations, etc. Perhaps the earliest example of a subspace method for parameter estimation is Pisarenko’s method [12] for sinusoidal parameter estimation. Later came further, and probably the most famous, subspace methods for the same problem (although cast as the equivalent problem of determining spatial frequencies in arrays), such as the MUltiple SIgnal Classification (MUSIC) method [13,14] (see also the later papers [15,16]), of which Pisarenko’s method is a special case, and the Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) method [17].

Since then, several variations, improvements, and generalizations have followed, including root MUSIC [18], modified MUSIC [19], min-norm [20], unitary ESPRIT [21], and (weighted) subspace fitting [22] (on this matter, see also the tutorial [23]). MUSIC exploits the orthogonality of the signal and noise subspaces, while ESPRIT is based on exploiting the structure of the involved matrices, more specifically, their shift-invariance. In [24], it was shown how the model order can be determined statistically from the ratio between the arithmetic and geometric means of the eigenvalues in combination with model selection criteria (this was essentially based on the same derivations as [15]). Later, the ideas behind subspace methods led to the more general ideas of reduced-rank signal processing [25] and low-rank adaptive...

Publication date (per publisher)	January 4, 2014
Language	English
Subject areas	Mathematics / Computer Science ► Computer Science
	Medicine / Pharmacy ► Nursing
	Medicine / Pharmacy ► Physiotherapy / Occupational Therapy ► Orthopedics
	Engineering ► Civil Engineering
	Engineering ► Electrical Engineering / Power Engineering
	Engineering ► Medical Technology
ISBN-10	0-12-800253-0 / 0128002530
ISBN-13	978-0-12-800253-7 / 9780128002537

PDF (Adobe DRM)
Size: 3.3 MB

EPUB (Adobe DRM)
Size: 3.9 MB

Print edition

Book | Softcover

56,10 €