Advanced Video Coding: Principles and Techniques (eBook)
411 pages
Elsevier Science (publisher)
978-0-08-049873-7 (ISBN)
In recent years, the paradigm of video coding has shifted from a frame-based approach to a content-based approach, particularly with the finalization of the ISO multimedia coding standard, MPEG-4. MPEG-4 is the emerging standard for the coding of multimedia content. It defines a syntax for a set of content-based functionalities, namely content-based interactivity, compression, and universal access. However, it does not specify how the video content is to be generated. To generate the video content, video has to be segmented into video objects, which must then be tracked as they traverse the video frames. This book addresses the difficult problem of video segmentation, and the extraction and tracking of video object planes as defined in MPEG-4. It then focuses on the specific issue of face segmentation and coding as applied to videoconferencing, in order to improve the quality of videoconferencing images, especially in the facial region.
Model-based coding is a content-based coding technique used to code synthetic objects, which have become an important part of video content. It results in extremely low bit rates because only the parameters needed to represent the model are transmitted. Model-based coding is included to provide background information for the synthetic object coding in MPEG-4. Lastly, MPEG-4, the first coding standard for multimedia content, is described in detail. The topics covered include the coding of audio objects, the coding of natural and synthetic video objects, and error resilience.
Advanced Video Coding is one of the first books on content-based coding and the MPEG-4 coding standard. It serves as an excellent information source and reference for both researchers and practicing engineers.
Front Cover 1
Advanced Video Coding: Principles and Techniques 4
Copyright Page 5
Table of Contents 14
Preface 8
Acknowledgments 12
Chapter 1. Image and Video Segmentation 20
1.1 Bayesian Inference and MRF's 21
1.2 Edge Detection 34
1.3 Image Segmentation 39
1.4 Motion 51
1.5 Motion Estimation 60
1.6 Motion Segmentation 68
References 79
Chapter 2. Face Segmentation 88
2.1 Face Segmentation Problem 88
2.2 Various Approaches 89
2.3 Applications 93
2.4 Modeling of Human Skin Color 98
2.5 Skin Color Map Approach 104
References 126
Chapter 3. Foreground/Background Coding 132
3.1 Introduction 132
3.2 Related Works 135
3.3 Foreground and Background Regions 141
3.4 Content-based Bit Allocation 142
3.5 Content-based Rate Control 150
3.6 H.261FB Approach 151
3.7 H.263FB Approach 184
3.8 Towards MPEG-4 Video Coding 190
References 200
Chapter 4. Model-Based Coding 202
4.1 Introduction 202
4.2 3-D Human Facial Modeling 206
4.3 Facial Feature Contours Extraction 212
4.4 WFM Fitting and Adaptation 239
4.5 Analysis of Facial Image Sequences 246
4.6 Synthesis of Facial Image Sequences 253
4.7 Update of 3-D Facial Model 256
References 264
Chapter 5. VOP Extraction and Tracking 270
5.1 Video Object Plane Extraction Techniques 270
5.2 Outline of VOP Extraction Algorithm 277
5.3 Version I: Morphological Motion Filtering 279
5.4 Version II: Change Detection Masks 316
References 329
Chapter 6. MPEG-4 Standard 334
6.1 Introduction 334
6.2 MPEG-4 Development Process 334
6.3 Features of the MPEG-4 Standard [2] 335
6.4 Technical Description of the MPEG-4 Standard 340
6.5 Coding of Audio Objects 345
6.6 Coding of Natural Visual Objects 348
6.7 Coding of Synthetic Objects 410
6.8 Error Resilience 414
References 419
Index 420
Image and Video Segmentation
A. Eleftheriadis; A. Jacquin
Segmentation plays a crucial role in second-generation image and video coding schemes, as well as in content-based video coding. It is one of the most difficult tasks in image processing, and it often determines the eventual success or failure of a system.
Broadly speaking, segmentation seeks to subdivide images into regions with similar attributes. Some of the most fundamental attributes are luminance, color, and optical flow. These result in a so-called low-level segmentation, because the partitions consist of primitive regions that usually do not have a one-to-one correspondence with physical objects.
Sometimes, images must be divided into physical objects so that each region constitutes a semantically meaningful entity. This higher-level segmentation is generally more difficult, and it requires contextual information or some form of artificial intelligence. Compared to low-level segmentation, far less research has been undertaken in this field.
Both low-level and higher-level segmentation are becoming increasingly important in image and video coding. The level at which the partitioning is carried out depends on the application. So-called second-generation coding schemes [1, 2] employ fairly sophisticated source models that take into account the characteristics of the human visual system. Images are first partitioned into regions of similar intensity, color, or motion characteristics. Each region is then separately and efficiently encoded, leading to fewer artifacts than systems based on the discrete cosine transform (DCT) [3, 4, 5]. The second-generation approach has initiated the development of a significant number of segmentation and coding algorithms [6, 7, 8, 9, 10], which are based on a low-level segmentation.
The new video coding standard MPEG-4 [11, 12], on the other hand, targets more than just large coding gains. To provide new functionalities for future multimedia applications, such as content-based interactivity and content-based scalability, it introduces a content-based representation. Scenes are treated as compositions of several semantically meaningful objects, which are separately encoded and decoded. Obviously, MPEG-4 requires a prior decomposition of the scene into physical objects or so-called video object planes (VOPs). This corresponds to a higher-level partition.
In contrast to the intensity- or motion-based segmentation employed by second-generation techniques, there is no low-level feature that can be used to group pixels into semantically meaningful objects. As a consequence, VOP segmentation is generally far more difficult than low-level segmentation. Furthermore, VOP extraction for content-based interactivity functionalities is an unforgiving task: even small errors in the contour can render a VOP useless for such applications.
This chapter starts with a review of Bayesian inference and Markov random fields (MRFs), which will be needed throughout the chapter. A brief discussion of edge detection is given in Section 1.2, and Section 1.3 deals with low-level still image segmentation. The remaining three sections are devoted to video segmentation. First, an introduction to motion and motion estimation is given in Sections 1.4 and 1.5, before video segmentation techniques are examined in Section 1.6. For a review of VOP segmentation algorithms, we refer the reader to Chapter 5.
1.1 Bayesian Inference and Markov Random Fields
Bayesian inference is among the most popular and powerful tools in image processing and computer vision [13, 14, 15]. The basis of Bayesian techniques is the famous inversion formula
P(X|O) = P(O|X) P(X) / P(O).        (1.1)
Although equation (1.1) is trivial to derive using the axioms of probability theory, it represents a major concept. To understand this better, let X denote an unknown parameter and O an observation that provides some information about X. In the context of decision making, X and O are sometimes referred to as hypothesis and evidence, respectively.
P(X|O) can now be viewed as the probability of the unknown parameter X, given the observation O. The inversion formula (1.1) enables us to express P(X|O) in terms of P(O|X) and P(X). In contrast to the posterior probability P(X|O), which is normally very difficult to establish, P(O|X) and the prior probability P(X) are intuitively easier to understand and can usually be determined on a theoretical, experimental, or subjective basis [13, 14]. Bayes' theorem (1.1) can also be seen as an updating of the probability of X from P(X) to P(X|O) after observing the evidence O [14].
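As a small numerical illustration of the updating expressed by (1.1), the sketch below classifies a single pixel as skin or non-skin from an observed chrominance value. The hypotheses, prior, and likelihood values are hypothetical and chosen purely for illustration; they are not taken from the book.

```python
# Toy illustration of Bayes' rule (1.1): updating a prior belief after an observation.
# All numbers are made up for illustration only.

# Hypothesis X: the pixel is "skin" or "non-skin" (prior belief P(X) before seeing the pixel).
prior = {"skin": 0.3, "non-skin": 0.7}

# Observation model P(O|X): likelihood of the observed chrominance value under each hypothesis.
likelihood = {"skin": 0.8, "non-skin": 0.1}

# P(O): total probability of the observation (the normalizing denominator in (1.1)).
evidence = sum(likelihood[x] * prior[x] for x in prior)

# Posterior P(X|O): updated belief after observing the evidence.
posterior = {x: likelihood[x] * prior[x] / evidence for x in prior}
print(posterior)  # {'skin': 0.774..., 'non-skin': 0.225...}
```

Observing a chrominance value that is much more probable under the skin hypothesis raises the belief in "skin" from 0.3 to roughly 0.77, exactly the prior-to-posterior update described above.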
1.1.1 MAP Estimation
Undoubtedly, the maximum a posteriori (MAP) estimator is the most important Bayesian tool. It aims at maximizing P(X|O) with respect to X, which is equivalent to maximizing the numerator on the right-hand side of (1.1), because P(O) does not depend on X. Hence, we can write
P(X|O) ∝ P(O|X) P(X).        (1.2)
For the purpose of a simplified notation, it is often more convenient to minimize the negative logarithm of P(X|O) instead of maximizing P(X|O) directly. However, this has no effect on the outcome of the estimation. The MAP estimate of X is now given by
X_MAP = arg max_X P(O|X) P(X) = arg min_X { −log P(O|X) − log P(X) }.        (1.3)
From (1.3) it can be seen that knowledge of two probability functions is required. The prior P(X) contains the information that is available a priori; that is, it describes our prior expectation of X before knowing O. While it is often possible to determine P(X) from theoretical or experimental knowledge, subjective experience sometimes plays an important role. As we will see later, Gibbs distributions are by far the most popular choice for P(X) in image processing, which means that X is assumed to be a sample of a Markov random field (MRF).
The conditional probability P(O|X), on the other hand, defines how well X explains the observation O and can therefore be viewed as an observation model. It updates the a priori information contained in P(X) and is often derived from theoretical or experimental knowledge. For example, assume we wanted to recover the unknown original image X from a blurred image O. The probability P(O|X), which describes the degradation process leading to O, could be determined based on theoretical considerations. To this end, a suitable mathematical model for blurring would be needed.
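To make the two terms in (1.3) concrete, the following minimal sketch solves a hypothetical 1-D denoising problem: the observation model −log P(O|X) assumes i.i.d. Gaussian noise, the prior −log P(X) is a simple smoothness penalty, and the MAP estimate is found by brute-force search over binary-valued signals. The data, parameters, and solver are illustrative assumptions, not an algorithm from the book.

```python
import numpy as np
from itertools import product

# Observed noisy 1-D signal O (hypothetical data).
O = np.array([0.1, 0.9, 1.1, 0.2, 0.0])

def neg_log_likelihood(X, O, sigma=0.3):
    # -log P(O|X) for i.i.d. Gaussian noise: ||O - X||^2 / (2 sigma^2), constants dropped.
    return np.sum((O - X) ** 2) / (2 * sigma ** 2)

def neg_log_prior(X, beta=2.0):
    # -log P(X) for a simple smoothness (Gibbs-like) prior penalizing neighbor differences.
    return beta * np.sum((X[1:] - X[:-1]) ** 2)

def map_estimate(O, labels=(0.0, 1.0)):
    # Exhaustive search over binary-valued signals:
    # arg min of -log P(O|X) - log P(X), as in (1.3).
    best, best_cost = None, np.inf
    for cand in product(labels, repeat=len(O)):
        X = np.array(cand)
        cost = neg_log_likelihood(X, O) + neg_log_prior(X)
        if cost < best_cost:
            best, best_cost = X, cost
    return best

print(map_estimate(O))  # e.g. [0. 1. 1. 0. 0.]
```

The data term pulls the estimate toward the observation, while the prior term suppresses isolated jumps; changing sigma or beta shifts the balance between the two, which is the essence of MAP estimation.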
The major conceptual step introduced by Bayesian inference, besides the inversion principle, is to model uncertainty about the unknown parameter X by probabilities and to combine them according to the axioms of probability theory. Indeed, the language of probabilities has proven to be a powerful tool for the quantitative treatment of uncertainty that conforms well with human intuition. The resulting distribution P(X|O), obtained after combining prior knowledge and observations, is the a posteriori belief in X and forms the basis for inferences.
To summarize, by combining P(X) and P(O|X), the MAP estimator incorporates both the a priori information on the unknown parameter X that is available from knowledge and experience, and the information brought in by the observation O [16].
Estimation problems are frequently encountered in image processing and computer vision. Applications include image and video segmentation [16, 17, 18, 19], where O represents an image or a video sequence and X is the segmentation label field to be estimated. In image restoration [20, 21, 22], X is the unknown original image we would like to recover and O the degraded image. Bayesian inference is also popular in motion estimation [23, 24, 25, 26], with X denoting the unknown optical flow field and O containing two or more frames of a video sequence. In all these examples, the unknown parameter X is modeled by a random field.
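For the segmentation application just mentioned, the sketch below estimates a binary label field X from a grayscale image O using a Gaussian observation model and a Potts-style MRF prior, optimized with a few iterations of iterated conditional modes (ICM), which approximates the MAP solution. The class means, noise level, and prior strength are hypothetical; this is a generic illustration, not an algorithm taken from the book.

```python
import numpy as np

def icm_segmentation(O, means=(0.2, 0.8), sigma=0.2, beta=1.5, n_iter=5):
    """Approximate MAP two-class segmentation of image O via ICM.

    Data term:  -log P(O|X) with Gaussian class likelihoods (means, sigma).
    Prior term: -log P(X) as a Potts MRF penalizing label disagreement
                with the 4-connected neighbors (strength beta).
    """
    # Initialize labels by nearest class mean (maximum-likelihood start).
    X = np.argmin([(O - m) ** 2 for m in means], axis=0)
    H, W = O.shape
    for _ in range(n_iter):
        for i in range(H):
            for j in range(W):
                costs = []
                for label, m in enumerate(means):
                    data = (O[i, j] - m) ** 2 / (2 * sigma ** 2)
                    # Count 4-neighbors that disagree with this candidate label.
                    disagree = sum(
                        X[u, v] != label
                        for u, v in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= u < H and 0 <= v < W
                    )
                    costs.append(data + beta * disagree)
                X[i, j] = int(np.argmin(costs))
    return X

# Example: a noisy 8x8 image with a bright square in the center (hypothetical data).
rng = np.random.default_rng(0)
O = np.full((8, 8), 0.2)
O[2:6, 2:6] = 0.8
O += rng.normal(0, 0.15, O.shape)
print(icm_segmentation(O))
```

Here O plays the role of the observed image and X the segmentation label field to be estimated; the Potts prior is one of the Gibbs distributions referred to above, and larger beta produces spatially smoother label fields.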
1.1.2 Markov Random Fields (MRFs)
Without doubt the most important statistical signal models in image processing and computer vision are based on Markov processes [27, 20, 28, 29]. Due to their ability to represent the...
Publication date (per publisher) | 31.8.1999
Language | English
Subject area | Computer Science ► Graphics / Design ► Digital Image Processing
 | Mathematics / Computer Science ► Computer Science ► Theory / Study
 | Natural Sciences ► Physics / Astronomy ► Electrodynamics
 | Engineering ► Civil Engineering
 | Engineering ► Electrical Engineering / Energy Technology
 | Engineering ► Communications Engineering
ISBN-10 | 0-08-049873-6 / 0080498736
ISBN-13 | 978-0-08-049873-7 / 9780080498737
Copy protection: Adobe DRM
File format: PDF (Portable Document Format)