Introduction to High-Dimensional Statistics

Christophe Giraud (Author)

Book | Hardcover

270 pages

2015
Apple Academic Press Inc. (Publisher)
978-1-4822-3794-8 (ISBN)

Title is being published in a new edition

A later edition of this item exists

Introduction to High-Dimensional Statistics

Christophe Giraud

2021

Book | Hardcover

€95.95

Ever-greater computing technologies have given rise to an exponentially growing volume of data. Today, massive data sets (with potentially thousands of variables) play an important role in almost every branch of modern human activity, including networks, finance, and genetics. However, analyzing such data has presented a challenge for statisticians and data analysts and has required the development of new statistical methods capable of separating the signal from the noise.

Introduction to High-Dimensional Statistics is a concise guide to state-of-the-art models, techniques, and approaches for handling high-dimensional data. The book is intended to expose the reader to the key concepts and ideas in the simplest possible settings while avoiding unnecessary technicalities.

Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this highly accessible text:

Describes the challenges related to the analysis of high-dimensional data
Covers cutting-edge statistical methods including model selection, sparsity and the lasso, aggregation, and learning theory
Provides detailed exercises at the end of every chapter with collaborative solutions on a wikisite
Illustrates concepts with simple but clear practical examples

Introduction to High-Dimensional Statistics is suitable for graduate students and researchers interested in discovering modern statistics for massive data. It can be used as a graduate text or for self-study.

Christophe Giraud was a student at the École Normale Supérieure de Paris, and he received a Ph.D. in probability theory from the University Paris 6. He was an assistant professor at the University of Nice from 2002 to 2008. He has been an associate professor at the École Polytechnique since 2008 and a professor at Paris Sud University (Orsay) since 2012. His current research focuses mainly on the statistical theory of high-dimensional data analysis and its applications to life sciences.

Preface

Acknowledgments

Introduction

High-Dimensional Data

Curse of Dimensionality

Lost in the Immensity of High-Dimensional Spaces

Fluctuations Cumulate

An Accumulation of Rare Events May Not Be Rare

Computational Complexity

High-Dimensional Statistics

Circumventing the Curse of Dimensionality

A Paradigm Shift

Mathematics of High-Dimensional Statistics

About This Book

Statistics and Data Analysis

Purpose of This Book

Overview

Discussion and References

Take-Home Message

References

Exercises

Strange Geometry of High-Dimensional Spaces

Volume of a p-Dimensional Ball

Tails of a Standard Gaussian Distribution

Principal Component Analysis

Basics of Linear Regression

Concentration of the Square Norm of a Gaussian Random Variable

Model Selection

Statistical Setting

To Select among a Collection of Models

Models and Oracle

Model Selection Procedures

Risk Bound for Model Selection

Oracle Risk Bound

Optimality

Minimax Optimality

Frontier of Estimation in High Dimensions

Minimal Penalties

Computational Issues

Illustration

An Alternative Point of View on Model Selection

Discussion and References

Take-Home Message

References

Exercises

Orthogonal Design

Risk Bounds for the Different Sparsity Settings

Collections of Nested Models

Segmentation with Dynamic Programming

Goldenshluger-Lepski Method

Minimax Lower Bounds

Aggregation of Estimators

Introduction

Gibbs Mixing of Estimators

Oracle Risk Bound

Numerical Approximation by Metropolis-Hastings

Numerical Illustration

Discussion and References

Take-Home Message

References

Exercises

Gibbs Distribution

Orthonormal Setting with Power Law Prior

Group-Sparse Setting

Gain of Combining

Online Aggregation

Convex Criteria

Reminder on Convex Multivariate Functions

Subdifferentials

Two Useful Properties

Lasso Estimator

Geometric Insights

Analytic Insights

Oracle Risk Bound

Computing the Lasso Estimator

Removing the Bias of the Lasso Estimator

Convex Criteria for Various Sparsity Patterns

Group-Lasso (Group Sparsity)

Sparse-Group Lasso (Sparse-Group Sparsity)

Fused-Lasso (Variation Sparsity)

Discussion and References

Take-Home Message

References

Exercises

When Is the Lasso Solution Unique?

Support Recovery via the Witness Approach

Lower Bound on the Compatibility Constant

On the Group-Lasso

Dantzig Selector

Projection on the ℓ1-Ball

Ridge and Elastic-Net

Estimator Selection

Estimator Selection

Cross-Validation Techniques

Complexity Selection Techniques

Coordinate-Sparse Regression

Group-Sparse Regression

Multiple Structures

Scaled-Invariant Criteria

References and Discussion

Take-Home Message

References

Exercises

Expected V-Fold CV ℓ2-Risk

Proof of Corollary 5.5

Some Properties of Penalty (5.4)

Selecting the Number of Steps for the Forward Algorithm

Multivariate Regression

Statistical Setting

A Reminder on Singular Values

Low-Rank Estimation

If We Knew the Rank of A*

When the Rank of A* Is Unknown

Low Rank and Sparsity

Row-Sparse Matrices

Criterion for Row-Sparse and Low-Rank Matrices

Convex Criterion for Low Rank Matrices

Convex Criterion for Sparse and Low-Rank Matrices

Discussion and References

Take-Home Message

References

Exercises

Hard-Thresholding of the Singular Values

Exact Rank Recovery

Rank Selection with Unknown Variance

Graphical Models

Reminder on Conditional Independence

Graphical Models

Directed Acyclic Graphical Models

Nondirected Models

Gaussian Graphical Models (GGM)

Connection with the Precision Matrix and the Linear Regression

Estimating g by Multiple Testing

Sparse Estimation of the Precision Matrix

Estimation of g by Regression

Practical Issues

Discussion and References

Take-Home Message

References

Exercises

Factorization in Directed Models

Moralization of a Directed Graph

Convexity of -log(det(K))

Block Gradient Descent with the ℓ1/ℓ2 Penalty

Gaussian Graphical Models with Hidden Variables

Dantzig Estimation of Sparse Gaussian Graphical Models

Gaussian Copula Graphical Models

Restricted Isometry Constant for Gaussian Matrices

Multiple Testing

An Introductory Example

Differential Expression of a Single Gene

Differential Expression of Multiple Genes

Statistical Setting

p-Values

Multiple Testing Setting

Bonferroni Correction

Controlling the False Discovery Rate

Heuristics

Step-Up Procedures

FDR Control under the WPRDS Property

Illustration

Discussion and References

Take-Home Message

References

Exercises

FDR versus FWER

WPRDS Property

Positively Correlated Normal Test Statistics

Supervised Classification

Statistical Modeling

Bayes Classifier

Parametric Modeling

Semi-Parametric Modeling

Nonparametric Modeling

Empirical Risk Minimization

Misclassification Probability of the Empirical Risk Minimizer

Vapnik-Chervonenkis Dimension

Dictionary Selection

From Theoretical to Practical Classifiers

Empirical Risk Convexification

Statistical Properties

Support Vector Machines

AdaBoost

Classifier Selection

Discussion and References

Take-Home Message

References

Exercises

Linear Discriminant Analysis

VC Dimension of Linear Classifiers in R^d

Linear Classifiers with Margin Constraints

Spectral Kernel

Computation of the SVM Classifier

Kernel Principal Component Analysis (KPCA)

Gaussian Distribution

Gaussian Random Vectors

Chi-Square Distribution

Gaussian Conditioning

Probabilistic Inequalities

Basic Inequalities

Concentration Inequalities

McDiarmid Inequality

Gaussian Concentration Inequality

Symmetrization and Contraction Lemmas

Symmetrization Lemma

Contraction Principle

Birgé's Inequality

Linear Algebra

Singular Value Decomposition (SVD)

Moore-Penrose Pseudo-Inverse

Matrix Norms

Matrix Analysis

Subdifferentials of Convex Functions

Subdifferentials and Subgradients

Examples of Subdifferentials

Reproducing Kernel Hilbert Spaces

Notations

Bibliography

Index

Publication date (per publisher)	7 January 2015
Series	Chapman & Hall/CRC Monographs on Statistics and Applied Probability
Additional info	scatter color on pages 2, 186, 204 and 206; 2 Tables, black and white; 33 Illustrations, black and white
Place of publication	Oakville
Language	English
Dimensions	156 x 234 mm
Weight	635 g
Subject areas	Mathematics / Computer Science ► Computer Science ► Databases
	Mathematics / Computer Science ► Computer Science ► Theory / Studies
	Mathematics / Computer Science ► Mathematics ► Statistics
	Engineering ► Electrical Engineering / Power Engineering
	Business & Economics ► Economics ► Econometrics
ISBN-10	1-4822-3794-6 / 1482237946
ISBN-13	978-1-4822-3794-8 / 9781482237948
Condition	New