Introduction to High-Dimensional Statistics - Christophe Giraud

Introduction to High-Dimensional Statistics

Book | Hardcover
270 pages
2015
Apple Academic Press Inc. (publisher)
978-1-4822-3794-8 (ISBN)
78.55 incl. VAT
A newer edition of this title exists.
Ever more powerful computing technologies have given rise to an exponentially growing volume of data. Today, massive data sets (with potentially thousands of variables) play an important role in almost every branch of modern human activity, including networks, finance, and genetics. Analyzing such data, however, is a challenge for statisticians and data analysts, and it has required the development of new statistical methods capable of separating the signal from the noise.


Introduction to High-Dimensional Statistics is a concise guide to state-of-the-art models, techniques, and approaches for handling high-dimensional data. The book is intended to expose the reader to the key concepts and ideas in the simplest settings possible while avoiding unnecessary technicalities.


Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this highly accessible text:




  • Describes the challenges related to the analysis of high-dimensional data
  • Covers cutting-edge statistical methods including model selection, sparsity and the lasso, aggregation, and learning theory
  • Provides detailed exercises at the end of every chapter with collaborative solutions on a wikisite
  • Illustrates concepts with simple but clear practical examples


Introduction to High-Dimensional Statistics is suitable for graduate students and researchers interested in discovering modern statistics for massive data. It can be used as a graduate text or for self-study.

Christophe Giraud was a student of the École Normale Supérieure de Paris, and he received a Ph.D. in probability theory from the University Paris 6. He was an assistant professor at the University of Nice from 2002 to 2008. He has been an associate professor at the École Polytechnique since 2008 and a professor at Paris Sud University (Orsay) since 2012. His current research focuses mainly on the statistical theory of high-dimensional data analysis and its applications to life sciences.

Preface


Acknowledgments


Introduction


High-Dimensional Data


Curse of Dimensionality


Lost in the Immensity of High-Dimensional Spaces


Fluctuations Cumulate


An Accumulation of Rare Events May Not Be Rare


Computational Complexity


High-Dimensional Statistics


Circumventing the Curse of Dimensionality


A Paradigm Shift


Mathematics of High-Dimensional Statistics


About This Book


Statistics and Data Analysis


Purpose of This Book


Overview


Discussion and References


Take-Home Message


References


Exercises


Strange Geometry of High-Dimensional Spaces


Volume of a p-Dimensional Ball


Tails of a Standard Gaussian Distribution


Principal Component Analysis


Basics of Linear Regression


Concentration of the Square Norm of a Gaussian Random Variable


Model Selection


Statistical Setting


To Select among a Collection of Models


Models and Oracle


Model Selection Procedures


Risk Bound for Model Selection


Oracle Risk Bound


Optimality


Minimax Optimality


Frontier of Estimation in High Dimensions


Minimal Penalties


Computational Issues


Illustration


An Alternative Point of View on Model Selection


Discussion and References


Take-Home Message


References


Exercises


Orthogonal Design


Risk Bounds for the Different Sparsity Settings


Collections of Nested Models


Segmentation with Dynamic Programming


Goldenshluger-Lepski Method


Minimax Lower Bounds


Aggregation of Estimators


Introduction


Gibbs Mixing of Estimators


Oracle Risk Bound


Numerical Approximation by Metropolis-Hastings


Numerical Illustration


Discussion and References


Take-Home Message


References


Exercises


Gibbs Distribution


Orthonormal Setting with Power Law Prior


Group-Sparse Setting


Gain of Combining


Online Aggregation


Convex Criteria


Reminder on Convex Multivariate Functions


Subdifferentials


Two Useful Properties


Lasso Estimator


Geometric Insights


Analytic Insights


Oracle Risk Bound


Computing the Lasso Estimator


Removing the Bias of the Lasso Estimator


Convex Criteria for Various Sparsity Patterns


Group-Lasso (Group Sparsity)


Sparse-Group Lasso (Sparse-Group Sparsity)


Fused-Lasso (Variation Sparsity)


Discussion and References


Take-Home Message


References


Exercises


When Is the Lasso Solution Unique?


Support Recovery via the Witness Approach


Lower Bound on the Compatibility Constant


On the Group-Lasso


Dantzig Selector


Projection on the ℓ1-Ball


Ridge and Elastic-Net


Estimator Selection


Estimator Selection


Cross-Validation Techniques


Complexity Selection Techniques


Coordinate-Sparse Regression


Group-Sparse Regression


Multiple Structures


Scaled-Invariant Criteria


References and Discussion


Take-Home Message


References


Exercises


Expected V-Fold CV ℓ2-Risk


Proof of Corollary 5.5


Some Properties of Penalty (5.4)


Selecting the Number of Steps for the Forward Algorithm


Multivariate Regression


Statistical Setting


A Reminder on Singular Values


Low-Rank Estimation


If We Knew the Rank of A*


When the Rank of A* Is Unknown


Low Rank and Sparsity


Row-Sparse Matrices


Criterion for Row-Sparse and Low-Rank Matrices


Convex Criterion for Low Rank Matrices


Convex Criterion for Sparse and Low-Rank Matrices


Discussion and References


Take-Home Message


References


Exercises


Hard-Thresholding of the Singular Values


Exact Rank Recovery


Rank Selection with Unknown Variance


Graphical Models


Reminder on Conditional Independence


Graphical Models


Directed Acyclic Graphical Models


Nondirected Models


Gaussian Graphical Models (GGM)


Connection with the Precision Matrix and the Linear Regression


Estimating g by Multiple Testing


Sparse Estimation of the Precision Matrix


Estimation of g by Regression


Practical Issues


Discussion and References


Take-Home Message


References


Exercises


Factorization in Directed Models


Moralization of a Directed Graph


Convexity of -log(det(K))


Block Gradient Descent with the ℓ1/ℓ2 Penalty


Gaussian Graphical Models with Hidden Variables


Dantzig Estimation of Sparse Gaussian Graphical Models


Gaussian Copula Graphical Models


Restricted Isometry Constant for Gaussian Matrices


Multiple Testing


An Introductory Example


Differential Expression of a Single Gene


Differential Expression of Multiple Genes


Statistical Setting


p-Values


Multiple Testing Setting


Bonferroni Correction


Controlling the False Discovery Rate


Heuristics


Step-Up Procedures


FDR Control under the WPRDS Property


Illustration


Discussion and References


Take-Home Message


References


Exercises


FDR versus FWER


WPRDS Property


Positively Correlated Normal Test Statistics


Supervised Classification


Statistical Modeling


Bayes Classifier


Parametric Modeling


Semi-Parametric Modeling


Nonparametric Modeling


Empirical Risk Minimization


Misclassification Probability of the Empirical Risk Minimizer


Vapnik-Chervonenkis Dimension


Dictionary Selection


From Theoretical to Practical Classifiers


Empirical Risk Convexification


Statistical Properties


Support Vector Machines


AdaBoost


Classifier Selection


Discussion and References


Take-Home Message


References


Exercises


Linear Discriminant Analysis


VC Dimension of Linear Classifiers in R^d


Linear Classifiers with Margin Constraints


Spectral Kernel


Computation of the SVM Classifier


Kernel Principal Component Analysis (KPCA)


Gaussian Distribution


Gaussian Random Vectors


Chi-Square Distribution


Gaussian Conditioning


Probabilistic Inequalities


Basic Inequalities


Concentration Inequalities


McDiarmid Inequality


Gaussian Concentration Inequality


Symmetrization and Contraction Lemmas


Symmetrization Lemma


Contraction Principle


Birgé's Inequality


Linear Algebra


Singular Value Decomposition (SVD)


Moore-Penrose Pseudo-Inverse


Matrix Norms


Matrix Analysis


Subdifferentials of Convex Functions


Subdifferentials and Subgradients


Examples of Subdifferentials


Reproducing Kernel Hilbert Spaces


Notations


Bibliography


Index

Publication date (per publisher): 7 January 2015
Series: Chapman & Hall/CRC Monographs on Statistics and Applied Probability
Additional info: scatter color on pages 2, 186, 204 and 206; 2 Tables, black and white; 33 Illustrations, black and white
Place of publication: Oakville
Language: English
Dimensions: 156 x 234 mm
Weight: 635 g
Subject areas: Mathematics / Computer Science > Computer Science > Databases
Mathematics / Computer Science > Computer Science > Theory / Studies
Mathematics / Computer Science > Mathematics > Statistics
Engineering > Electrical Engineering / Energy Technology
Business / Economics > Economics > Econometrics
ISBN-10: 1-4822-3794-6 / 1482237946
ISBN-13: 978-1-4822-3794-8 / 9781482237948
Condition: New