Introduction to High-Dimensional Statistics
Apple Academic Press Inc. (publisher)
978-1-4822-3794-8 (ISBN)
- A new edition of this title is forthcoming
Introduction to High-Dimensional Statistics is a concise guide to state-of-the-art models, techniques, and approaches for handling high-dimensional data. The book exposes the reader to the key concepts and ideas in the simplest possible settings while avoiding unnecessary technicalities.
Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this highly accessible text:
- Describes the challenges related to the analysis of high-dimensional data
- Covers cutting-edge statistical methods including model selection, sparsity and the lasso, aggregation, and learning theory
- Provides detailed exercises at the end of every chapter with collaborative solutions on a wikisite
- Illustrates concepts with simple but clear practical examples
Introduction to High-Dimensional Statistics is suitable for graduate students and researchers interested in discovering modern statistics for massive data. It can be used as a graduate text or for self-study.
Christophe Giraud studied at the École Normale Supérieure de Paris and received a Ph.D. in probability theory from the University Paris 6. He was an assistant professor at the University of Nice from 2002 to 2008. He has been an associate professor at the École Polytechnique since 2008 and a professor at Paris Sud University (Orsay) since 2012. His current research focuses mainly on the statistical theory of high-dimensional data analysis and its applications to life sciences.
Preface
Acknowledgments
Introduction
High-Dimensional Data
Curse of Dimensionality
Lost in the Immensity of High-Dimensional Spaces
Fluctuations Cumulate
An Accumulation of Rare Events May Not Be Rare
Computational Complexity
High-Dimensional Statistics
Circumventing the Curse of Dimensionality
A Paradigm Shift
Mathematics of High-Dimensional Statistics
About This Book
Statistics and Data Analysis
Purpose of This Book
Overview
Discussion and References
Take-Home Message
References
Exercises
Strange Geometry of High-Dimensional Spaces
Volume of a p-Dimensional Ball
Tails of a Standard Gaussian Distribution
Principal Component Analysis
Basics of Linear Regression
Concentration of the Square Norm of a Gaussian Random Variable
Model Selection
Statistical Setting
To Select among a Collection of Models
Models and Oracle
Model Selection Procedures
Risk Bound for Model Selection
Oracle Risk Bound
Optimality
Minimax Optimality
Frontier of Estimation in High Dimensions
Minimal Penalties
Computational Issues
Illustration
An Alternative Point of View on Model Selection
Discussion and References
Take-Home Message
References
Exercises
Orthogonal Design
Risk Bounds for the Different Sparsity Settings
Collections of Nested Models
Segmentation with Dynamic Programming
Goldenshluger-Lepski Method
Minimax Lower Bounds
Aggregation of Estimators
Introduction
Gibbs Mixing of Estimators
Oracle Risk Bound
Numerical Approximation by Metropolis-Hastings
Numerical Illustration
Discussion and References
Take-Home Message
References
Exercises
Gibbs Distribution
Orthonormal Setting with Power Law Prior
Group-Sparse Setting
Gain of Combining
Online Aggregation
Convex Criteria
Reminder on Convex Multivariate Functions
Subdifferentials
Two Useful Properties
Lasso Estimator
Geometric Insights
Analytic Insights
Oracle Risk Bound
Computing the Lasso Estimator
Removing the Bias of the Lasso Estimator
Convex Criteria for Various Sparsity Patterns
Group-Lasso (Group Sparsity)
Sparse-Group Lasso (Sparse-Group Sparsity)
Fused-Lasso (Variation Sparsity)
Discussion and References
Take-Home Message
References
Exercises
When Is the Lasso Solution Unique?
Support Recovery via the Witness Approach
Lower Bound on the Compatibility Constant
On the Group-Lasso
Dantzig Selector
Projection on the ℓ1-Ball
Ridge and Elastic-Net
Estimator Selection
Estimator Selection
Cross-Validation Techniques
Complexity Selection Techniques
Coordinate-Sparse Regression
Group-Sparse Regression
Multiple Structures
Scaled-Invariant Criteria
References and Discussion
Take-Home Message
References
Exercises
Expected V-Fold CV ℓ2-Risk
Proof of Corollary 5.5
Some Properties of Penalty (5.4)
Selecting the Number of Steps for the Forward Algorithm
Multivariate Regression
Statistical Setting
A Reminder on Singular Values
Low-Rank Estimation
If We Knew the Rank of A*
When the Rank of A* Is Unknown
Low Rank and Sparsity
Row-Sparse Matrices
Criterion for Row-Sparse and Low-Rank Matrices
Convex Criterion for Low Rank Matrices
Convex Criterion for Sparse and Low-Rank Matrices
Discussion and References
Take-Home Message
References
Exercises
Hard-Thresholding of the Singular Values
Exact Rank Recovery
Rank Selection with Unknown Variance
Graphical Models
Reminder on Conditional Independence
Graphical Models
Directed Acyclic Graphical Models
Nondirected Models
Gaussian Graphical Models (GGM)
Connection with the Precision Matrix and the Linear Regression
Estimating g by Multiple Testing
Sparse Estimation of the Precision Matrix
Estimation of g by Regression
Practical Issues
Discussion and References
Take-Home Message
References
Exercises
Factorization in Directed Models
Moralization of a Directed Graph
Convexity of -log(det(K))
Block Gradient Descent with the ℓ1/ℓ2 Penalty
Gaussian Graphical Models with Hidden Variables
Dantzig Estimation of Sparse Gaussian Graphical Models
Gaussian Copula Graphical Models
Restricted Isometry Constant for Gaussian Matrices
Multiple Testing
An Introductory Example
Differential Expression of a Single Gene
Differential Expression of Multiple Genes
Statistical Setting
p-Values
Multiple Testing Setting
Bonferroni Correction
Controlling the False Discovery Rate
Heuristics
Step-Up Procedures
FDR Control under the WPRDS Property
Illustration
Discussion and References
Take-Home Message
References
Exercises
FDR versus FWER
WPRDS Property
Positively Correlated Normal Test Statistics
Supervised Classification
Statistical Modeling
Bayes Classifier
Parametric Modeling
Semi-Parametric Modeling
Nonparametric Modeling
Empirical Risk Minimization
Misclassification Probability of the Empirical Risk Minimizer
Vapnik-Chervonenkis Dimension
Dictionary Selection
From Theoretical to Practical Classifiers
Empirical Risk Convexification
Statistical Properties
Support Vector Machines
AdaBoost
Classifier Selection
Discussion and References
Take-Home Message
References
Exercises
Linear Discriminant Analysis
VC Dimension of Linear Classifiers in R^d
Linear Classifiers with Margin Constraints
Spectral Kernel
Computation of the SVM Classifier
Kernel Principal Component Analysis (KPCA)
Gaussian Distribution
Gaussian Random Vectors
Chi-Square Distribution
Gaussian Conditioning
Probabilistic Inequalities
Basic Inequalities
Concentration Inequalities
McDiarmid Inequality
Gaussian Concentration Inequality
Symmetrization and Contraction Lemmas
Symmetrization Lemma
Contraction Principle
Birgé's Inequality
Linear Algebra
Singular Value Decomposition (SVD)
Moore-Penrose Pseudo-Inverse
Matrix Norms
Matrix Analysis
Subdifferentials of Convex Functions
Subdifferentials and Subgradients
Examples of Subdifferentials
Reproducing Kernel Hilbert Spaces
Notations
Bibliography
Index
Publication date (per publisher) | 7 January 2015 |
---|---|
Series | Chapman & Hall/CRC Monographs on Statistics and Applied Probability |
Additional info | scatter color on pages 2, 186, 204 and 206; 2 Tables, black and white; 33 Illustrations, black and white |
Place of publication | Oakville |
Language | English |
Dimensions | 156 x 234 mm |
Weight | 635 g |
Subject area | Mathematics / Computer Science ► Computer Science ► Databases |
 | Mathematics / Computer Science ► Computer Science ► Theory / Studies |
 | Mathematics / Computer Science ► Mathematics ► Statistics |
 | Engineering ► Electrical Engineering / Energy Technology |
 | Business & Economics ► Economics ► Econometrics |
ISBN-10 | 1-4822-3794-6 / 1482237946 |
ISBN-13 | 978-1-4822-3794-8 / 9781482237948 |
Condition | New |