Minimum Divergence Methods in Statistical Machine Learning -  Shinto Eguchi,  Osamu Komori

Minimum Divergence Methods in Statistical Machine Learning (eBook)

From an Information Geometric Viewpoint
eBook Download: PDF
2022 | 1. Auflage
X, 221 Seiten
Springer Japan (Verlag)
978-4-431-56922-0 (ISBN)
Systemvoraussetzungen
128,39 inkl. MwSt
  • Download sofort lieferbar
  • Zahlungsarten anzeigen

This book explores minimum divergence methods of statistical machine learning for estimation,  regression, prediction, and so forth,  in which we engage in information geometry to elucidate their intrinsic properties of the corresponding loss functions, learning algorithms, and statistical models. One of the most elementary  examples is Gauss's least squares estimator in a linear regression model, in which the estimator is given by minimization of the sum of squares between a response vector and a vector of the linear subspace hulled by explanatory vectors.  This is extended to Fisher's maximum likelihood estimator (MLE) for an exponential model, in which the estimator is provided by minimization of the Kullback-Leibler (KL) divergence between a data distribution and a parametric distribution of the exponential model in an empirical analogue. Thus, we envisage a geometric interpretation of such  minimization procedures such that a right triangle is kept with Pythagorean identity in the sense of the KL divergence.  This understanding sublimates  a dualistic interplay between a statistical estimation and model, which requires dual geodesic paths, called m-geodesic and e-geodesic paths, in a framework of information geometry.
We extend such a dualistic structure of the MLE and exponential model to that of the minimum divergence estimator and the maximum entropy model, which is applied to robust statistics, maximum entropy, density estimation, principal component analysis, independent component analysis, regression analysis, manifold learning, boosting algorithm,  clustering, dynamic treatment regimes, and so forth. We consider a variety of information divergence measures typically including KL divergence to express departure from one probability distribution to another. An information divergence is decomposed into the cross-entropy and the (diagonal) entropy in which the entropy associates with a generative model as a family of maximum entropy distributions; the cross entropy associates with a statistical estimation method via minimization of the empirical analogue based on given data. Thus any statistical divergence includes an intrinsic object between the generative model and the estimation method. Typically, KL divergence leads to the exponential model and the maximum likelihood estimation. It is shown that any information divergence leads to a Riemannian metric and a pair of the linear connections in the framework of information geometry.
We focus on a class of information divergence generated by an increasing and convex function U, called U-divergence. It is shown that any generator function U generates the U-entropy and U-divergence, in which there is a dualistic structure between the U-divergence method and the maximum U-entropy model. We observe that a specific choice of  U leads to a robust statistical procedure via the minimum U-divergence method. If U is selected as an exponential function, then the corresponding  U-entropy and U-divergence are reduced to the Boltzmann-Shanon entropy and the KL divergence; the minimum U-divergence estimator is equivalent to the MLE. For robust supervised learning to predict a class label we observe that the U-boosting algorithm performs well for contamination of mislabel examples if U is appropriately selected. We present such maximal U-entropy and minimum U-divergence methods, in particular, selecting a power function as U to provide flexible performance in statistical machine learning.

 




Shinto Eguchi is currently Professor at the Institute of Statistical Mathematics.

Osamu Komori is Associate professor at the Seikei University.

This book explores minimum divergence methods of statistical machine learning for estimation,  regression, prediction, and so forth,  in which we engage in information geometry to elucidate their intrinsic properties of the corresponding loss functions, learning algorithms, and statistical models. One of the most elementary  examples is Gauss's least squares estimator in a linear regression model, in which the estimator is given by minimization of the sum of squares between a response vector and a vector of the linear subspace hulled by explanatory vectors.  This is extended to Fisher's maximum likelihood estimator (MLE) for an exponential model, in which the estimator is provided by minimization of the Kullback-Leibler (KL) divergence between a data distribution and a parametric distribution of the exponential model in an empirical analogue. Thus, we envisage a geometric interpretation of such  minimization procedures such that a right triangle is kept with Pythagorean identity in the sense of the KL divergence.  This understanding sublimates  a dualistic interplay between a statistical estimation and model, which requires dual geodesic paths, called m-geodesic and e-geodesic paths, in a framework of information geometry. We extend such a dualistic structure of the MLE and exponential model to that of the minimum divergence estimator and the maximum entropy model, which is applied to robust statistics, maximum entropy, density estimation, principal component analysis, independent component analysis, regression analysis, manifold learning, boosting algorithm,  clustering, dynamic treatment regimes, and so forth. We consider a variety of information divergence measures typically including KL divergence to express departure from one probability distribution to another. An information divergence is decomposed into the cross-entropy and the (diagonal) entropy in which the entropy associates with a generative model as a family of maximum entropy distributions; the cross entropy associates with a statistical estimation method via minimization of the empirical analogue based on given data. Thus any statistical divergence includes an intrinsic object between the generative model and the estimation method. Typically, KL divergence leads to the exponential model and the maximum likelihood estimation. It is shown that any information divergence leads to a Riemannian metric and a pair of the linear connections in the framework of information geometry. We focus on a class of information divergence generated by an increasing and convex function U, called U-divergence. It is shown that any generator function U generates the U-entropy and U-divergence, in which there is a dualistic structure between the U-divergence method and the maximum U-entropy model. We observe that a specific choice of  U leads to a robust statistical procedurevia the minimum U-divergence method. If U is selected as an exponential function, then the corresponding  U-entropy and U-divergence are reduced to the Boltzmann-Shanon entropy and the KL divergence; the minimum U-divergence estimator is equivalent to the MLE. For robust supervised learning to predict a class label we observe that the U-boosting algorithm performs well for contamination of mislabel examples if U is appropriately selected. We present such maximal U-entropy and minimum U-divergence methods, in particular, selecting a power function as U to provide flexible performance in statistical machine learning. 
Erscheint lt. Verlag 14.3.2022
Zusatzinfo X, 221 p. 18 illus., 15 illus. in color.
Sprache englisch
Themenwelt Mathematik / Informatik Informatik Theorie / Studium
Mathematik / Informatik Mathematik Angewandte Mathematik
Mathematik / Informatik Mathematik Statistik
Naturwissenschaften
Schlagworte Boosting • Independent Component Analysis • information geometry • Kernel Method • machine learning
ISBN-10 4-431-56922-7 / 4431569227
ISBN-13 978-4-431-56922-0 / 9784431569220
Haben Sie eine Frage zum Produkt?
PDFPDF (Wasserzeichen)
Größe: 5,1 MB

DRM: Digitales Wasserzeichen
Dieses eBook enthält ein digitales Wasser­zeichen und ist damit für Sie persona­lisiert. Bei einer missbräuch­lichen Weiter­gabe des eBooks an Dritte ist eine Rück­ver­folgung an die Quelle möglich.

Dateiformat: PDF (Portable Document Format)
Mit einem festen Seiten­layout eignet sich die PDF besonders für Fach­bücher mit Spalten, Tabellen und Abbild­ungen. Eine PDF kann auf fast allen Geräten ange­zeigt werden, ist aber für kleine Displays (Smart­phone, eReader) nur einge­schränkt geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen dafür einen PDF-Viewer - z.B. den Adobe Reader oder Adobe Digital Editions.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen dafür einen PDF-Viewer - z.B. die kostenlose Adobe Digital Editions-App.

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.

Mehr entdecken
aus dem Bereich
Discover tactics to decrease churn and expand revenue

von Peter Armaly; Jeff Mar

eBook Download (2024)
Packt Publishing Limited (Verlag)
25,19
A practical guide to probabilistic modeling

von Osvaldo Martin

eBook Download (2024)
Packt Publishing Limited (Verlag)
35,99
Unleash citizen-driven innovation with the power of hackathons

von Love Dager; Carolina Emanuelson; Ann Molin; Mustafa Sherif …

eBook Download (2024)
Packt Publishing Limited (Verlag)
35,99