Recursive Partitioning and Applications (eBook)

eBook Download: PDF
2010 | 2nd ed. 2010
XIV, 262 pages
Springer New York (publisher)
978-1-4419-6824-1 (ISBN)

Recursive Partitioning and Applications - Heping Zhang, Burton H. Singer
106.99 incl. VAT
  • Download available immediately
Multiple complex pathways, characterized by interrelated events and conditions, represent routes to many illnesses, diseases, and ultimately death. Although there are substantial data and plausibility arguments supporting many conditions as contributory components of pathways to illness and disease end points, we have, historically, lacked an effective methodology for identifying the structure of the full pathways. Regression methods, with strong linearity assumptions and data-based constraints on the extent and order of interaction terms, have traditionally been the strategies of choice for relating outcomes to potentially complex explanatory pathways. However, nonlinear relationships among candidate explanatory variables are a generic feature that must be dealt with in any characterization of how health outcomes come about. It is noteworthy that similar challenges arise from data analyses in Economics, Finance, Engineering, etc. Thus, the purpose of this book is to demonstrate the effectiveness of a relatively recently developed methodology, recursive partitioning, as a response to this challenge. We also compare and contrast what is learned via recursive partitioning with results obtained on the same data sets using more traditional methods. This serves to highlight exactly where, and for what kinds of questions, recursive partitioning-based strategies have a decisive advantage over classical regression techniques.

Heping Zhang is Professor of Public Health, Statistics, and Child Study, and director of the Collaborative Center for Statistics in Science, at Yale University. He is a Fellow of the American Statistical Association and the Institute of Mathematical Statistics, a Myrto Lefkopoulou Distinguished Lecturer, awarded by the Harvard School of Public Health, and a Medallion Lecturer selected by the Institute of Mathematical Statistics. Burton Singer is Courtesy Professor in the Emerging Pathogens Institute at the University of Florida, and was previously Charles and Marie Robertson Professor of Public and International Affairs at Princeton University. He is a member of the National Academy of Sciences and the Institute of Medicine of the National Academies, and a Fellow of the American Statistical Association.

Preface
Contents
1 Introduction
1.1 Examples Using CART
1.2 The Statistical Problem
1.3 Outline of the Methodology
2 A Practical Guide to Tree Construction
2.1 The Elements of Tree Construction
2.2 Splitting a Node
2.3 Terminal Nodes
2.4 Download and Use of Software
3 Logistic Regression
3.1 Logistic Regression Models
3.2 A Logistic Regression Analysis
4 Classification Trees for a Binary Response
4.1 Node Impurity
4.2 Determination of Terminal Nodes
4.2.1 Misclassification Cost
4.2.2 Cost–Complexity
4.2.3 Nested Optimal Subtrees*
4.3 The Standard Error of Rcv*
4.4 Tree-Based Analysis of the Yale Pregnancy Outcome Study
4.5 An Alternative Pruning Approach
4.6 Localized Cross-Validation
4.7 Comparison Between Tree-Based and Logistic Regression Analyses
4.8 Missing Data
4.8.1 Missings Together Approach
4.8.2 Surrogate Splits
4.9 Tree Stability
4.10 Tree for Treatment Effectiveness
4.11 Implementation*
5 Examples Using Tree-Based Analysis
5.1 Risk-Factor Analysis in Epidemiology
5.1.1 Background
5.1.2 The Analysis
5.2 Customer Credit Assessment
6 Random and Deterministic Forests
6.1 Introduction to Random Forests
6.2 The Smallest Forest
6.3 Importance Score
6.3.1 Gini Importance
6.3.2 Depth Importance
6.3.3 Permutation Importance
6.3.4 Maximum Conditional Importance
6.4 Random Forests for Predictors with Uncertainties
6.5 Random Forests with Weighted Feature Selection
6.6 Deterministic Forests
6.7 A Note on Interaction
7 Analysis of Censored Data: Examples
7.1 Introduction
7.2 Tree-Based Analysis for the Western Collaborative Group Study Data
8 Analysis of Censored Data: Concepts and Classical Methods
8.1 The Basics of Survival Analysis
8.1.1 Kaplan–Meier Curve
8.1.2 Log-Rank Test
8.2 Parametric Regression for Censored Data
8.2.1 Linear Regression with Censored Data*
8.2.2 Cox Proportional Hazard Regression
8.2.3 Reanalysis of the Western Collaborative Group Study Data
9 Analysis of Censored Data: Survival Trees and Random Forests
9.1 Splitting Criteria
9.1.1 Gordon and Olshen’s Rule*
9.1.2 Maximizing the Difference
9.1.3 Use of Likelihood Functions*
9.1.4 A Straightforward Extension
9.2 Pruning a Survival Tree
9.3 Random Survival Forests
9.4 Implementation
9.5 Survival Trees for the Western Collaborative Group Study Data
9.6 Combinations of Biomarkers Predictive of Later Life Mortality
10 Regression Trees and Adaptive Splines for a Continuous Response
10.1 Tree Representation of Spline Model and Analysis of Birth Weight
10.2 Regression Trees
10.3 The Profile of MARS Models
10.4 Modified MARS Forward Procedure
10.5 MARS Backward-Deletion Step
10.6 The Best Knot*
10.7 Restrictions on the Knot*
10.7.1 Minimum Span
10.7.2 Maximal Correlation
10.7.3 Patches to the MARS Forward Algorithm
10.8 Smoothing Adaptive Splines*
10.8.1 Smoothing the Linearly Truncated Basis Functions
10.8.2 Cubic Basis Functions
10.9 Numerical Examples
11 Analysis of Longitudinal Data
11.1 Infant Growth Curves
11.2 The Notation and a General Model
11.3 Mixed-Effects Models
11.4 Semiparametric Models
11.5 Adaptive Spline Models
11.5.1 Known Covariance Structure
11.5.2 Unknown Covariance Structure
11.5.3 A Simulated Example
11.5.4 Reanalyses of Two Published Data Sets
11.5.5 Analysis of Infant Growth Curves
11.5.6 Remarks
11.6 Regression Trees for Longitudinal Data
11.6.1 Example: HIV in San Francisco
12 Analysis of Multiple Discrete Responses
12.1 Parametric Methods for Binary Responses
12.1.1 Log-Linear Models
12.1.2 Marginal Models
12.1.3 Parameter Estimation*
12.1.4 Frailty Models
12.2 Classification Trees for Multiple Binary Responses
12.2.1 Within-Node Homogeneity
12.2.2 Terminal Nodes
12.2.3 Computational Issues*
12.2.4 Parameter Interpretation*
12.3 Application: Analysis of BROCS Data
12.3.1 Background
12.3.2 Tree Construction
12.3.3 Description of Numerical Results
12.3.4 Alternative Approaches
12.3.5 Predictive Performance
12.4 Ordinal and Longitudinal Responses
12.5 Analysis of the BROCS Data via Log-Linear Models
13 Appendix
13.1 The Script for Running RTREE Automatically
13.2 The Script for Running RTREE Manually
13.3 The .inf File
References
Index

Publication date (per publisher): 1 July 2010
Series: Springer Series in Statistics
Additional info: XIV, 262 p.
Place of publication: New York
Language: English
Original title: Recursive Partitioning in the Health Sciences
Subject areas: Mathematics / Computer Science > Computer Science
Mathematics / Computer Science > Mathematics > Statistics
Mathematics / Computer Science > Mathematics > Probability / Combinatorics
Medicine / Pharmacy > General / Reference Works
Engineering > Medical Engineering
Keywords: Adaptive Splines and Regression Trees • Calculus • classification • epidemiology • Logistic Regression • Practical Computational Methods • Recursive Partitioning • Tree-based Survival Analysis • Trees and Associated Forests
ISBN-10: 1-4419-6824-5 / 1441968245
ISBN-13: 978-1-4419-6824-1 / 9781441968241
PDF (watermark)
Size: 3.4 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for specialist books with columns, tables, and figures. A PDF can be displayed on almost all devices but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: online reading
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.
