Comparing Distributions (eBook)
XVI, 354 pages
Springer New York (publisher)
978-0-387-92710-7 (ISBN)
Provides a self-contained comprehensive treatment of both one-sample and K-sample goodness-of-fit methods by linking them to a common theory backbone
Contains many data examples, including R-code and a specific R-package for comparing distributions
Emphasises informative statistical analysis rather than plain statistical hypothesis testing
Comparing Distributions refers to the statistical data analysis that encompasses traditional goodness-of-fit testing. Whereas the latter includes only formal statistical hypothesis tests for the one-sample and the K-sample problems, this book presents a more general and informative treatment by also considering graphical and estimation methods. A procedure is said to be informative when it provides information on the reason for rejecting the null hypothesis. Although the methods have developed along seemingly different historical lines, this book emphasises their similarities by linking them to a common theory backbone.
This book consists of two parts. The first part discusses statistical methods for the one-sample problem. The second part treats the K-sample problem. Many sections of this second part may be of interest to every statistician who is involved in comparative studies.
The book gives a self-contained theoretical treatment of a wide range of goodness-of-fit methods, including graphical methods, hypothesis tests, model selection and density estimation. It relies on parametric, semiparametric and nonparametric theory, which is kept at an intermediate level; the intuition and heuristics behind the methods are usually provided as well. The book contains many data examples that are analysed with the cd R-package written by the author. All examples include the R-code.
Because many methods described in this book belong to the basic toolbox of almost every statistician, the book should be of interest to a wide audience. In particular, the book may be useful for researchers, graduate students and PhD students who need a starting point for doing research in the area of goodness-of-fit testing. Practitioners and applied statisticians may also be interested because of the many examples, the R-code and the stress on the informative nature of the procedures.
Preface 6
Contents 11
Part I One-Sample Problems 17
1 Introduction 18
1.1 The History of the One-Sample GOF Problem 18
1.2 Example Datasets 19
1.2.1 Pseudo-Random Generator Data 19
1.2.2 PCB Concentration Data 20
1.2.3 Pulse Rate Data 20
1.2.4 Cultivars Data 21
1.3 The Pearson Chi-Squared Test 23
1.3.1 Pearson Chi-Squared Test for the Multinomial Distribution 23
1.3.1.1 The Simple Null Hypothesis Case 23
1.3.1.2 The Composite Null Hypothesis Case 25
1.3.2 Generalisations of the Pearson χ² Test 28
1.3.3 A Note on the Nuisance Parameter Estimation 29
1.4 Pearson X² Tests for Continuous Distributions 30
2 Preliminaries (Building Blocks) 33
2.1 The Empirical Distribution Function 33
2.1.1 Definition and Construction 33
2.1.2 Rationale for Using the EDF 35
2.2 Empirical Processes 36
2.2.1 Definition 36
2.2.2 Weak Convergence 37
2.2.3 Kac--Siegert Decomposition of Gaussian Processes 38
2.3 The Quantile Function and the Quantile Process 41
2.3.1 The Quantile Function and Its Estimator 41
2.3.2 The Quantile Process 42
2.4 Comparison Distribution 43
2.5 Hilbert Spaces 44
2.6 Orthonormal Functions 47
2.6.1 The Fourier Basis 47
2.6.2 Orthonormal Polynomials 47
2.7 Parameter Estimation 48
2.7.1 Locally Asymptotically Linear Estimators 48
2.7.2 Method of Moments Estimators 49
2.7.3 Efficiency and Semiparametric Inference 50
2.8 Nonparametric Density Estimation 51
2.8.1 Introduction 51
2.8.2 Orthogonal Series Estimators 53
2.8.3 Kernel Density Estimation 56
2.8.4 Regression-Based Density Estimation 56
2.9 Hypothesis Testing 56
2.9.1 General Construction of a Hypothesis Test 57
2.9.2 Optimality Criteria 58
2.9.2.1 Finite Sample Criteria 58
2.9.2.2 Asymptotic Criteria 59
2.9.3 The Neyman--Pearson Lemma 61
3 Graphical Tools 62
3.1 Histograms and Box Plots 62
3.1.1 The Histogram 62
3.1.1.1 The Construction 62
3.1.1.2 Some Properties 63
3.1.1.3 Regression-Based Density Estimation 65
3.1.2 The Box Plot 65
3.2 Probability Plots and Comparison Distribution 69
3.2.1 Population Probability Plots 69
3.2.2 PP and QQ Plots 70
3.3 Comparison Distribution 75
3.3.1 Population Comparison Distributions 75
3.3.1.1 Definition and Interpretation 75
3.3.1.2 Decomposition of the Comparison Density 76
3.3.2 Empirical Comparison Distributions 81
3.3.2.1 Estimators of the Comparison Density 81
3.3.2.2 Confidence Intervals of the Comparison Density 82
3.3.3 Comparison Distribution for Discrete Data 86
4 Smooth Tests 89
4.1 Smooth Models 89
4.1.1 Construction of the Smooth Model 89
4.2 Smooth Tests 94
4.2.1 Simple Null Hypotheses 94
4.2.1.1 Test Statistics and Null Distributions 94
4.2.1.2 Interpretation of Components 95
4.2.1.3 Interpretation of Components when Orthonormal Polynomials Are Used 96
4.2.2 Composite Null Hypotheses 100
4.2.2.1 Maximum Likelihood and Method of Moments Estimators 100
4.2.2.2 The Efficient Score Test 102
4.2.2.3 The Generalised Score Test 104
4.3 Adaptive Smooth Tests 107
4.3.1 Consistency, Dilution Effects and Order Selection 107
4.3.2 Order Selection Within a Finite Horizon 110
4.3.3 Order Selection Within an Infinite Horizon 114
4.3.4 Subset Selection Within a Finite Horizon 115
4.3.5 Improved Density Estimates 119
4.4 Smooth Tests for Discrete Distributions 120
4.4.1 Introduction 120
4.4.2 The Simple Null Hypothesis Case 120
4.4.3 The Composite Null Hypothesis Case 121
4.5 A Semiparametric Framework 123
4.5.1 The Semiparametric Hypotheses 123
4.5.2 Semiparametric Tests 124
4.5.3 A Distance Function 126
4.5.4 Interpretation and Estimation of the Nuisance Parameter 126
4.5.5 The Quadratic Inference Function 127
4.5.6 Relation with the Empirically Rescaled Smooth Tests 128
4.6 Example 129
4.7 Some Practical Guidelines for Smooth Tests 133
5 Methods Based on the Empirical Distribution Function 135
5.1 The Kolmogorov--Smirnov Test 135
5.1.1 Definition 135
5.1.2 Null Distribution 137
5.1.3 Presence of Nuisance Parameters 139
5.2 Tests as Integrals of Empirical Processes 141
5.2.1 The Anderson--Darling Statistics 141
5.2.2 Principal Components Decomposition of the Test Statistic 142
5.2.2.1 Principal Components Decomposition of the Cramér--von Mises Statistic (Simple Null) 143
5.2.2.2 Principal Components Decomposition of the Anderson--Darling Statistic (Simple Null) 145
5.2.2.3 Principal Components Decompositions for Composite Null Hypotheses 146
5.2.3 Null Distribution 149
5.2.4 The Watson Test 154
5.2.4.1 The Test Statistic 154
5.2.4.2 Principal Components Decomposition of the Watson Statistic (Simple Null) 155
5.2.4.3 Null Distribution (Simple Null) 156
5.3 Generalisations of EDF Tests 156
5.3.1 Tests Based on the Empirical Quantile Function (EQF) 157
5.3.1.1 The Empirical Quantile Function 157
5.3.1.2 EQF Tests for the Simple Null Hypothesis 158
5.3.1.3 EQF Tests for Location-Scale Distributions 160
5.3.2 Tests Based on the Empirical Characteristic Function (ECF) 163
5.3.3 Miscellaneous Tests Based on Empirical Functionals of F 165
5.4 The Sample Space Partition Tests 167
5.4.1 Another Look at the Anderson--Darling Statistic 167
5.4.2 The Sample Space Partition Test 167
5.5 Some Further Bibliographic Notes 170
5.6 Some Practical Guidelines for EDF Tests 171
Part II Two-Sample and K-Sample Problems 173
6 Introduction 174
6.1 The Problem Defined 175
6.1.1 The Null Hypothesis of the General Two-Sample Problem 175
6.1.2 The Null Hypothesis of the General K-Sample Problem 176
6.2 Example Datasets 177
6.2.1 Gene Expression in Colorectal Cancer Patients 177
6.2.2 Travel Times 178
7 Preliminaries (Building Blocks) 181
7.1 Permutation Tests 181
7.1.1 Introduction by Example 181
7.1.2 Some Permutation and Randomisation Test Theory 185
7.1.2.1 Definitions 185
7.1.2.2 Construction of the Permutation Test 186
7.1.2.3 Monte Carlo Approximation to the Exact Permutation Null Distribution 187
7.2 Linear Rank Tests 189
7.2.1 Simple Linear Rank Statistics 189
7.2.1.1 Ranks and Order Statistics 189
7.2.1.2 Simple Linear Rank Statistics 192
7.2.1.3 Score Generating Functions 194
7.2.1.4 The Rank Score Process 195
7.2.2 Locally Most Powerful Linear Rank Tests 197
7.2.2.1 Locally Most Powerful Linear Rank Tests for General Alternatives 197
7.2.3 Adaptive Linear Rank Tests 200
7.3 The Pooled Empirical Distribution Function 200
7.4 The Comparison Distribution 201
7.5 The Quantile Process 202
7.5.1 Contrast Processes 202
7.5.2 Comparison Distribution Processes 204
7.5.2.1 Construction 204
7.5.2.2 Weak Convergence 205
7.6 Stochastic Ordering and Related Properties 206
8 Graphical Tools 210
8.1 PP and QQ Plots 210
8.1.1 Population Plots 210
8.1.1.1 Population QQ Plot 210
8.1.1.2 Population PP Plot 212
8.1.2 Empirical PP and QQ Plots 214
8.1.2.1 Construction 214
8.1.2.2 Sample Size Issues 215
8.1.2.3 When to Use Which Plot 218
8.2 Comparison Distributions 222
8.2.1 The Population Comparison Distribution 222
8.2.2 The Empirical Comparison Distribution 222
9 Some Important Two-Sample Tests 229
9.1 The Relation Between Statistical Tests and Hypotheses 230
9.1.1 Introduction 230
9.2 The Wilcoxon Rank Sum and the Mann--Whitney Tests 233
9.2.1 Introduction 233
9.2.2 The Hypotheses 234
9.2.3 The Test Statistics 235
9.2.4 The Null Distribution 236
9.2.5 The WMW Test as a LMPRT 238
9.2.6 The MW Statistic as an Estimator of 240
9.2.7 The Hodges--Lehmann Estimator 242
9.2.8 Examples 242
9.3 The Diagnostic Property of Two-Sample Tests 251
9.3.1 The Semiparametric Framework 252
9.3.2 Natural and Implied Null Hypotheses 254
9.3.3 The WMW Test in the Semiparametric Framework 254
9.3.3.1 Implied Null Hypothesis 255
9.3.3.2 Null Distributions 255
9.3.4 Empirical Variance Estimators of Simple Linear Rank Statistics 258
9.3.4.1 The Asymptotic Variance of a Simple Linear Rank Statistic 258
9.3.4.2 The Jackknife Estimator of the Asymptotic Variance 260
9.4 Optimal Linear Rank Tests for Normal Location-Shift Models 261
9.5 Rank Tests for Scale Differences 262
9.5.1 The Scale-Difference Model 263
9.5.2 The Capon and Klotz Tests 264
9.5.3 Some Other Important Tests 265
9.5.3.1 Measures for Differences in Scale 265
9.5.3.2 The Ansari--Bradley Test 267
9.5.3.3 The Sukhatme Test 269
9.5.3.4 The Mood Test 270
9.5.3.5 The Lehmann Test 272
9.5.3.6 The Fligner--Killeen Test 272
9.5.4 Conclusion 273
9.6 The Kruskal--Wallis Test and the ANOVA F-Test 273
9.6.1 The Hypotheses and the Test Statistic 274
9.6.2 The Null Distribution 275
9.6.3 The Diagnostic Property 275
9.6.4 The F-Test in ANOVA 276
9.7 Some Final Remarks 277
9.7.1 Adaptive Tests 277
9.7.2 The Lepage Test 278
10 Smooth Tests 279
10.1 Smooth Tests for the 2-Sample Problem 279
10.1.1 Smooth Models and the Smooth Test 279
10.1.1.1 Smooth Models 279
10.1.1.2 Smooth Test Statistic and the Null Distribution 282
10.1.2 Components 283
10.1.2.1 The First Component: WMW Statistic 284
10.1.2.2 The Second Component: Mood Statistic 284
10.1.2.3 The Third Component: the SKEW Statistic 285
10.1.2.4 The Fourth Component: the KURT Statistic 286
10.2 The Diagnostic Property 286
10.2.1 Examples 287
10.3 Smooth Tests for the K-Sample Problem 290
10.3.1 Smooth Models and the Smooth Test 290
10.3.2 Components 294
10.4 Adaptive Smooth Tests 296
10.4.1 Order Selection and Subset Selection with a Finite Horizon 296
10.4.2 Order Selection with an Infinite Horizon 297
10.5 Examples 298
10.6 Smooth Tests That Are Not Based on Ranks 302
10.7 Some Practical Guidelines for Smooth Tests 303
11 Methods Based on the Empirical Distribution Function 305
11.1 The Two-Sample and K-Sample Kolmogorov--Smirnov Test 305
11.1.1 The Kolmogorov--Smirnov Test for the Two-Sample Problem 305
11.1.1.1 The Test Statistic 305
11.1.1.2 The Null Distribution 306
11.1.2 The Kolmogorov--Smirnov Test for the K-Sample Problem 307
11.2 Tests of the Anderson--Darling Type 307
11.2.1 The Test Statistic 307
11.2.2 The Components 309
11.2.3 The Null Distribution 311
11.2.4 Examples 312
11.3 Adaptive Tests of Neuhaus 314
11.3.1 The General Idea 314
11.3.2 Smooth Tests 316
11.3.3 EDF Tests 316
11.4 Some Practical Guidelines for EDF Tests 317
12 Two Final Methods and Some Final Thoughts 319
12.1 A Contingency Table Approach 319
12.2 The Sample Space Partition Tests 321
12.3 Some Final Thoughts and Conclusions 323
A Proofs 328
A.1 Proof of Theorem 1.1 328
A.2 Proof of Theorem 1.2 329
A.3 Proof of Theorem 4.1 330
A.4 Proof of Lemma 4.1 331
A.5 Proof of Lemma 4.2 332
A.6 Proof of Lemma 4.3 332
A.7 Proof of Theorem 4.10 333
A.8 Proof of Theorem 4.2 333
A.9 Heuristic Proof of Theorem 5.2 338
A.10 Proof of Theorem 9.1 339
B The Bootstrap and Other Simulation Techniques 341
B.1 Simulation of EDF Statistics Under the Simple Null Hypothesis 341
B.2 The Parametric Bootstrap for Composite Null Hypotheses 342
B.3 A Modified Nonparametric Bootstrap for Testing Semiparametric Null Hypotheses 342
References 344
Index 355
Publication date (per publisher) | 14 March 2010 |
---|---|
Series | Springer Series in Statistics |
Additional info | XVI, 354 p. |
Place of publication | New York |
Language | English |
Subject areas | Mathematics / Computer Science ► Computer Science ► Databases |
Mathematics / Computer Science ► Mathematics ► Statistics | |
Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics | |
Social Sciences ► Politics / Administration | |
Technology | |
Economics ► Business Administration / Management ► Planning / Organisation | |
Keywords | Data Analysis • goodness-of-fit tests • graphical methods • Nonparametric Statistics • rank tests • semiparametric statistics • statistical method |
ISBN-10 | 0-387-92710-7 / 0387927107 |
ISBN-13 | 978-0-387-92710-7 / 9780387927107 |
Size: 3.2 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalised for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to specialist books with columns, tables and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfil eBook orders from other countries.