Partial Identification of Probability Distributions - Charles F. Manski

Partial Identification of Probability Distributions (eBook)

Charles F. Manski (Author)

eBook Download: PDF

2006 | 1st edition
178 pages
Springer New York (Publisher)
978-0-387-21786-4 (ISBN)

Sample data alone never suffice to draw conclusions about populations. Inference always requires assumptions about the population and sampling process. Statistical theory has revealed much about how strength of assumptions affects the precision of point estimates, but has had much less to say about how it affects the identification of population parameters. Indeed, it has been commonplace to think of identification as a binary event - a parameter is either identified or not - and to view point identification as a pre-condition for inference. Yet there is enormous scope for fruitful inference using data and assumptions that partially identify population parameters. This book explains why and shows how.

The book presents in a rigorous and thorough manner the main elements of Charles Manski's research on partial identification of probability distributions. One focus is prediction with missing outcome or covariate data. Another is decomposition of finite mixtures, with application to the analysis of contaminated sampling and ecological inference. A third major focus is the analysis of treatment response. Whatever the particular subject under study, the presentation follows a common path. The author first specifies the sampling process generating the available data and asks what may be learned about population parameters using the empirical evidence alone. He then asks how the (typically) set-valued identification regions for these parameters shrink if various assumptions are imposed. The approach to inference that runs throughout the book is deliberately conservative and thoroughly nonparametric.

Conservative nonparametric analysis enables researchers to learn from the available data without imposing untenable assumptions. It enables establishment of a domain of consensus among researchers who may hold disparate beliefs about what assumptions are appropriate.

Charles F. Manski is Board of Trustees Professor at Northwestern University. He is author of Identification Problems in the Social Sciences and Analog Estimation Methods in Econometrics. He is a Fellow of the American Academy of Arts and Sciences, the American Association for the Advancement of Science, and the Econometric Society.

Preface
Contents
Introduction: Partial Identification and Credible Inference
1 Missing Outcomes
1.1. Anatomy of the Problem
1.2. Means
1.3. Parameters that Respect Stochastic Dominance
1.4. Combining Multiple Sampling Processes
1.5. Interval Measurement of Outcomes
Complement 1A. Employment Probabilities
Complement 1B. Blind-Men Bounds on an Elephant
Endnotes
2 Instrumental Variables
2.1. Distributional Assumptions and Credible Inference
2.2. Some Assumptions Using Instrumental Variables
2.3. Outcomes Missing-at-Random
2.4. Statistical Independence
2.5. Mean Independence and Mean Monotonicity
2.6. Other Assumptions Using Instrumental Variables
Complement 2A. Estimation with Nonresponse Weights
Endnotes
3 Conditional Prediction with Missing Data
3.1. Prediction of Outcomes Conditional on Covariates
3.2. Missing Outcomes
3.3. Jointly Missing Outcomes and Covariates
3.4. Missing Covariates
3.5. General Missing-Data Patterns
3.6. Joint Inference on Conditional Distributions
Complement 3A. Unemployment Rates
Complement 3B. Parametric Prediction with Missing Data
Endnotes
4 Contaminated Outcomes
4.1. The Mixture Model of Data Errors
4.2. Outcome Distributions
4.3. Event Probabilities
4.4. Parameters that Respect Stochastic Dominance
Complement 4A. Contamination Through Imputation
Complement 4B. Identification and Robust Inference
Endnotes
5 Regressions, Short and Long
5.1. Ecological Inference
5.2. Anatomy of the Problem
5.3. Long Mean Regressions
5.4. Instrumental Variables
Complement 5A. Structural Prediction
Endnotes
6 Response-Based Sampling
6.1. Reverse Regression
6.2. Auxiliary Data on Outcomes or Covariates
6.3. The Rare-Disease Assumption
6.4. Bounds on Relative and Attributable Risk
6.5. Sampling from One Response Stratum
Complement 6A. Smoking and Heart Disease
Endnotes
7 Analysis of Treatment Response
7.1. Anatomy of the Problem
7.2. Treatment Choice in Heterogeneous Populations
7.3. The Selection Problem and Treatment Choice
7.4. Instrumental Variables
Complement 7A. Identification and Ambiguity
Complement 7B. Sentencing and Recidivism
Complement 7C. Missing Outcome and Covariate Data
Complement 7D. Study and Treatment Populations
Endnotes
8 Monotone Treatment Response
8.1. Shape Restrictions
8.2. Monotonicity
8.3. Semi-Monotonicity
8.4. Concave Monotonicity
Complement 8A. Downward-Sloping Demand
Complement 8B. Econometric Response Models
Endnotes
9 Monotone Instrumental Variables
9.1. Equalities and Inequalities
9.2. Mean Monotonicity
9.3. Mean Monotonicity and Mean Treatment Response
9.4. Variations on the Theme
Complement 9A. The Returns to Schooling
Endnotes
10 The Mixing Problem
10.1. Within-Group Treatment Variation
10.2. Known Treatment Shares
10.3. Extrapolation from the Experiment Alone
Complement 10A. Experiments Without Covariate Data
Endnotes
References
Index

Introduction: Partial Identification and Credible Inference (p. 1-2)

Statistical inference uses sample data to draw conclusions about a population of interest. However, data alone do not suffice. Inference always requires assumptions about the population and the sampling process. Statistical theory illuminates the logic of inference by showing how data and assumptions combine to yield conclusions.
Empirical researchers should be concerned with both the logic and the credibility of their inferences. Credibility is a subjective matter, yet I take there to be wide agreement on a principle that I shall call:

The Law of Decreasing Credibility: The credibility of inference decreases with the strength of the assumptions maintained.

This principle implies that empirical researchers face a dilemma as they decide what assumptions to maintain: Stronger assumptions yield inferences that may be more powerful but less credible. Statistical theory cannot resolve the dilemma but can clarify its nature.

It is useful to distinguish combinations of data and assumptions that point-identify a population parameter of interest from ones that place the parameter within a set-valued identification region. Point identification is the fundamental necessary condition for consistent point estimation of a parameter. Strengthening an assumption that achieves point identification may increase the attainable precision of estimates of the parameter. Statistical theory has had much to say about this matter. The classical theory of local asymptotic efficiency characterizes, through the Fisher information matrix, how attainable precision increases as more is assumed known about a population distribution. Nonparametric regression analysis shows how the attainable rate of convergence of estimates increases as more is assumed about the shape of the regression. These and other achievements provide important guidance to empirical researchers as they weigh the credibility and precision of alternative point estimates.

Statistical theory has had much less to say about inference on population parameters that are not point-identified (see the historical note at the end of this Introduction). It has been commonplace to think of identification as a binary event - a parameter is either identified or it is not - and to view point identification as a precondition for meaningful inference. Yet there is enormous scope for fruitful inference using data and assumptions that partially identify population parameters. This book explains why and shows how.

Origin and Organization of the Book

The book has its roots in my research on nonparametric regression analysis with missing outcome data, initiated in the late 1980s. Empirical researchers estimating regressions commonly assume that missingness is random, in the sense that the observability of an outcome is statistically independent of its value. Yet this and other point-identifying assumptions have regularly been criticized as implausible. So I set out to determine what random sampling with partial observability of outcomes reveals about mean and quantile regressions if nothing is known about the missingness process or if assumptions weak enough to be widely credible are imposed. The findings were sharp bounds whose forms vary with the regression of interest and with the maintained assumptions. These bounds can readily be estimated using standard methods of nonparametric regression analysis.
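To make the idea of bounds under unknown missingness concrete, here is a minimal Python sketch of the simplest case: worst-case bounds on a population mean when some outcomes are unobserved and the outcome is known to lie in a bounded range. It is an illustration rather than code from the book. The function name `worst_case_mean_bounds` and the arguments `y_min` and `y_max` are hypothetical, and the computation is a plain sample analog that ignores sampling variability.

```python
from typing import Optional, Sequence, Tuple


def worst_case_mean_bounds(
    outcomes: Sequence[Optional[float]],
    y_min: float,
    y_max: float,
) -> Tuple[float, float]:
    """Sample-analog worst-case bounds on the mean of a bounded outcome.

    `outcomes` holds one entry per sampled unit; None marks a missing outcome.
    Assumption (hypothetical interface): every outcome lies in [y_min, y_max].
    Nothing is assumed about why outcomes are missing.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one sampled unit")
    if y_min > y_max:
        raise ValueError("y_min must not exceed y_max")

    observed = [y for y in outcomes if y is not None]
    if any(y < y_min or y > y_max for y in observed):
        raise ValueError("observed outcome outside [y_min, y_max]")

    n_missing = n - len(observed)
    observed_sum = sum(observed)

    # Lower bound: every missing outcome takes the smallest logical value.
    lower = (observed_sum + n_missing * y_min) / n
    # Upper bound: every missing outcome takes the largest logical value.
    upper = (observed_sum + n_missing * y_max) / n
    return lower, upper


if __name__ == "__main__":
    # Four sampled units with a binary outcome; one outcome is missing.
    print(worst_case_mean_bounds([1.0, 0.0, None, 1.0], y_min=0.0, y_max=1.0))
```

The width of the interval equals the fraction of missing outcomes times `y_max - y_min`, so the bounds collapse to a point when nothing is missing and are uninformative when everything is.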

Study of regression with missing outcome data stimulated investigation of more general incomplete data problems. Some sample realizations may have unobserved outcomes, some may have unobserved covariates, and others may be entirely missing. Sometimes interval data on outcomes or covariates are available, rather than point measurements. Random sampling with incomplete observation of outcomes and covariates generically yields partial identification of regressions. The challenge is to describe and estimate the identification regions produced by incomplete-data processes when alternative assumptions are maintained.
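The interval-data case mentioned in this part of the Introduction can be sketched in the same spirit. If each unit's outcome is known only to lie in an interval, the mean lies between the mean of the lower endpoints and the mean of the upper endpoints. The Python below is a hedged illustration with a hypothetical function name, `interval_mean_bounds`. It again computes only the sample analog and does not address estimation error.

```python
from typing import Sequence, Tuple


def interval_mean_bounds(
    intervals: Sequence[Tuple[float, float]],
) -> Tuple[float, float]:
    """Sample-analog bounds on a mean when outcomes are interval-measured.

    Each element is a (lower, upper) pair known to contain the unit's outcome.
    A point measurement is the special case lower == upper.
    """
    if not intervals:
        raise ValueError("need at least one sampled unit")
    for low, high in intervals:
        if low > high:
            raise ValueError("interval lower endpoint exceeds upper endpoint")

    n = len(intervals)
    # The mean is smallest when every outcome sits at its lower endpoint
    # and largest when every outcome sits at its upper endpoint.
    lower = sum(low for low, _ in intervals) / n
    upper = sum(high for _, high in intervals) / n
    return lower, upper


if __name__ == "__main__":
    # One exact measurement and one outcome known only to lie in [0, 2].
    print(interval_mean_bounds([(1.0, 1.0), (0.0, 2.0)]))
```

A fully missing outcome fits this representation as the interval spanning the outcome's whole logical range, so the missing-outcome bounds are a special case of the interval bounds.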

Publication date (per publisher)	29 April 2006
Language	English
Subject areas	Mathematics / Computer Science ► Mathematics ► Applied Mathematics
	Mathematics / Computer Science ► Mathematics ► Statistics
	Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics
	Social Sciences ► Sociology ► Empirical Social Research
	Technology
	Economics ► Economics ► Econometrics
ISBN-10	0-387-21786-X / 038721786X
ISBN-13	978-0-387-21786-4 / 9780387217864

PDF (watermark)
Size: 1.0 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thus personalized to you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suited to specialist books with columns, tables, and figures. A PDF can be displayed on almost any device but is of only limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in a web browser.

Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.

Print edition

Book | Softcover

149,79 €