Regression Analysis for Social Sciences (eBook)
386 pages
Elsevier Science (publisher)
978-0-08-055082-4 (ISBN)
Key Features
* Presents accessible methods of regression analysis
* Includes a broad spectrum of methods
* Techniques are explained step-by-step
* Provides sample command and result files for SYSTAT
Regression Analysis for Social Sciences presents methods of regression analysis in an accessible way, with each method having illustrations and examples. A broad spectrum of methods is included: multiple categorical predictors, methods for curvilinear regression, and methods for symmetric regression. This book can be used for courses in regression analysis at the advanced undergraduate and beginning graduate level in the social and behavioral sciences. Most of the techniques are explained step by step, enabling students and researchers to analyze their own data. Examples include data from the social and behavioral sciences as well as biology, making the book useful for readers with biological and biometrical backgrounds. Sample command and result files for SYSTAT are included in the text.
Contents
Preface
CHAPTER 1. INTRODUCTION
CHAPTER 2. SIMPLE LINEAR REGRESSION
2.1 Linear Functions and Estimation
2.2 Parameter Estimation
2.3 Interpreting Regression Parameters
2.4 Interpolation and Extrapolation
2.5 Testing Regression Hypotheses
CHAPTER 3. MULTIPLE LINEAR REGRESSION
3.1 Ordinary Least Squares Estimation
3.2 Data Example
3.3 Multiple Correlation and Determination
3.4 Significance Testing
CHAPTER 4. CATEGORICAL PREDICTORS
4.1 Dummy and Effect Coding
4.2 More Than Two Categories
4.3 Multiple Categorical Predictors
CHAPTER 5. OUTLIER ANALYSIS
5.1 Leverage Outliers
5.2 Remedial Measures
CHAPTER 6. RESIDUAL ANALYSIS
6.1 Illustrations of Residual Analysis
6.2 Residuals and Variable Relationships
CHAPTER 7. POLYNOMIAL REGRESSION
7.1 Basics
7.2 Orthogonal Polynomials
7.3 Example of Non-Equidistant Predictors
CHAPTER 8. MULTICOLLINEARITY
8.1 Diagnosing Multicollinearity
8.2 Countermeasures to Multicollinearity
CHAPTER 9. MULTIPLE CURVILINEAR REGRESSION
CHAPTER 10. INTERACTION TERMS IN REGRESSION
10.1 Definition and Illustrations
10.2 Multiplicative Terms
10.3 Variable Characteristics
CHAPTER 11. ROBUST REGRESSION
11.1 The Concept of Robustness
11.2 Models of Robust Regression
11.3 Computational Issues
CHAPTER 12. SYMMETRIC REGRESSION
12.1 Pearson’s Orthogonal Regression
12.2 Other Solutions
12.3 A General Model for OLS Regression
12.4 Robust Symmetrical Regression
12.5 Computational Issues
CHAPTER 13. VARIABLE SELECTION TECHNIQUES
13.1 A Data Example
13.2 Best Subset Regression
13.3 Stepwise Regression
13.4 Discussion
CHAPTER 14. REGRESSION FOR LONGITUDINAL DATA
14.1 Within Subject Correlation
14.2 Robust Modeling of Longitudinal Data
14.3 A Data Example
CHAPTER 15. PIECEWISE REGRESSION
15.1 Continuous Piecewise Regression
15.2 Discontinuous Piecewise Regression
CHAPTER 16. DICHOTOMOUS CRITERION VARIABLES
CHAPTER 17. COMPUTATIONAL ISSUES
17.1 Creating a SYSTAT System File
17.2 Simple Regression
17.3 Curvilinear Regression
17.4 Multiple Regression
17.5 Regression Interaction
17.6 Regression with Categorical Predictors
17.7 The Partial Interaction Strategy
17.8 Residual Analysis
17.9 Missing Data Estimation
17.10 Piecewise Regression
APPENDIX A. ELEMENTS OF MATRIX ALGEBRA
A.1 Definition of a Matrix
A.2 Types of Matrices
A.3 Transposing Matrices
A.4 Adding Matrices
A.5 Multiplying Matrices
A.6 The Rank of a Matrix
A.7 The Inverse of a Matrix
A.8 The Determinant of a Matrix
A.9 Rules for Operations with Matrices
A.10 Exercises
APPENDIX B. BASICS OF DIFFERENTIATION
APPENDIX C. BASICS OF VECTOR DIFFERENTIATION
APPENDIX D. POLYNOMIALS
D.1 Systems of Orthogonal Polynomials
D.2 Smoothing Series of Measures
APPENDIX E. DATA SETS
E.1 Recall Performance Data
E.2 Examination and State Anxiety Data
References
Index
Introduction
Regression analysis is one of the most widely used statistical techniques. Today, regression analysis is applied in the social sciences, medical research, economics, agriculture, biology, meteorology, and many other areas of academic and applied science. Regression analysis plays this outstanding role for several reasons: its concepts are easily understood, it is implemented in virtually every general-purpose statistical computing package, and it can therefore be readily applied to the data at hand. Moreover, regression analysis lies at the heart of a wide range of more recently developed statistical techniques, such as the class of generalized linear models (McCullagh & Nelder, 1989; Dobson, 1990). Hence, a sound understanding of regression analysis is fundamental to developing one’s understanding of modern applied statistics.
Regression analysis is designed for situations where there is one continuously varying variable, for example, sales profit, yield in a field experiment, or IQ. This continuous variable is commonly denoted by Y and termed the dependent variable, that is, the variable that we would like to explain or predict. For this purpose, we use one or more other variables, usually denoted by X1, X2, …, the independent variables, that are related to the variable of interest.
To simplify matters, we first consider the situation where we are only interested in a single independent variable. To exploit the information that the independent variable carries about the dependent variable, we try to find a mathematical function that is a good description of the assumed relation. Of course, we do not expect the function to describe the dependent variable perfectly, as in statistics we always allow for randomness in the data, that is, some sort of variability, sometimes referred to as error, that on the one hand is too large to be neglected but, on the other hand, is only a nuisance inherent in the phenomenon under study.
To exemplify the ideas we present, in Figure 1.1, a scatterplot of data that was collected in a study by Finkelstein, von Eye, and Preece (1994). One goal of the study was to relate the self-reported number of aggressive impulses to the number of self-reported incidences of physical aggression in adolescents. The sample included n = 106 respondents, each providing the pair of values X, that is, Aggressive Impulses, and Y, that is, open Physical Aggression against Peers. In shorthand notation, (Xi, Yi), i = 1, …, 106.
While it might be reasonable to assume a relation between Aggressive Impulses and Physical Aggression against Peers, scientific practice involves demonstrating this assumed link between the two variables using data from experiments or observational studies. Regression analysis is one important tool for this task.
However, regression analysis is not only suited to suggesting decisions as to whether or not a relationship between two variables exists. Regression analysis goes beyond this decision making and provides a different type of precise statement. As we already mentioned above, regression analysis specifies a functional form for the relationship between the variables under study that allows one to estimate the degree of change in the dependent variable that goes hand in hand with changes in the independent variable. At the same time, regression analysis allows one to make statements about how certain one can be about the predicted change in Y that is associated with the observed change in X.
To see how the technique works we look at the data presented in the scatterplot of Figure 1.1. On purely intuitive grounds, simply by looking at the data, we can try to make statements similar to the ones that are addressed by regression analysis.
First of all, we can ask whether there is a relationship at all between the number of aggressive impulses and the number of incidences of physical aggression against peers. The scatterplot shows a very wide scatter of the points. This could be caused by imprecise measurement or by a naturally high variability of responses concerning aggression. Nevertheless, there seems to be a slight trend in the data, consistent with the obvious hypothesis that more aggressive impulses lead to more physical aggression. Since the scatter of the points is so wide, it is quite hard to make very elaborate statements about the supposed functional form of this relation. The assumption of a linear relation between the variables under study, indicated by the straight line, and a positive trend in the data seems, for the time being, sufficient to characterize the data.
Every linear relationship can be written in the form Y = βX + α. Therefore, specifying this linear relation is equivalent to finding reasonable estimates for β and α. Every straight line or, equivalently, every linear function is determined by two points in the plane through which the line passes. Therefore, we can obtain estimates of β and α if we can find two such points. This could be done in the following way. We select a value on the scale of the independent variable, X, Aggressive Impulses in the example, and select all pairs of values whose score on the independent variable is close to this value. A natural predictor of the value of the dependent variable, Y, Physical Aggression against Peers, that is representative of these observations is the mean of the dependent variable over these values. For example, when we look up in the scatterplot those points that have a value close to 10 on the Aggressive Impulse scale, the mean of the associated values on the Physical Aggression scale is near 15. Similarly, if we look at the points with a value close to 20 on the Aggressive Impulse scale, we find that the mean of the associated values on the Physical Aggression scale lies slightly above 20. So let us take 22 as our guess.
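This local-averaging step can be sketched in a few lines of Python. The (X, Y) pairs below are invented for illustration only; they are not the Finkelstein et al. data, which are not reproduced in this excerpt.

```python
# Sketch of the "eyeballing" predictor: for a chosen X value, average the
# Y values of all observations whose X score lies close to it.
# The (x, y) pairs below are made up purely for illustration.
data = [(9, 14), (10, 16), (11, 15), (19, 21), (20, 23), (21, 22)]

def local_mean(data, x0, window=1.5):
    """Mean of Y over all points whose X lies within `window` of x0."""
    ys = [y for x, y in data if abs(x - x0) <= window]
    return sum(ys) / len(ys)

print(local_mean(data, 10))  # mean of 14, 16, 15 -> 15.0
print(local_mean(data, 20))  # mean of 21, 23, 22 -> 22.0
```

With real scatterplot data, the window width is a judgment call; the point is only that averaging nearby Y values gives a representative prediction at each chosen X.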
Now, we are ready to obtain estimates of β and α. It is a simple exercise to transform the coordinates of our hypothetical regression line, that is, (10, 15) and (20, 22), into estimates of β and α. One obtains as the estimate for β a value of 0.7 and as an estimate for α a value of 8. If we insert these values into the equation, Y = βX + α, and set X = 10 we obtain for Y a value of 15, which is just the corresponding value of Y from which we started. This can be done for the second point, (20, 22), as well.
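The arithmetic of converting the two points (10, 15) and (20, 22) into estimates of β and α can be checked directly; this is just the standard two-point form of a straight line, not the formal estimation procedure introduced later in the book.

```python
# Line through two points (x1, y1) and (x2, y2):
#   beta  = (y2 - y1) / (x2 - x1)
#   alpha = y1 - beta * x1
x1, y1 = 10, 15
x2, y2 = 20, 22

beta = (y2 - y1) / (x2 - x1)   # (22 - 15) / (20 - 10) = 0.7
alpha = y1 - beta * x1         # 15 - 0.7 * 10 = 8

# Plugging X = 10 back into Y = beta * X + alpha recovers Y = 15,
# and X = 20 recovers Y = 22, as the text notes.
print(f"beta = {beta:.1f}, alpha = {alpha:.1f}")  # beta = 0.7, alpha = 8.0
```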
As we have already mentioned, the scatter of the points is very wide and if we use our estimates for β and α to predict physical aggression for, say, a value of 15 or 30 on the Aggressive Impulse scale, we do not expect it to be very accurate. It should be noted that this lack of accuracy is not caused by our admittedly very imprecise eyeballing method.
Of course, we do not advocate using this method in general. Perhaps the most obvious point that can be criticized about this procedure is that if another person is asked to specify a regression line from eyeballing, he or she will probably come to a slightly different set of estimates for α and β. Hence, the conclusion drawn from the line would be slightly different as well. So it is natural to ask whether there is a generally agreed-upon procedure for obtaining the parameters of the regression line, or simply the regression parameters. This is the case. We shall see that the regression parameters can be estimated optimally by the method of ordinary least squares given that some assumptions are met about the population the data were drawn from. This procedure will be formally introduced in the next chapters. If this method is applied to the data in Figure 1.1, the parameter estimates turn out to be 0.6 for β and 11 for α. When we compare these estimates to the ones above, we see that our intuitive method yields estimates that are not too different from the least squares estimates calculated by the computer.
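For readers who want to see how such least squares estimates arise, the closed-form formulas for simple regression can be sketched as follows. The small data set is hypothetical, since the raw data of the study are not reproduced in this excerpt; the formulas themselves are the standard ordinary least squares solutions derived formally in the next chapters.

```python
# Ordinary least squares for simple regression Y = beta * X + alpha:
#   beta  = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)**2)
#   alpha = mean_y - beta * mean_x
# The data below are hypothetical, not the Finkelstein et al. values.
xs = [4, 8, 10, 15, 20, 25]
ys = [14, 15, 17, 20, 24, 25]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)

beta = sxy / sxx
alpha = mean_y - beta * mean_x
print(round(beta, 3), round(alpha, 3))
```

Unlike eyeballing, this procedure gives every analyst the same estimates from the same data, which is exactly the appeal of an agreed-upon method.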
Regardless of the assumed functional form, obtaining parameter estimates is one of the important steps in regression analysis. But as the estimates are obtained from data that are to a certain extent random, these estimates are random as well. If we imagine a replication of the study, we would certainly not expect to obtain exactly the same parameter estimates again; they will differ more or less from the estimates of the first study. Therefore, a decision is needed as to whether the results are merely due to chance. In other words, we have to deal with the question of how likely it is that we would not observe the present positive trend in a replication study. It will be seen that the variability of parameter estimates depends not on a single factor but on several factors. Therefore, it is much harder to find an intuitively reasonable guess of this variability than a guess of the point estimates for β and α.
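The point that estimates vary from replication to replication can be illustrated with a small simulation. The true parameters (β = 0.6, α = 11), the predictor range, and the noise level below are assumptions chosen only for illustration, loosely inspired by the estimates quoted in the text.

```python
import random

# Simulate two "replications" of a study under the same true model
# Y = 0.6 * X + 11 + noise, and fit each by least squares.
# Model, predictor range, and noise level are assumed for illustration.

def ols(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    return beta, my - beta * mx

def replicate(rng, n=106):
    xs = [rng.uniform(5, 30) for _ in range(n)]
    ys = [0.6 * x + 11 + rng.gauss(0, 8) for x in xs]
    return ols(xs, ys)

rng = random.Random(1)
print(replicate(rng))  # one study's estimates of (beta, alpha)
print(replicate(rng))  # a replication: similar, but not identical
```

Both fits hover around the true values, yet no two replications agree exactly; quantifying how far they typically spread is what the later chapters on significance testing address.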
With regression analysis we have a...
Publication date (per publisher) | 12 August 1998
Language | English
Subject areas | Humanities ► Psychology ► General Psychology
 | Humanities ► Psychology ► Psychological Testing
 | Mathematics / Computer Science ► Mathematics ► Statistics
 | Natural Sciences
 | Social Sciences ► Education
 | Social Sciences ► Politics / Administration
 | Social Sciences ► Sociology ► General / Reference
 | Social Sciences ► Sociology ► Empirical Social Research
 | Technology
File format | PDF (Adobe DRM)
ISBN-10 | 0-08-055082-7 / 0080550827
ISBN-13 | 978-0-08-055082-4 / 9780080550824