R for Stata Users (eBook)
XXIV, 530 pages
Springer New York (publisher)
978-1-4419-1318-0 (ISBN)
Stata is the most flexible and extensible data analysis package available from a commercial vendor. R is a similarly flexible free and open source package for data analysis, with over 3,000 add-on packages available. This book shows you how to extend the power of Stata through the use of R. It introduces R using Stata terminology with which you are already familiar. It steps through more than 30 programs written in both languages, comparing and contrasting the two packages' different approaches. When finished, you will be able to use R in conjunction with Stata, or separately, to import data, manage and transform it, create publication quality graphics, and perform basic statistical analyses.
A glossary defines over 50 R terms using Stata jargon and again using more formal R terminology. The table of contents and index allow you to find equivalent R functions by looking up Stata commands and vice versa. The example programs and practice datasets for both R and Stata are available for download.
Preface 6
Contents 10
List of Tables 20
List of Figures 22
1 Introduction 26
1.1 Overview 26
1.2 Similarities Between R and Stata 27
1.3 Why Learn R? 28
1.4 Is R Accurate? 29
1.5 What About Tech Support? 29
1.6 Getting Started Quickly 30
1.7 Programming Conventions 30
1.8 Typographic Conventions 31
2 Installing and Updating R 33
2.1 Installing Add-on Packages 34
2.2 Loading an Add-on Package 34
2.3 Updating Your Installation 38
2.4 Uninstalling R 39
2.5 Choosing Repositories 39
2.6 Accessing Data in Packages 41
3 Running R 43
3.1 Running R Interactively on Windows 43
3.2 Running R Interactively on Macintosh 45
3.3 Running R Interactively on Linux or UNIX 47
3.4 Running Programs That Include Other Programs 49
3.5 Running R in Batch Mode 49
3.6 Graphical User Interfaces 50
3.6.1 R Commander 50
3.6.2 Rattle for Data Mining 53
3.6.3 JGR Java GUI for R 54
4 Help and Documentation 60
4.1 Introduction 60
4.2 Help Files 60
4.3 Starting Help 60
4.4 Help Examples 62
4.5 Help for Functions That Call Other Functions 63
4.6 Help for Packages 64
4.7 Help for Data Sets 65
4.8 Books and Manuals 65
4.9 E-mail Lists 65
4.10 Searching the Web 66
4.11 Vignettes 66
5 Programming Language Basics 68
5.1 Introduction 68
5.2 Simple Calculations 69
5.3 Data Structures 70
5.3.1 Vectors 70
5.3.2 Factors 74
5.3.3 Data Frames 79
5.3.4 Matrices 83
5.3.5 Arrays 86
5.3.6 Lists 86
5.4 Saving Your Work 90
5.5 Comments to Document Your Programs 92
5.6 Controlling Functions (Commands) 93
5.6.1 Controlling Functions with Arguments 93
5.6.2 Controlling Functions with Formulas 95
5.6.3 Controlling Functions with an Object's Class 96
5.6.4 Controlling Functions with Extractor Functions 98
5.7 How Much Output is There? 100
5.8 Writing Your Own Functions (Macros) 104
5.9 R Program Demonstrating Programming Basics 107
6 Data Acquisition 114
6.1 The R Data Editor 114
6.2 Reading Delimited Text Files 116
6.2.1 Reading Comma-Delimited Text Files 117
6.2.2 Reading Tab-Delimited Text Files 118
6.2.3 Missing Values for Character Variables 120
6.2.4 Trouble with Tabs 121
6.2.5 Skipping Variables in Delimited Files 122
6.2.6 Example Programs for Reading Delimited Text Files 123
6.3 Reading Text Data Within a Program 125
6.3.1 The Easy Approach 125
6.3.2 The More General Approach 127
6.3.3 Example Programs for Reading Text Data Within a Program 127
6.4 Reading Fixed-Width Text Files, One Record per Case 129
6.4.1 Macro Substitution 132
6.4.2 Example Programs for Reading Fixed-Width Text Files, One Record Per Case 133
6.5 Reading Fixed-Width Text Files, Two or More Records per Case 134
6.5.1 Example Programs to Read Fixed-Width Text Files with Two Records per Case 135
6.6 Importing Data from Stata into R 136
6.6.1 R Program to Import Data from Stata 137
6.7 Writing Data to a Comma-Delimited Text File 137
6.7.1 Example Programs for Writing a Comma-Delimited File 138
6.8 Exporting Data from R to Stata 139
7 Selecting Variables 141
7.1 Selecting Variables in Stata 141
7.2 Selecting All Variables 142
7.3 Selecting Variables Using Index Numbers 142
7.4 Selecting Variables Using Column Names 145
7.5 Selecting Variables Using Logic 146
7.6 Selecting Variables Using String Search 148
7.7 Selecting Variables Using $ Notation 150
7.8 Selecting Variables Using Component Names 151
7.8.1 The attach Function 151
7.8.2 The with Function 152
7.8.3 Using Component Names in Formulas 152
7.9 Selecting Variables with the subset Function 153
7.10 Selecting Variables Using List Index 154
7.11 Generating Indexes A to Z from Two Variable Names 154
7.12 Saving Selected Variables to a New Dataset 155
7.13 Example Programs for Variable Selection 156
7.13.1 Stata Program to Select Variables 156
7.13.2 R Program to Select Variables 156
8 Selecting Observations 161
8.1 Selecting Observations in Stata 161
8.2 Selecting All Observations 162
8.3 Selecting Observations Using Index Numbers 162
8.4 Selecting Observations Using Row Names 165
8.5 Selecting Observations Using Logic 167
8.6 Selecting Observations Using String Search 170
8.7 Selecting Observations Using the subset Function 172
8.8 Generating Indexes A to Z from Two Row Names 173
8.9 Variable Selection Methods with No Counterpart for Selecting Observations 174
8.10 Saving Selected Observations to a New Data Frame 174
8.11 Example Programs for Selecting Observations 174
8.11.1 Stata Program to Select Observations 175
8.11.2 R Program to Select Observations 175
9 Selecting Variables and Observations 179
9.1 The subset Function 179
9.2 Selecting Observations by Logic and Variables by Name 180
9.3 Using Names to Select Both Observations and Variables 181
9.4 Using Numeric Index Values to Select Both Observations and Variables 182
9.5 Using Logic to Select Both Observations and Variables 183
9.6 Saving and Loading Subsets 184
9.7 Example Programs for Selecting Variables and Observations 184
9.7.1 Stata Program for Selecting Variables and Observations 184
9.7.2 R Program for Selecting Variables and Observations 185
10 Data Management 189
10.1 Transforming Variables 189
10.1.1 Example Programs for Transforming Variables 193
10.2 Functions or Commands? The apply Function Decides 194
10.2.1 Applying the mean Function 195
10.2.2 Finding N or NVALID 198
10.2.3 Example Programs for Applying Statistical Functions 200
10.3 Conditional Transformations 202
10.3.1 Example Programs for Conditional Transformations 204
10.4 Multiple Conditional Transformations 205
10.4.1 Example Programs for Multiple Conditional Transformations 207
10.5 Missing Values 208
10.5.1 Substituting Means for Missing Values 210
10.5.2 Finding Complete Observations 211
10.5.3 When "99" Has Meaning 212
10.5.4 Example Programs to Assign Missing Values 214
10.6 Renaming Variables (and Observations) 216
10.6.1 Renaming Variables: Advanced Examples 218
10.6.2 Renaming by Index 219
10.6.3 Renaming by Column Name 220
10.6.4 Renaming Many Sequentially Numbered Variable Names 221
10.6.5 Renaming Observations 222
10.6.6 Example Programs for Renaming Variables 222
10.7 Recoding Variables 226
10.7.1 Recoding a Few Variables 227
10.7.2 Recoding Many Variables 227
10.7.3 Example Programs for Recoding Variables 230
10.8 Keeping and Dropping Variables 231
10.8.1 Example Programs for Keeping and Dropping Variables 232
10.9 Stacking/Appending Data Sets 232
10.9.1 Example Programs for Stacking/Appending Data Sets 235
10.10 Joining/Merging Data Sets 236
10.10.1 Example Programs for Joining/Merging Data Sets 239
10.11 Creating Collapsed or Aggregated Data Sets 241
10.11.1 The aggregate Function 241
10.11.2 The tapply Function 243
10.11.3 Merging Aggregates with Original Data 244
10.11.4 Tabular Aggregation 246
10.11.5 The reshape Package 248
10.11.6 Example Programs for Collapsing/Aggregating Data 248
10.12 By or Split-File Processing 250
10.12.1 Comparing Summarization Methods 254
10.12.2 Example Programs for By or Split-file Processing 255
10.13 Removing Duplicate Observations 256
10.13.1 Example Programs for Removing Duplicate Observations 258
10.14 Selecting First or Last Observations per Group 259
10.14.1 Example Programs for Selecting Last Observation per Group 261
10.15 Reshaping Variables to Observations and Back 262
10.15.1 Example Programs for Reshaping Variables to Observations and Back 264
10.16 Sorting Data Frames 265
10.16.1 Example Programs for Sorting Data Sets 268
10.17 Converting Data Structures 269
10.17.1 Converting from Logical to Numeric Index and Back 272
11 Enhancing Your Output 274
11.1 Value Labels or Formats (and Measurement Level) 274
11.1.1 Character Factors 275
11.1.2 Numeric Factors 277
11.1.3 Making Factors of Many Variables 279
11.1.4 Converting Factors into Numeric or Character Variables 281
11.1.5 Dropping Factor Levels 283
11.1.6 Example Programs for Value Labels or Formats 284
11.2 Variable Labels 287
11.2.1 Variable Labels in The Hmisc Package 287
11.2.2 Long Variable Names as Labels 288
11.2.3 Other Packages That Support Variable Labels 291
11.2.4 Example Programs for Variable Labels 291
11.3 Output for Word Processing and Web Pages 292
11.3.1 The xtable Package 293
11.3.2 Other Options for Formatting Output 295
11.3.3 Example Programs for Formatting Output 296
12 Generating Data 298
12.1 Generating Numeric Sequences 299
12.2 Generating Factors 300
12.3 Generating Repetitious Patterns (Not Factors) 301
12.4 Generating Integer Measures 302
12.5 Generating Continuous Measures 304
12.6 Generating a Data Frame 306
12.7 Example Programs for Generating Data 306
12.7.1 Stata Program for Generating Data 306
12.7.2 R Program for Generating Data 307
13 Managing Your Files and Workspace 312
13.1 Loading and Listing Objects 312
13.2 Understanding Your Search Path 315
13.3 Attaching Data Frames 317
13.4 Attaching Files 319
13.5 Removing Objects from Your Workspace 320
13.6 Minimizing Your Workspace 322
13.7 Setting Your Working Directory 322
13.8 Saving Your Workspace 323
13.8.1 Saving Your Workspace Manually 323
13.8.2 Saving Your Workspace Automatically 324
13.9 Getting Operating Systems to Show You ".RData" Files 324
13.10 Organizing Projects with Windows Shortcuts 325
13.11 Saving Your Programs and Output 325
13.12 Saving Your History 326
13.13 Large Data Set Considerations 326
13.14 Example R Program for Managing Files and Workspace 328
14 Graphics Overview 332
14.1 Stata Graphics 333
14.2 R Graphics 333
14.3 The Grammar of Graphics 334
14.4 Other Graphics Packages 336
14.5 Graphics Procedures and Graphics Systems 336
14.6 Graphics Devices 337
14.7 Practice Data: mydata100 339
15 Traditional Graphics 340
15.1 Bar Plots 340
15.1.1 Bar Plots of Counts 340
15.1.2 Bar Plots for Subgroups of Counts 345
15.1.3 Bar Plots of Means 347
15.2 Adding Titles, Labels, Colors, and Legends 348
15.3 Graphics Parameters and Multiple Plots on a Page 351
15.4 Pie Charts 352
15.5 Dot Charts 354
15.6 Histograms 354
15.6.1 Basic Histograms 355
15.6.2 Histograms Stacked 357
15.6.3 Histograms Overlaid 358
15.7 Normal QQ Plots 362
15.8 Strip Charts 363
15.9 Scatter Plots and Line Plots 368
15.9.1 Scatter plots with Jitter 371
15.9.2 Scatter plots with Large Data Sets 371
15.9.3 Scatter plots with Lines 373
15.9.4 Scatter plots with Linear Fit by Group 374
15.9.5 Scatter plots by Group or Level (Coplots) 375
15.9.6 Scatter plots with Confidence Ellipse 377
15.9.7 Scatter plots with Confidence and Prediction Intervals 378
15.9.8 Plotting Labels Instead of Points 383
15.9.9 Scatter plot Matrices 385
15.10 Dual-Axes Plots 387
15.11 Box Plots 389
15.12 Error Bar Plots 391
15.13 Interaction Plots 391
15.14 Adding Equations and Symbols to Graphs 392
15.15 Summary of Graphics Elements and Parameters 393
15.16 Plot Demonstrating Many Modifications 394
15.17 Example Program for Traditional Graphics 395
15.17.1 Stata Program for Traditional Graphics 396
15.17.2 R Program for Traditional Graphics 396
16 Graphics with ggplot2 406
16.1 Introduction 406
16.1.1 Overview qplot and ggplot 407
16.1.2 Missing Values 408
16.1.3 Typographic Conventions 409
16.2 Bar Plots 410
16.2.1 Pie Charts 413
16.2.2 Bar Charts for Groups 414
16.3 Plots by Group or Level 415
16.4 Presummarized Data 417
16.5 Dot Charts 418
16.6 Adding Titles and Labels 420
16.7 Histograms and Density Plots 421
16.7.1 Histograms 421
16.7.2 Density Plots 422
16.7.3 Histograms with Density Overlaid 422
16.7.4 Histograms for Groups, Stacked 424
16.7.5 Histograms for Groups, Overlaid 425
16.8 Normal QQ Plots 426
16.9 Strip Plots 426
16.10 Scatter Plots and Line Plots 429
16.10.1 Scatter Plots with Jitter 431
16.10.2 Scatter Plots for Large Data Sets 432
16.10.3 Hexbin Plots 435
16.10.4 Scatter Plots with Fit Lines 436
16.10.5 Scatter Plots with Reference Lines 437
16.10.6 Scatter Plots with Labels Instead of Points 441
16.10.7 Changing Plot Symbols 442
16.10.8 Scatter Plot with Linear Fits by Group 443
16.10.9 Scatter Plots Faceted for Groups 443
16.10.10 Scatter Plot Matrix 445
16.11 Box Plots 446
16.12 Error Bar Plots 449
16.13 Logarithmic Axes 451
16.14 Aspect Ratio 451
16.15 Multiple Plots on a Page 452
16.16 Saving ggplot2 Graphs to a File 454
16.17 An Example Specifying All Defaults 454
16.18 Summary of Graphic Elements and Parameters 456
16.19 Example Programs for ggplot2 457
17 Statistics 474
17.1 Scientific Notation 474
17.2 Descriptive Statistics 475
17.2.1 The Hmisc describe Function 475
17.2.2 The summary Function 477
17.2.3 The table Function and Its Relatives 478
17.2.4 The mean Function and Its Relatives 480
17.3 Cross-Tabulation 481
17.3.1 The CrossTable Function 481
17.3.2 The tables and chisq.test Functions 483
17.4 Correlation 486
17.4.1 The cor Function 489
17.5 Linear Regression 491
17.5.1 Plotting Diagnostics 494
17.5.2 Comparing Models 495
17.5.3 Making Predictions with New Data 496
17.6 t-Test: Independent Groups 497
17.7 Equality of Variance 498
17.8 t-Test: Paired or Repeated Measures 499
17.9 Wilcoxon Mann-Whitney Rank Sum Test: Independent Groups 500
17.10 Wilcoxon Signed-Rank Test: Paired Groups 501
17.11 Analysis of Variance 502
17.12 Sums of Squares 507
17.13 The Kruskal-Wallis Test 508
17.14 Example Programs for Statistical Tests 510
17.14.1 Stata Program for Statistical Tests 510
17.14.2 R Program for Statistical Tests 512
18 Conclusion 518
Glossary of R jargon 519
Comparison of Stata commands and R functions 525
Automating Your R Setup 527
C.1 Setting Options 527
C.2 Creating Objects 528
C.3 Loading Packages 528
C.4 Running Functions 528
C.5 Example .Rprofile 530
Example Simulation 531
D.1 Stata Example Simulation 531
D.2 R Example Simulation 532
References 533
Index 537
"5 Programming Language Basics (p. 45-46)
5.1 Introduction
R is an object-oriented language. Everything that exists in it, whether variables, data sets, or functions (commands), is an object. Stata has limitations on command and variable name lengths, based on the version of the software being used. The limits are large, though, and rarely result in a problem for Stata users. In Stata, leading periods in names are not allowed and data set names cannot have periods at all. Object names in R can be any length and consist of letters, numbers, underscores “_,” or periods “.”; they should begin with a letter. However, in R if you always put quotes around a variable or data set name (actually any object name), it can then contain any characters, including spaces.
Case matters in both R and Stata, so you can have two variables—one named myvar and another named MyVar—in the same data set, although that is not a good idea! Some add-on packages tweak function names like the capitalized “Save” to represent a compatible, but enhanced, version of a built-in function like the lowercased “save.” As in any statistics package, it is best to avoid names that match function names like “mean” or that match logical conditions like “TRUE.”
Commands can begin and end anywhere on a line and R will ignore any additional spaces. R will try to execute a function when it reaches the end of a line. Therefore, to continue a function call on a new line, you must ensure that the fragment you leave behind is not already a complete function call by itself. Continuing a function call on a new line after a comma is usually a safe bet. As you will see, R functions frequently use commas, making them a convenient stopping point.
The R console will tell you that it is continuing a line when it changes the prompt from “>” to “+.” If you see “+” unexpectedly, you may have simply forgotten to add the final close parenthesis, “).” Submitting only that character will then finish your function call. If you are getting the “+” and cannot figure out why, you can cancel the pending function call with the Escape key on Windows or CTRL-C on Macintosh or Linux/UNIX. For CTRL-C, hold the CTRL key down (Linux/UNIX) or the control key (Macintosh) while pressing the letter C. You may end any R function call with a semicolon. That is not required though, except when entering multiple function calls on a single line."
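The naming and line-continuation rules described in the excerpt can be tried out with a short R sketch. It is an illustration only and is not taken from the book: all object names (`my.var_1`, `myvar`, `MyVar`, `my var`, `total`, `a`, `b`) are made up for this example, and backticks are used here as one common way to quote a nonstandard name.

```r
# Names may contain letters, digits, underscores, and periods.
my.var_1 <- 10

# Case matters: myvar and MyVar are two distinct objects.
myvar <- 1
MyVar <- 2

# A quoted name may contain other characters, such as a space.
# Backticks are used here to create and later refer to the object.
`my var` <- 3

# Ending each line with a comma leaves the call incomplete,
# so R keeps reading on the next line instead of executing early.
total <- sum(my.var_1,
             myvar,
             MyVar,
             `my var`)

# Semicolons are only needed to separate several calls on one line.
a <- 1; b <- 2

print(total)
print(a + b)
```

Typed interactively, the continued `sum(` call is where the console prompt would change from “>” to “+” until the closing parenthesis is entered.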
Publication date (per publisher) | 26 April 2010 |
---|---|
Series | Statistics and Computing |
Additional info | XXIV, 530 p. |
Place of publication | New York |
Language | English |
Subject areas | Mathematics / Computer Science ► Computer Science |
 | Mathematics / Computer Science ► Mathematics ► Computer Programs / Computer Algebra |
 | Mathematics / Computer Science ► Mathematics ► Statistics |
 | Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
 | Engineering |
Keywords | Data acquisition • Data Analysis • Stata • Statistica • Statistics |
ISBN-10 | 1-4419-1318-1 / 1441913181 |
ISBN-13 | 978-1-4419-1318-0 / 9781441913180 |
Size: 5.9 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is of limited use on small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.