R for Everyone
Addison Wesley (Publisher)
978-0-13-454692-6 (ISBN)
Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you’ll need to accomplish 80 percent of modern data tasks. Lander’s self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You’ll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you’ll construct several complete models, both linear and nonlinear, and use some data mining techniques.
By the time you’re done, you won’t just know how to write R programs; you’ll be ready to tackle the statistical problems you care about most.
Coverage Includes:
Exploring R, RStudio, and R packages
Using R for math: variable types, vectors, calling functions, and more
Exploiting data structures, including data.frames, matrices, and lists
Creating attractive, intuitive statistical graphics
Writing user-defined functions
Controlling program flow with if, ifelse, and complex checks
Improving program efficiency with group manipulations
Combining and reshaping multiple datasets
Manipulating strings using R’s facilities and regular expressions
Creating normal, binomial, and Poisson probability distributions
Programming basic statistics: mean, standard deviation, and t-tests
Building linear, generalized linear, and nonlinear models
Assessing the quality of models and variable selection
Preventing overfitting, using the Elastic Net and Bayesian methods
Analyzing univariate and multivariate time series data
Grouping data via K-means and hierarchical clustering
Preparing reports, slideshows, and web pages with knitr
Building reusable R packages with devtools and Rcpp
Getting involved with the R global community
Jared P. Lander is the owner of Lander Analytics, a statistical consulting firm based in New York City; the organizer of the New York Open Statistical Programming Meetup; and an adjunct professor of statistics at Columbia University. He is also a tour guide for Scott’s Pizza Tours and an advisor to Brewla Bars, a gourmet ice pop startup. With an M.A. from Columbia University in statistics and a B.A. from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations spans politics, tech startups, fundraising, music, finance, healthcare, and humanitarian relief efforts. He specializes in data management, multilevel models, machine learning, generalized linear models, visualization, and statistical computing.
Foreword
Preface
Acknowledgments
About the Author
Chapter 1: Getting R
1.1 Downloading R
1.2 R Version
1.3 32-bit vs. 64-bit
1.4 Installing
1.5 Microsoft R Open
1.6 Conclusion
Chapter 2: The R Environment
2.1 Command Line Interface
2.2 RStudio
2.3 Microsoft Visual Studio
2.4 Conclusion
Chapter 3: R Packages
3.1 Installing Packages
3.2 Loading Packages
3.3 Building a Package
3.4 Conclusion
Chapter 4: Basics of R
4.1 Basic Math
4.2 Variables
4.3 Data Types
4.4 Vectors
4.5 Calling Functions
4.6 Function Documentation
4.7 Missing Data
4.8 Pipes
4.9 Conclusion
Chapter 5: Advanced Data Structures
5.1 data.frames
5.2 Lists
5.3 Matrices
5.4 Arrays
5.5 Conclusion
Chapter 6: Reading Data into R
6.1 Reading CSVs
6.2 Excel Data
6.3 Reading from Databases
6.4 Data from Other Statistical Tools
6.5 R Binary Files
6.6 Data Included with R
6.7 Extract Data from Web Sites
6.8 Reading JSON Data
6.9 Conclusion
Chapter 7: Statistical Graphics
7.1 Base Graphics
7.2 ggplot2
7.3 Conclusion
Chapter 8: Writing R functions
8.1 Hello, World!
8.2 Function Arguments
8.3 Return Values
8.4 do.call
8.5 Conclusion
Chapter 9: Control Statements
9.1 if and else
9.2 switch
9.3 ifelse
9.4 Compound Tests
9.5 Conclusion
Chapter 10: Loops, the Un-R Way to Iterate
10.1 for Loops
10.2 while Loops
10.3 Controlling Loops
10.4 Conclusion
Chapter 11: Group Manipulation
11.1 Apply Family
11.2 aggregate
11.3 plyr
11.4 data.table
11.5 Conclusion
Chapter 12: Faster Group Manipulation with dplyr
12.1 Pipes
12.2 tbl
12.3 select
12.4 filter
12.5 slice
12.6 mutate
12.7 summarize
12.8 group_by
12.9 arrange
12.10 do
12.11 dplyr with Databases
12.12 Conclusion
Chapter 13: Iterating with purrr
13.1 map
13.2 map with Specified Types
13.3 Iterating over a data.frame
13.4 map with Multiple Inputs
13.5 Conclusion
Chapter 14: Data Reshaping
14.1 cbind and rbind
14.2 Joins
14.3 reshape2
14.4 Conclusion
Chapter 15: Reshaping Data in the Tidyverse
15.1 Binding Rows and Columns
15.2 Joins with dplyr
15.3 Converting Data Formats
15.4 Conclusion
Chapter 16: Manipulating Strings
16.1 paste
16.2 sprintf
16.3 Extracting Text
16.4 Regular Expressions
16.5 Conclusion
Chapter 17: Probability Distributions
17.1 Normal Distribution
17.2 Binomial Distribution
17.3 Poisson Distribution
17.4 Other Distributions
17.5 Conclusion
Chapter 18: Basic Statistics
18.1 Summary Statistics
18.2 Correlation and Covariance
18.3 T-Tests
18.4 ANOVA
18.5 Conclusion
Chapter 19: Linear Models
19.1 Simple Linear Regression
19.2 Multiple Regression
19.3 Conclusion
Chapter 20: Generalized Linear Models
20.1 Logistic Regression
20.2 Poisson Regression
20.3 Other Generalized Linear Models
20.4 Survival Analysis
20.5 Conclusion
Chapter 21: Model Diagnostics
21.1 Residuals
21.2 Comparing Models
21.3 Cross-Validation
21.4 Bootstrap
21.5 Stepwise Variable Selection
21.6 Conclusion
Chapter 22: Regularization and Shrinkage
22.1 Elastic Net
22.2 Bayesian Shrinkage
22.3 Conclusion
Chapter 23: Nonlinear Models
23.1 Nonlinear Least Squares
23.2 Splines
23.3 Generalized Additive Models
23.4 Decision Trees
23.5 Boosted Trees
23.6 Random Forests
23.7 Conclusion
Chapter 24: Time Series and Autocorrelation
24.1 Autoregressive Moving Average
24.2 VAR
24.3 GARCH
24.4 Conclusion
Chapter 25: Clustering
25.1 K-means
25.2 PAM
25.3 Hierarchical Clustering
25.4 Conclusion
Chapter 26: Model Fitting with Caret
26.1 Caret Basics
26.2 Caret Options
26.3 Tuning a Boosted Tree
26.4 Conclusion
Chapter 27: Reproducibility and Reports with knitr
27.1 Installing a LaTeX Program
27.2 LaTeX Primer
27.3 Using knitr with LaTeX
27.4 Conclusion
Chapter 28: Rich Documents with RMarkdown
28.1 Document Compilation
28.2 Document Header
28.3 Markdown Primer
28.4 Markdown Code Chunks
28.5 htmlwidgets
28.6 RMarkdown Slideshows
28.7 Conclusion
Chapter 29: Interactive Dashboards with Shiny
29.1 Shiny in RMarkdown
29.2 Reactive Expressions in Shiny
29.3 Server and UI
29.4 Conclusion
Chapter 30: Building R Packages
30.1 Folder Structure
30.2 Package Files
30.3 Package Documentation
30.4 Tests
30.5 Checking, Building and Installing
30.6 Submitting to CRAN
30.7 C++ Code
30.8 Conclusion
Appendix A: Real-Life Resources
A.1 Meetups
A.2 Stack Overflow
A.3 Twitter
A.4 Conferences
A.5 Web Sites
A.6 Documents
A.7 Books
A.8 Conclusion
Appendix B: Glossary
List of Figures
List of Tables
General Index
Index of Functions
Index of Packages
Index of People
Data Index
Publication date | 6 July 2017 |
---|---|
Series | Addison-Wesley Data & Analytics Series |
Place of publication | Boston |
Language | English |
Subject area | Computer Science ► Databases ► Data Warehouse / Data Mining; Mathematics / Computer Science ► Computer Science ► Programming Languages / Tools |
ISBN-10 | 0-13-454692-X / 013454692X |
ISBN-13 | 978-0-13-454692-6 / 9780134546926 |
Condition | New |