Data Science For Dummies
John Wiley & Sons Inc (Publisher)
978-1-119-32763-9 (ISBN)
- A new edition of this title is forthcoming
Discover how data science can help you gain in-depth insight into your business - the easy way! Jobs in data science abound, but few people have the data science skills needed to fill these increasingly important roles. Data Science For Dummies is the perfect starting point for IT professionals and students who want a quick primer on all areas of the expansive data science space. With a focus on business cases, the book explores topics in big data, data science, and data engineering, and how these three areas are combined to produce tremendous value. If you want to pick up the skills you need to begin a new career or initiate a new project, reading this book will help you understand which technologies, programming languages, and mathematical methods to focus on. While this book serves as a wildly fantastic guide through the broad, sometimes intimidating field of big data and data science, it is not an instruction manual for hands-on implementation.
Here's what to expect:
- Provides a background in big data and data engineering before moving on to data science and how it's applied to generate value
- Includes coverage of big data frameworks like Hadoop, MapReduce, Spark, MPP platforms, and NoSQL
- Explains machine learning and many of its algorithms as well as artificial intelligence and the evolution of the Internet of Things
- Details data visualization techniques that can be used to showcase, summarize, and communicate the data insights you generate

It's a big, big data world out there. Let Data Science For Dummies help you harness its power and gain a competitive edge for your organization.
Lillian Pierson, P.E. is a data scientist, professional environmental engineer, and leading data science consultant to global leaders in IT, major governmental and non-governmental entities, prestigious media corporations, and not-for-profit technology groups.
Foreword xv
Introduction 1
About This Book 2
Foolish Assumptions 2
Icons Used in This Book 3
Beyond the Book 3
Where to Go from Here 4
Part 1: Getting Started with Data Science 5
Chapter 1: Wrapping Your Head around Data Science 7
Seeing Who Can Make Use of Data Science 8
Analyzing the Pieces of the Data Science Puzzle 10
Collecting, querying, and consuming data 10
Applying mathematical modeling to data science tasks 11
Deriving insights from statistical methods 12
Coding, coding, coding — it’s just part of the game 12
Applying data science to a subject area 12
Communicating data insights 14
Exploring the Data Science Solution Alternatives 14
Assembling your own in-house team 14
Outsourcing requirements to private data science consultants 15
Leveraging cloud-based platform solutions 15
Letting Data Science Make You More Marketable 16
Chapter 2: Exploring Data Engineering Pipelines and Infrastructure 17
Defining Big Data by the Three Vs 18
Grappling with data volume 18
Handling data velocity 18
Dealing with data variety 19
Identifying Big Data Sources 20
Grasping the Difference between Data Science and Data Engineering 21
Defining data science 21
Defining data engineering 22
Comparing data scientists and data engineers 23
Making Sense of Data in Hadoop 24
Digging into MapReduce 24
Stepping into real-time processing 26
Storing data on the Hadoop distributed file system (HDFS) 27
Putting it all together on the Hadoop platform 28
Identifying Alternative Big Data Solutions 28
Introducing massively parallel processing (MPP) platforms 29
Introducing NoSQL databases 29
Data Engineering in Action: A Case Study 30
Identifying the business challenge 30
Solving business problems with data engineering 32
Boasting about benefits 32
Chapter 3: Applying Data-Driven Insights to Business and Industry 33
Benefiting from Business-Centric Data Science 34
Converting Raw Data into Actionable Insights with Data Analytics 35
Types of analytics 35
Common challenges in analytics 36
Data wrangling 36
Taking Action on Business Insights 37
Distinguishing between Business Intelligence and Data Science 39
Business intelligence, defined 39
The kinds of data used in business intelligence 40
Technologies and skillsets that are useful in business intelligence 40
Defining Business-Centric Data Science 41
Kinds of data that are useful in business-centric data science 42
Technologies and skillsets that are useful in business-centric data science 43
Making business value from machine learning methods 43
Differentiating between Business Intelligence and Business-Centric Data Science 44
Knowing Whom to Call to Get the Job Done Right 45
Exploring Data Science in Business: A Data-Driven Business Success Story 46
Part 2: Using Data Science to Extract Meaning from Your Data 49
Chapter 4: Machine Learning: Learning from Data with Your Machine 51
Defining Machine Learning and Its Processes 51
Walking through the steps of the machine learning process 52
Getting familiar with machine learning terms 52
Considering Learning Styles 53
Learning with supervised algorithms 53
Learning with unsupervised algorithms 53
Learning with reinforcement 54
Seeing What You Can Do 54
Selecting algorithms based on function 54
Using Spark to generate real-time big data analytics 58
Chapter 5: Math, Probability, and Statistical Modeling 61
Exploring Probability and Inferential Statistics 62
Probability distributions 63
Conditional probability with Naïve Bayes 65
Quantifying Correlation 66
Calculating correlation with Pearson’s r 66
Ranking variable-pairs using Spearman’s rank correlation 66
Reducing Data Dimensionality with Linear Algebra 67
Decomposing data to reduce dimensionality 67
Reducing dimensionality with factor analysis 69
Decreasing dimensionality and removing outliers with PCA 70
Modeling Decisions with Multi-Criteria Decision Making 70
Turning to traditional MCDM 71
Focusing on fuzzy MCDM 72
Introducing Regression Methods 73
Linear regression 73
Logistic regression 74
Ordinary least squares (OLS) regression methods 74
Detecting Outliers 75
Analyzing extreme values 75
Detecting outliers with univariate analysis 76
Detecting outliers with multivariate analysis 77
Introducing Time Series Analysis 78
Identifying patterns in time series 78
Modeling univariate time series data 79
Chapter 6: Using Clustering to Subdivide Data 81
Introducing Clustering Basics 81
Getting to know clustering algorithms 82
Looking at clustering similarity metrics 85
Identifying Clusters in Your Data 86
Clustering with the k-means algorithm 86
Estimating clusters with kernel density estimation (KDE) 87
Clustering with hierarchical algorithms 88
Dabbling in the DBScan neighborhood 90
Categorizing Data with Decision Tree and Random Forest Algorithms 91
Chapter 7: Modeling with Instances 93
Recognizing the Difference between Clustering and Classification 94
Reintroducing clustering concepts 94
Getting to know classification algorithms 95
Making Sense of Data with Nearest Neighbor Analysis 97
Classifying Data with Average Nearest Neighbor Algorithms 98
Classifying with K-Nearest Neighbor Algorithms 101
Understanding how the k-nearest neighbor algorithm works 102
Knowing when to use the k-nearest neighbor algorithm 103
Exploring common applications of k-nearest neighbor algorithms 104
Solving Real-World Problems with Nearest Neighbor Algorithms 104
Seeing k-nearest neighbor algorithms in action 104
Seeing average nearest neighbor algorithms in action 105
Chapter 8: Building Models That Operate Internet-of-Things Devices 107
Overviewing the Vocabulary and Technologies 108
Learning the lingo 108
Procuring IoT platforms 110
Spark streaming for the IoT 110
Getting context-aware with sensor fusion 111
Digging into the Data Science Approaches 111
Taking on time series 112
Geospatial analysis 112
Dabbling in deep learning 113
Advancing Artificial Intelligence Innovation 113
Part 3: Creating Data Visualizations That Clearly Communicate Meaning 115
Chapter 9: Following the Principles of Data Visualization Design 117
Data Visualizations: The Big Three 118
Data storytelling for organizational decision makers 118
Data showcasing for analysts 118
Designing data art for activists 119
Designing to Meet the Needs of Your Target Audience 119
Step 1: Brainstorm (about Brenda) 120
Step 2: Define the purpose 121
Step 3: Choose the most functional visualization type for your purpose 121
Picking the Most Appropriate Design Style 122
Inducing a calculating, exacting response 122
Eliciting a strong emotional response 123
Choosing How to Add Context 124
Creating context with data 125
Creating context with annotations 125
Creating context with graphical elements 125
Selecting the Appropriate Data Graphic Type 127
Standard chart graphics 127
Comparative graphics 130
Statistical plots 134
Topology structures 135
Spatial plots and maps 138
Choosing a Data Graphic 140
Chapter 10: Using D3.js for Data Visualization 141
Introducing the D3.js Library 141
Knowing When to Use D3.js (and When Not To) 142
Getting Started in D3.js 143
Bringing in the HTML and DOM 144
Bringing in the JavaScript and SVG 145
Bringing in the Cascading Style Sheets (CSS) 146
Bringing in the web servers and PHP 146
Implementing More Advanced Concepts and Practices in D3.js 147
Getting to know chain syntax 151
Getting to know scales 152
Getting to know transitions and interactions 153
Chapter 11: Web-Based Applications for Visualization Design 157
Designing Data Visualizations for Collaboration 158
Visualizing and collaborating with Plotly 159
Talking about Tableau Public 161
Visualizing Spatial Data with Online Geographic Tools 162
Making pretty maps with OpenHeatMap 163
Mapmaking and spatial data analytics with CartoDB 164
Visualizing with Open Source: Web-Based Data Visualization Platforms 166
Making pretty data graphics with Google Fusion Tables 166
Using iCharts for web-based data visualization 167
Using RAW for web-based data visualization 168
Knowing When to Stick with Infographics 170
Making cool infographics with Infogr.am 170
Making cool infographics with Piktochart 172
Chapter 12: Exploring Best Practices in Dashboard Design 173
Focusing on the Audience 174
Starting with the Big Picture 175
Getting the Details Right 176
Testing Your Design 178
Chapter 13: Making Maps from Spatial Data 179
Getting into the Basics of GIS 180
Spatial databases 181
File formats in GIS 182
Map projections and coordinate systems 185
Analyzing Spatial Data 187
Querying spatial data 187
Buffering and proximity functions 188
Using layer overlay analysis 189
Reclassifying spatial data 190
Getting Started with Open-Source QGIS 191
Getting to know the QGIS interface 191
Adding a vector layer in QGIS 192
Displaying data in QGIS 193
Part 4: Computing for Data Science 199
Chapter 14: Using Python for Data Science 201
Sorting Out the Python Data Types 203
Numbers in Python 204
Strings in Python 204
Lists in Python 204
Tuples in Python 205
Sets in Python 205
Dictionaries in Python 205
Putting Loops to Good Use in Python 206
Having Fun with Functions 207
Keeping Cool with Classes 208
Checking Out Some Useful Python Libraries 210
Saying hello to the NumPy library 211
Getting up close and personal with the SciPy library 213
Peeking into the Pandas offering 213
Bonding with MatPlotLib for data visualization 214
Learning from data with Scikit-learn 215
Analyzing Data with Python — an Exercise 216
Installing Python on the Mac and Windows OS 216
Loading CSV files 218
Calculating a weighted average 219
Drawing trendlines 222
Chapter 15: Using Open Source R for Data Science 225
R’s Basic Vocabulary 226
Delving into Functions and Operators 229
Iterating in R 232
Observing How Objects Work 234
Sorting Out Popular Statistical Analysis Packages 236
Examining Packages for Visualizing, Mapping, and Graphing in R 238
Visualizing R statistics with ggplot2 238
Analyzing networks with statnet and igraph 239
Mapping and analyzing spatial point patterns with spatstat 240
Chapter 16: Using SQL in Data Science 241
Getting a Handle on Relational Databases and SQL 242
Investing Some Effort into Database Design 245
Defining data types 246
Designing constraints properly 246
Normalizing your database 247
Integrating SQL, R, Python, and Excel into Your Data Science Strategy 249
Narrowing the Focus with SQL Functions 249
Chapter 17: Doing Data Science with Excel and Knime 255
Making Life Easier with Excel 255
Using Excel to quickly get to know your data 256
Reformatting and summarizing with pivot tables 261
Automating Excel tasks with macros 262
Using KNIME for Advanced Data Analytics 264
Reducing customer churn via KNIME 265
Using KNIME to make the most of your social data 265
Using KNIME for environmental good stewardship 266
Part 5: Applying Domain Expertise to Solve Real-World Problems Using Data Science 267
Chapter 18: Data Science in Journalism: Nailing Down the Five Ws (and an H) 269
Who Is the Audience? 270
Who made the data 271
Who comprises the audience 271
What: Getting Directly to the Point 272
Bringing Data Journalism to Life: The Black Budget 273
When Did It Happen? 274
When as the context to your story 274
When does the audience care the most? 275
Where Does the Story Matter? 275
Where is the story relevant? 276
Where should the story be published? 276
Why the Story Matters 277
Asking why in order to generate and augment a storyline 277
Why your audience should care 277
How to Develop, Tell, and Present the Story 278
Integrating how as a source of data and story context 278
Finding stories in your data 278
Presenting a data-driven story 279
Collecting Data for Your Story 279
Scraping data 279
Setting up data alerts 280
Finding and Telling Your Data’s Story 280
Spotting strange trends and outliers 281
Examining context to understand the significance of data 283
Emphasizing the story through visualization 284
Creating compelling and highly focused narratives 285
Chapter 19: Delving into Environmental Data Science 287
Modeling Environmental-Human Interactions with Environmental Intelligence 288
Examining the types of problems solved 288
Defining environmental intelligence 289
Identifying major organizations that work in environmental intelligence 290
Making positive impacts with environmental intelligence 291
Modeling Natural Resources in the Raw 293
Exploring natural resource modeling 293
Dabbling in data science 293
Modeling natural resources to solve environmental problems 294
Using Spatial Statistics to Predict for Environmental Variation across Space 295
Addressing environmental issues with spatial predictive analytics 296
Describing the data science that’s involved 296
Addressing environmental issues with spatial statistics 297
Chapter 20: Data Science for Driving Growth in E-Commerce 299
Making Sense of Data for E-Commerce Growth 302
Optimizing E-Commerce Business Systems 303
Angling in on analytics 304
Talking about testing your strategies 308
Segmenting and targeting for success 311
Chapter 21: Using Data Science to Describe and Predict Criminal Activity 315
Temporal Analysis for Crime Prevention and Monitoring 316
Spatial Crime Prediction and Monitoring 317
Crime mapping with GIS technology 317
Going one step further with location-allocation analysis 318
Analyzing complex spatial statistics to better understand crime 319
Probing the Problems with Data Science for Crime Analysis 322
Caving in on civil rights 322
Taking on technical limitations 323
Part 6: The Part of Tens 325
Chapter 22: Ten Phenomenal Resources for Open Data 327
Digging through data.gov 328
Checking Out Canada Open Data 329
Diving into data.gov.uk 330
Checking Out U.S. Census Bureau Data 331
Knowing NASA Data 332
Wrangling World Bank Data 333
Getting to Know Knoema Data 334
Queuing Up with Quandl Data 335
Exploring Exversion Data 336
Mapping OpenStreetMap Spatial Data 337
Chapter 23: Ten Free Data Science Tools and Applications 339
Making Custom Web-Based Data Visualizations with Free R Packages 340
Getting Shiny by RStudio 340
Charting with rCharts 341
Mapping with rMaps 341
Examining Scraping, Collecting, and Handling Tools 342
Scraping data with import.io 342
Collecting images with ImageQuilts 343
Wrangling data with DataWrangler 343
Looking into Data Exploration Tools 344
Getting up to speed in Gephi 345
Machine learning with the WEKA suite 347
Evaluating Web-Based Visualization Tools 347
Getting a little Weave up your sleeve 347
Checking out Knoema’s data visualization offerings 348
Index 351
Publication date | 2 April 2017 |
---|---|
Foreword | Jake Porway |
Place of publication | New York |
Language | English |
Dimensions | 183 x 239 mm |
Weight | 512 g |
Subject area | Computer Science ► Databases ► Data Warehouse / Data Mining |
| Computer Science ► Office Software ► Outlook |
| Mathematics / Computer Science ► Computer Science ► Theory / Study |
ISBN-10 | 1-119-32763-6 / 1119327636 |
ISBN-13 | 978-1-119-32763-9 / 9781119327639 |
Condition | New |