Mastering Machine Learning with Python in Six Steps (eBook)
XVII, 457 pages
Apress (publisher)
978-1-4842-4947-5 (ISBN)
Explore fundamental to advanced Python 3 topics in six steps, all designed to make you a worthy practitioner. This updated version's approach is based on the 'six degrees of separation' theory, which states that everyone and everything is a maximum of six steps away. Each topic is presented in two parts: theoretical concepts and practical implementation using suitable Python 3 packages.
You'll start with the fundamentals of the Python 3 programming language, the history and evolution of machine learning, and system development frameworks. Key data mining and analysis concepts, such as exploratory analysis, feature dimension reduction, regressions, and time series forecasting, are covered as well, along with their efficient implementation in Scikit-learn. You'll also learn commonly used model diagnostic and tuning techniques. These include the optimal probability cutoff point for class creation, variance, bias, bagging, boosting, ensemble voting, grid search, random search, Bayesian optimization, and a noise reduction technique for IoT data.
Finally, you'll review advanced text mining techniques, recommender systems, neural networks, deep learning, and reinforcement learning techniques, along with their implementation. All the code presented in the book will be available as IPython notebooks so that you can try out the examples and extend them to your advantage.
What You'll Learn
- Understand machine learning development and frameworks
- Assess model diagnosis and tuning in machine learning
- Examine text mining, natural language processing (NLP), and recommender systems
- Review reinforcement learning and CNN
Who This Book Is For
Python developers, data engineers, and machine learning engineers looking to expand their knowledge or career into the machine learning area.
Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Step 1: Getting Started in Python 3
The Best Things in Life Are Free
The Rising Star
Choosing Python 2.x or Python 3.x
Windows
OSX
Graphical Installer
Command Line Installer
Linux
From Official Website
Running Python
Key Concepts
Python Identifiers
Keywords
My First Python Program
Code Blocks
Indentations
Suites
Basic Object Types
When to Use List, Tuple, Set, or Dictionary
Comments in Python
Multiline Statements
Multiple Statements on a Single Line
Basic Operators
Arithmetic Operators
Comparison or Relational Operators
Assignment Operators
Bitwise Operators
Logical Operators
Membership Operators
Identity Operators
Control Structures
Selections
Iterations
Lists
Tuples
Sets
Changing Sets in Python
Removing Items from Sets
Set Operations
Set Unions
Set Intersections
Set Difference
Set Symmetric Difference
Basic Operations
Dictionary
User-Defined Functions
Defining a Function
The Scope of Variables
Default Argument
Variable Length Arguments
Modules
File Input/Output
Opening a File
Exception Handling
Summary
Chapter 2: Step 2: Introduction to Machine Learning
History and Evolution
Artificial Intelligence Evolution
Different Forms
Statistics
Frequentist
Bayesian
Regression
Data Mining
Data Analytics
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Data Science
Statistics vs. Data Mining vs. Data Analytics vs. Data Science
Machine Learning Categories
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Frameworks for Building ML Systems
Knowledge Discovery in Databases
Selection
Preprocessing
Transformation
Data Mining
Interpretation / Evaluation
Cross-Industry Standard Process for Data Mining
Phase 1: Business Understanding
Phase 2: Data Understanding
Phase 3: Data Preparation
Phase 4: Modeling
Phase 5: Evaluation
Phase 6: Deployment
SEMMA (Sample, Explore, Modify, Model, Assess)
Sample
Explore
Modify
Model
Assess
Machine Learning Python Packages
Data Analysis Packages
NumPy
Array
Creating NumPy Array
Data Types
Array Indexing
Field Access
Basic Slicing
Advanced Indexing
Array Math
Broadcasting
Pandas
Data Structures
Series
DataFrame
Reading and Writing Data
Basic Statistics Summary
Viewing Data
Basic Operations
Merge/Join
Join
Grouping
Pivot Tables
Matplotlib
Using Global Functions
Customizing Labels
Object-Oriented
Line Plots Using ax.plot()
Multiple Lines on the Same Axis
Multiple Lines on Different Axis
Control the Line Style and Marker Style
Line Style Reference
Marker Reference
Colormaps Reference
Bar Plots Using ax.bar()
Horizontal Bar Charts Using ax.barh()
Side by Side Bar Chart
Stacked Bar Example Code
Pie Chart Using ax.pie()
Example Code for Grid Creation
Plotting Defaults
Machine Learning Core Libraries
Summary
Chapter 3: Step 3: Fundamentals of Machine Learning
Machine Learning Perspective of Data
Scales of Measurement
Nominal Scale of Measurement
Ordinal Scale of Measurement
Interval Scale of Measurement
Ratio Scale of Measurement
Feature Engineering
Dealing with Missing Data
Handling Categorical Data
Normalizing Data
Feature Construction or Generation
Exploratory Data Analysis
Univariate Analysis
Multivariate Analysis
Correlation Matrix
Pair Plot
Findings from EDA
Supervised Learning–Regression
Correlation and Causation
Fitting a Slope
How Good Is Your Model?
R-Squared for Goodness of fit
Root Mean Squared Error
Mean Absolute Error
Outliers
Polynomial Regression
Multivariate Regression
Multicollinearity and Variation Inflation Factor
Interpreting the Ordinary Least Squares (OLS) Regression Results
Regression Diagnostics
Outliers
Homoscedasticity and Normality
Overfitting and Underfitting
Regularization
Nonlinear Regression
Supervised Learning–Classification
Logistic Regression
Evaluating a Classification Model Performance
ROC Curve
Fitting Line
Stochastic Gradient Descent
Regularization
Multiclass Logistic Regression
Load Data
Normalize Data
Split Data
Training Logistic Regression Model and Evaluating
Generalized Linear Models
Supervised Learning–Process Flow
Decision Trees
How the Tree Splits and Grows
Conditions for Stopping Partitioning
Key Parameters for Stopping Tree Growth
Support Vector Machine
Key Parameters
k-Nearest Neighbors
Time-Series Forecasting
Components of Time Series
Autoregressive Integrated Moving Average (ARIMA)
Running ARIMA Model
Checking for Stationary
Autocorrelation Test
Build Model and Evaluate
Predicting Future Values
Unsupervised Learning Process Flow
Clustering
K-means
Limitations of K-means
Finding the Value of k
Elbow Method
Average Silhouette Method
Hierarchical Clustering
Key parameters
Principal Component Analysis (PCA)
Summary
Chapter 4: Step 4: Model Diagnosis and Tuning
Optimal Probability Cutoff Point
Which Error Is Costly?
Rare Event or Imbalanced Dataset
Which Resampling Technique Is the Best?
Bias and Variance
Bias
Variance
K-Fold Cross Validation
Stratified K-fold Cross-Validation
Ensemble Methods
Bagging
Feature Importance
RandomForest
Extremely Randomized Trees (ExtraTree)
How Does the Decision Boundary Look?
Bagging—Essential Tuning Parameters
Boosting
Example Illustration for AdaBoost
Boosting Iteration 1
Boosting Iteration 2
Boosting Iteration 3
Final Model
Gradient Boosting
Boosting—Essential Tuning Parameters
Xgboost (eXtreme Gradient Boosting)
Ensemble Voting—Machine Learning’s Biggest Heroes United
Hard Voting vs. Soft Voting
Stacking
Hyperparameter Tuning
GridSearch
RandomSearch
Bayesian Optimization
Noise Reduction for Time-Series IoT Data
Summary
Chapter 5: Step 5: Text Mining and Recommender Systems
Text Mining Process Overview
Data Assemble (Text)
Social Media
Data Preprocessing (Text)
Convert to Lower Case and Tokenize
Sentence Tokenizing
Word Tokenizing
Removing Noise
Part of Speech (PoS) Tagging
Stemming
Lemmatization
N-grams
Bag of Words
Term Frequency-Inverse Document Frequency (TF-IDF)
Data Exploration (Text)
Frequency Chart
Word Cloud
Lexical Dispersion Plot
Cooccurrence Matrix
Model Building
Text Similarity
Text Clustering
Latent Semantic Analysis (LSA)
Topic Modeling
Latent Dirichlet Allocation
Nonnegative Matrix Factorization
Text Classification
Sentiment Analysis
Deep Natural Language Processing (DNLP)
Word2Vec
Recommender Systems
Content-Based Filtering
Collaborative Filtering (CF)
Summary
Chapter 6: Step 6: Deep and Reinforcement Learning
Artificial Neural Network (ANN)
What Goes On Behind, When Computers Look at an Image?
Why Not a simple Classification Model for Images?
Perceptron—Single Artificial Neuron
Multilayer Perceptrons (Feedforward Neural Network)
Load MNIST Data
Key Parameters for Scikit-learn MLP
Restricted Boltzman Machines (RBMs)
MLP Using Keras
Autoencoders
Dimension Reduction Using an Autoencoder
Denoise Image Using an Autoencoder
Convolutional Neural Network (CNN)
CNN on MNIST Dataset
Visualization of Layers
Recurrent Neural Network (RNN)
Long Short Term Memory (LSTM)
Transfer Learning
Reinforcement Learning
Summary
Chapter 7: Conclusion
Tips
Start with Questions/Hypothesis, Then Move to Data!
Don’t Reinvent the Wheel from Scratch
Start with Simple Models
Focus on Feature Engineering
Beware of Common ML Imposters
Happy Machine Learning
Index
Publication date (per publisher) | 1 October 2019 |
---|---|
Additional info | XVII, 457 p. 185 illus., 1 illus. in color. |
Language | English |
Subject area | Mathematics / Computer Science ► Computer Science ► Databases |
 | Mathematics / Computer Science ► Computer Science ► Programming Languages / Tools |
 | Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics |
Keywords | Deep learning • machine learning • Model Tuning • Neural networks • Python • recommendation system • Reinforcement Learning • scikit-learn • Text Mining |
ISBN-10 | 1-4842-4947-X / 148424947X |
ISBN-13 | 978-1-4842-4947-5 / 9781484249475 |
Size: 12.6 MB
DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is of only limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Read online
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.