An Introduction to Machine Learning - Gopinath Rebala, Ajay Ravi, Sanjay Churiwala

An Introduction to Machine Learning (eBook)

Gopinath Rebala, Ajay Ravi, Sanjay Churiwala (Authors)

eBook Download: PDF

2019 | 1st edition
XXII, 275 pages
Springer-Verlag
978-3-030-15729-6 (ISBN)

Just like electricity, machine learning will revolutionize our lives in many ways, some of which are not even conceivable today. This book provides a thorough conceptual understanding of machine learning techniques and algorithms. Many of the mathematical concepts are explained in an intuitive manner. The book starts with an overview of machine learning and the underlying mathematical and statistical concepts before moving on to machine learning topics. It gradually builds up in depth, covering many present-day machine learning algorithms and ending with deep learning and reinforcement learning algorithms. The book also covers some popular machine learning applications. The material is agnostic to any specific programming language or hardware, so readers can try these concepts on whichever platforms they are already familiar with.

Offers a comprehensive introduction to Machine Learning, while not assuming any prior knowledge of the topic;
Provides a complete overview of available techniques and algorithms in conceptual terms, covering various application domains of machine learning;
Not tied to any specific software language or hardware implementation.

Gopinath Rebala is Chief Technical Officer at OpsMx, Inc. in San Ramon, California.

Ajay Ravi is a Software Engineer at Intel, Inc. in San Jose, California.

Sanjay Churiwala is Senior Director at Xilinx, in Hyderabad, India.

Preface 5
Reading the Book 6
Acknowledgments 7
Contents 8
List of Figures 16
List of Tables 19
Chapter 1: Machine Learning Definition and Basics 21
1.1 Introduction 21
1.1.1 Resurgence of ML 22
1.1.2 Relation with Artificial Intelligence (AI) 23
1.1.3 Machine Learning Problems 24
1.2 Matrices 24
1.2.1 Vector and Tensors 25
1.2.2 Matrix Addition (or Subtraction) 25
1.2.3 Matrix Transpose 25
1.2.4 Matrix Multiplication 26
1.2.4.1 Multiplying with a Scalar 26
1.2.4.2 Multiplying with Another Matrix 26
1.2.4.3 Multiplying with a Vector 27
1.2.5 Identity Matrix 27
1.2.6 Matrix Inversion 27
1.2.7 Solving Equations Using Matrices 28
1.3 Numerical Methods 29
1.4 Probability and Statistics 30
1.4.1 Sampling the Distribution 31
1.4.2 Random Variables 31
1.4.3 Expectation 31
1.4.4 Conditional Probability and Distribution 32
1.4.5 Maximum Likelihood 32
1.5 Linear Algebra 33
1.6 Differential Calculus 34
1.6.1 Functions 34
1.6.2 Slope 34
1.7 Computer Architecture 35
1.8 Next Steps 36
Chapter 2: Learning Models 38
2.1 Supervised Learning 38
2.1.1 Classification Problem 39
2.1.2 Regression Problem 39
2.2 Unsupervised Learning 40
2.3 Semi-supervised Learning 41
2.4 Reinforcement Learning 41
Chapter 3: Regressions 43
3.1 Introduction 43
3.2 The Model 44
3.3 Problem Formulation 44
3.4 Linear Regression 45
3.4.1 Normal Method 46
3.4.2 Gradient Descent Method 48
3.4.2.1 Determine the Slope at Any Given Point 49
3.4.2.2 Initial Value 49
3.4.2.3 Correction 50
3.4.2.4 Learning Rate 50
3.4.2.5 Convergence 52
3.4.2.6 Alternate Method for Computing Slope 53
3.4.2.7 Putting Gradient Descent in Practice 53
3.4.3 Normal Equation Method vs Gradient Descent Method 54
3.5 Logistic Regression 54
3.5.1 Sigmoid Function 55
3.5.2 Cost Function 56
3.5.3 Gradient Descent 56
3.6 Next Steps 57
3.7 Key Takeaways 58
Chapter 4: Improving Further 59
4.1 Nonlinear Contribution 59
4.2 Feature Scaling 60
4.3 Gradient Descent Algorithm Variations 61
4.3.1 Cost Contour 61
4.3.2 Stochastic Gradient Descent 62
4.3.2.1 Convergence for Stochastic Gradient Descent 64
4.3.3 Mini Batch Gradient Descent 65
4.3.4 Map Reduce and Parallelism 66
4.3.5 Basic Theme of Algorithm Variations 67
4.4 Regularization 67
4.4.1 Regularization for Normal Equation 70
4.4.2 Regularization for Logistic Regression 70
4.4.3 Determining Appropriate λ 70
4.4.3.1 Cross Validation 71
4.4.3.2 K-Fold Cross Validation 71
4.4.4 Comparing Hypothesis 71
4.5 Multi-class Classifications 72
4.5.1 One-vs-All Classification 72
4.5.2 SoftMax 73
4.5.2.1 Basic Approach for SoftMax 73
4.5.2.2 Loss Function 74
4.6 Key Takeaways and Next Steps 74
Chapter 5: Classification 75
5.1 Decision Boundary 75
5.1.1 Nonlinear Decision Boundary 76
5.2 Skewed Class 78
5.2.1 Optimizing Precision vs Recall 79
5.2.2 Single Metric 79
5.3 Naïve Bayes' Algorithm 80
5.4 Support Vector Machines 81
5.4.1 Kernel Selection 84
Chapter 6: Clustering 85
6.1 K-Means 85
6.1.1 Basic Algorithm 86
6.1.2 Distance Calculation 86
6.1.3 Algorithm Pseudo Code 87
6.1.4 Cost Function 87
6.1.5 Choice of Initial Random Centers 88
6.1.6 Number of Clusters 89
6.2 K-Nearest Neighbor (KNN) 90
6.2.1 Weight Consideration 91
6.2.2 Feature Scaling 91
6.2.3 Limitations 91
6.2.4 Finding the Nearest Neighbors 92
6.3 Next Steps 94
Chapter 7: Random Forests 95
7.1 Decision Tree 95
7.2 Information Gain 97
7.3 Gini Impurity Criterion 104
7.4 Disadvantages of Decision Trees 107
7.5 Random Forests 107
7.5.1 Data Bagging 108
7.5.2 Feature Bagging 108
7.5.3 Cross Validation in Random Forests 108
7.5.4 Prediction 109
7.6 Variable Importance 109
7.7 Proximities 110
7.7.1 Outliers 110
7.7.2 Prototypes 111
7.8 Disadvantages of Random Forests 111
7.9 Next Steps 112
Chapter 8: Testing the Algorithm and the Network 113
8.1 Test Set 113
8.2 Overfit 114
8.3 Underfit 114
8.4 Determining the Number of Degrees 114
8.5 Determining λ 115
8.6 Increasing Data Count 116
8.6.1 High Bias Case 116
8.6.2 High Variance Case 117
8.7 The Underlying Mathematics (Optional) 117
8.8 Utilizing the Bias vs Variance Information 119
8.9 Derived Data 119
8.10 Approach 120
8.11 Test Data 120
Chapter 9: (Artificial) Neural Networks 121
9.1 Logistic Regression Extended to Form Neural Network 121
9.2 Neural Network as Oversimplified Brain 123
9.3 Visualizing Neural Network Equations 124
9.4 Matrix Formulation of Neural Network 125
9.5 Neural Network Representation 126
9.6 Starting to Design a Neural Network 127
9.7 Training the Network 128
9.7.1 Chain Rule 129
9.7.2 Components of Gradient Computation 130
9.7.3 Gradient Computation Through Backpropagation 132
9.7.4 Updating Weights 133
9.8 Vectorization 134
9.9 Controlling Computations 134
9.10 Next Steps 134
Chapter 10: Natural Language Processing 135
10.1 Complexity of NLP 135
10.2 Algorithms 137
10.2.1 Rule-Based Processing 137
10.2.2 Tokenizer 137
10.2.3 Named Entity Recognizers 138
10.2.4 Term Frequency-Inverse Document Frequency (tf-idf) 139
10.2.5 Word Embedding 140
10.2.6 Word2vec 141
10.2.6.1 Continuous Bag of Words 141
10.2.6.2 Skip-Gram Model 142
Chapter 11: Deep Learning 144
11.1 Recurrent Neural Networks 144
11.1.1 Representation of RNN 146
11.1.2 Backpropagation in RNN 149
11.1.3 Vanishing Gradients 150
11.2 LSTM 151
11.3 GRU 153
11.4 Self-Organizing Maps 155
11.4.1 Representation and Training of SOM 155
Chapter 12: Principal Component Analysis 158
12.1 Applications of PCA 159
12.1.1 Example 1 159
12.1.2 Example 2 159
12.2 Computing PCA 160
12.2.1 Data Representation 160
12.2.2 Covariance Matrix 160
12.2.3 Diagonal Matrix 161
12.2.4 Eigenvector 161
12.2.5 Symmetric Matrix 161
12.2.6 Deriving Principal Components 161
12.2.7 Singular Value Decomposition (SVD) 162
12.3 Computing PCA 162
12.3.1 Data Characteristics 162
12.3.2 Data Preprocessing 163
12.3.3 Selecting Principal Components 163
12.4 PCA Applications 165
12.4.1 Image Compression 165
12.4.2 Data Visualization 166
12.5 Pitfalls of PCA Application 168
12.5.1 Overfitting 168
12.5.2 Model Generation 169
12.5.3 Model Interpretation 169
Chapter 13: Anomaly Detection 170
13.1 Anomaly vs Classification 171
13.2 Model 171
13.2.1 Distribution Density 172
13.2.2 Estimating Distribution Parameters 173
13.2.3 Metric Value 173
13.2.4 Finding 174
13.2.5 Validating and Tuning the Model 174
13.3 Multivariate Gaussian Distribution 175
13.3.1 Determining Feature Mean 176
13.3.2 Determining Covariance 176
13.3.3 Computing and Applying the Metric 177
13.4 Anomalies in Time Series 177
13.4.1 Time Series Decomposition 178
13.4.2 Time Series Anomaly Types 178
13.4.3 Anomaly Detection in Time Series 180
13.4.3.1 ARIMA 181
13.4.3.2 Machine Learning Models 184
Chapter 14: Recommender Systems 185
14.1 Features Known 186
14.1.1 User's Affinity Toward Each Feature 186
14.2 User's Preferences Known 187
14.2.1 Characterizing Features 188
14.3 Features and User Preferences Both Unknown 189
14.3.1 Collaborative Filtering 189
14.3.1.1 Basic Assumptions 189
14.3.1.2 Parameters Under Consideration 189
14.3.1.3 Initialize 190
14.3.1.4 Iterate 190
14.3.1.5 Cost Function 190
14.3.1.6 Gradient Descent 191
14.3.2 Predicting and Recommending 191
14.4 New User 192
14.4.1 Shortcomings of the Current Algorithm 193
14.4.2 Mean Normalization 194
14.5 Tracking Changes in Preferences 194
Chapter 15: Convolution 196
15.1 Convolution Explained 196
15.2 Object Identification Example 198
15.2.1 Exact Shape Known 198
15.2.2 Exact Shape Not Known 199
15.2.3 Breaking Down Further 199
15.2.4 Unanswered Questions 200
15.3 Image Convolution 200
15.4 Preprocessing 202
15.5 Post-Processing 203
15.6 Stride 204
15.7 CNN 205
15.8 Matrix Operation 206
15.9 Refining the Filters 207
15.10 Pooling as Neural Network 208
15.11 Character Recognition and Road Signs 208
15.12 ADAS and Convolution 208
Chapter 16: Components of Reinforcement Learning 210
16.1 Key Participants of a Reinforcement Learning System 210
16.1.1 The Agent 210
16.1.1.1 Agent's Objective 212
16.1.1.2 Rewards as Feedback for Agent 212
16.1.2 The Environment 213
16.1.2.1 Environment State Space 214
16.1.3 Interaction Between Agent and Environment 214
16.2 Environment State Transitions and Actions 215
16.2.1 Deterministic Environment 215
16.2.2 Stochastic Environment 216
16.2.3 Markov States and MDP 217
16.3 Agent's Objective 218
16.4 Agent's Behavior 219
16.5 Graphical Notation for a Trajectory 220
16.6 Value Function 220
16.6.1 State-Value Function 221
16.6.2 Action-Value Function 221
16.7 Methods for Finding Optimal Policies 223
16.7.1 Agent's Awareness of MDP 223
16.7.1.1 MDP Known 223
16.7.1.2 MDP Unknown 224
16.7.1.3 MDP Partially Known 224
16.7.2 Model-Based and Model-Free Reinforcement Learning 225
16.7.3 On-Policy and Off-Policy Reinforcement Learning 225
16.8 Policy Iteration Method for Optimal Policy 225
16.8.1 Computing Q-function for a Given Policy 226
16.8.2 Policy Iteration 226
Chapter 17: Reinforcement Learning Algorithms 227
17.1 Monte Carlo Learning 227
17.1.1 State Value Estimation 228
17.1.2 Action Value Estimation 229
17.2 Estimating Action Values with TD Learning 229
17.3 Exploration vs Exploitation Trade-Off 231
17.3.1 ε-greedy Policy 231
17.4 Q-learning 232
17.5 Scaling Through Function Approximation 233
17.5.1 Approximating the Q-function in Q-learning 234
17.6 Policy-Based Methods 234
17.6.1 Advantages of Policy Gradient Methods 235
17.6.2 Parameterized Policy 235
17.6.3 Training the Model 236
17.6.4 Monte Carlo Gradient Methods 237
17.6.5 Actor-Critic Methods 237
17.6.6 Reducing Variability in Gradient Methods 238
17.7 Simulation-Based Learning 239
17.8 Monte Carlo Tree Search (MCTS) 241
17.8.1 Search Tree 241
17.8.2 Monte Carlo Search Tree 242
17.8.2.1 Trajectory Values 243
17.8.2.2 Backup Procedure 244
17.8.3 MCTS Algorithm 245
17.8.3.1 Selection Phase (aka Tree Phase) 245
17.8.3.2 Expansion Phase 245
17.8.3.3 Rollout Phase 246
17.8.3.4 Backup Phase (aka Back Propagation Phase) 246
17.8.3.5 Tree Policy 246
17.8.4 Pseudo Code for MCTS Algorithm 247
17.8.5 Parallel MCTS Algorithms 249
17.9 MCTS Tree Values for Two-Player Games 249
17.10 Alpha Zero 250
17.10.1 Overview 250
17.10.1.1 Value Function and Policy Network 250
17.10.1.2 MCTS Search 251
17.10.1.3 Self-Play and Training Data 251
17.10.1.4 Iterative Improvement Loop 251
17.10.2 Aspects of Alpha Zero 252
17.10.2.1 Supervised Training 252
17.10.2.2 Loss Function 253
17.10.3 MCTS Search 253
17.10.3.1 Node Value 253
17.10.3.2 Selection Phase 254
17.10.3.3 Expansion Phase 254
17.10.3.4 Evaluation Phase (Replaces the Rollout Phase) 255
17.10.3.5 Backup Phase 255
17.10.3.6 Parallel Execution 255
Chapter 18: Designing a Machine Learning System 256
18.1 Pipeline Systems 256
18.1.1 Ceiling Analysis 257
18.2 Data Quality 257
18.2.1 Unstructured Data 259
18.2.2 Getting Data 259
18.3 Improvisations over Gradient Descent 260
18.3.1 Momentum 260
18.3.2 RMSProp 261
18.3.3 ADAM (Adaptive Moment Estimation) 262
18.4 Software Stacks 263
18.4.1 TensorFlow 263
18.4.2 MXNet 264
18.4.3 PyTorch 265
18.4.4 The Microsoft Cognitive Toolkit 265
18.4.5 Keras 266
18.5 Choice of Hardware 266
18.5.1 Traditional Computer Systems 266
18.5.2 GPU 267
18.5.3 FPGAs 268
18.5.4 TPUs 268
Bibliography 270
Index 271

Publication date (per publisher)	7 May 2019
Additional info	XXII, 263 p. 83 illus., 77 illus. in color.
Language	English
Subject area	Technology ► Electrical Engineering / Energy Technology
Keywords	Big Data • Cloud Computing • Deep learning • Feature Search/Convolution • Natural Language Processing
ISBN-10	3-030-15729-6 / 3030157296
ISBN-13	978-3-030-15729-6 / 9783030157296

PDF (Watermark)
Size: 5.9 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thus personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost all devices but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.

Print edition

Book | Hardcover

139,09 €