Anomaly Detection Principles and Algorithms (eBook)
XXII, 217 pages
Springer International Publishing (publisher)
978-3-319-67526-8 (ISBN)
This book provides a readable and elegant presentation of the principles of anomaly detection, offering an easy introduction for newcomers to the field. A large number of algorithms are succinctly described, along with a presentation of their strengths and weaknesses.
The authors also cover algorithms that address different kinds of problems of interest with single and multiple time series data and multi-dimensional data. New ensemble anomaly detection algorithms are described, utilizing the benefits provided by diverse algorithms, each of which works well on some kinds of data.
With advancements in technology and the extensive use of the internet as a medium for communications and commerce, there has been a tremendous increase in the threats faced by individuals and organizations from attackers and criminal entities. Variations in the observable behaviors of individuals (from others and from their own past behaviors) have been found to be useful in predicting potential problems of various kinds. Hence computer scientists and statisticians have been conducting research on automatically identifying anomalies in large datasets.
This book will primarily target practitioners and researchers who are newcomers to the area of modern anomaly detection techniques. Advanced-level students in computer science will also find this book helpful with their studies.
Preface
Contents
List of Figures
List of Tables
Part I Principles
1 Introduction
1.1 What's an Anomaly?
1.2 Cybersecurity
1.2.1 Privacy
1.2.2 Malware Detection
1.2.3 Fraudulent Email
1.3 Finance
1.3.1 Credit Card Fraud
1.3.2 Creditworthiness
1.3.3 Bankruptcy Prediction
1.3.4 Investing
1.4 Healthcare
1.4.1 Diagnosis
1.4.2 Patient Monitoring
1.4.3 Radiology
1.4.4 Epidemiology
1.5 Defense and Internal Security
1.5.1 Personnel Behaviors
1.5.2 Battlefield Behaviors
1.5.3 Unconventional Attacks
1.6 Consumer Home Safety
1.6.1 Detecting Occurrence of Falls and Other Problems
1.6.2 Home Perimeter Safety
1.6.3 Indoor Pollution Monitoring
1.7 Manufacturing and Industry
1.7.1 Quality Control
1.7.2 Retail Sales
1.7.3 Inventory Management
1.7.4 Customer Behavior
1.7.5 Employee Behavior
1.8 Science
1.9 Conclusion
2 Anomaly Detection
2.1 Anomalies
2.1.1 Metrics for Measurement
2.1.2 Old Problems vs. New Problems
2.1.3 What Kind of Data?
2.1.4 What's a Norm?
2.2 Outliers in One-Dimensional Data
2.3 Outliers in Multidimensional Data
2.4 Anomaly Detection Approaches
2.5 Evaluation Criteria
2.6 Conclusion
3 Distance-Based Anomaly Detection Approaches
3.1 Introduction
3.2 Similarity Measures
3.3 Distance-Based Approaches
3.3.1 Distance to All Points
3.3.2 Distance to Nearest Neighbor
3.3.3 Average Distance to k Nearest Neighbors
3.3.4 Median Distance to k Nearest Neighbors
3.4 Conclusion
4 Clustering-Based Anomaly Detection Approaches
4.1 Identifying Clusters
4.1.1 Nearest Neighbor Clustering
4.1.2 k-Means Clustering
4.1.3 Fuzzy Clustering
4.1.4 Agglomerative Clustering
4.1.5 Density-Based Agglomerative Clustering
4.1.6 Divisive Clustering
4.2 Anomaly Detection Using Clusters
4.2.1 Cluster Membership or Size
4.2.2 Proximity to Other Points
4.2.3 Proximity to Nearest Neighbor
4.2.4 Boundary Distance
4.2.5 When Cluster Sizes Differ
4.2.6 Distances from Multiple Points
4.3 Conclusion
5 Model-Based Anomaly Detection Approaches
5.1 Models of Relationships Between Variables
5.1.1 Model Parameter Space Approach
5.1.2 Data Space Approach
5.1.2.1 Implicit Model
5.1.2.2 Explicit Models
5.2 Distribution Models
5.2.1 Parametric Distribution Estimation
5.2.2 Regression Models
5.2.2.1 Linear Regression
5.2.2.2 Nonlinear Regression
5.2.2.3 Kernel Regression and Support Vector Machines
5.2.2.4 Splines
5.3 Models of Time-Varying Processes
5.3.1 Markov Models
5.3.2 Time Series Models
5.3.2.1 ARIMA
5.3.2.2 DFT
5.3.2.3 Haar
5.4 Anomaly Detection in Time Series
5.4.1 Anomaly Within a Single Time Series
5.4.1.1 Methodologies for Anomaly Detection Within a Single Time Series
5.4.2 Anomaly Detection Among Multiple Time Series
5.4.2.1 Using Point-to-Point Distances
5.4.2.2 Using Variations over Time
5.4.2.3 Correlations with Delays
5.5 Learning Algorithms Used to Derive Models from Data
5.5.1 Regularization
5.6 Conclusion
Part II Algorithms
6 Distance and Density Based Approaches
6.1 Distance from the Rest of the Data
6.1.1 Distance Based-Outlier Approach
6.2 Local Correlation Integral (LOCI) Algorithm
6.2.1 Resolution-Based Outlier Detection
6.3 Nearest Neighbor Approach
6.4 Density Based Approaches
6.4.1 Mixture Density Estimation
6.4.2 Local Outlier Factor (LOF) Algorithm
6.4.3 Connectivity-Based Outlier Factor (COF) Approach
6.4.4 INFLuential Measure of Outlierness by Symmetric Relationship (INFLO)
6.5 Performance Comparisons
6.6 Conclusions
7 Rank Based Approaches
7.1 Rank-Based Detection Algorithm (RBDA)
7.1.1 Why Does RBDA Work?
7.2 Anomaly Detection Algorithms Based on Clustering and Weighted Ranks
7.2.1 NC-Clustering
7.2.2 Density and Rank Based Detection Algorithms
7.3 New Algorithms Based on Distance and Cluster Density
7.4 Results
7.4.1 RBDA Versus the Kernel Based Density Estimation Algorithm
7.4.2 Comparison of RBDA and Its Extensions with LOF, COF, and INFLO
7.4.3 Comparison for KDD99 and Packed Executables Datasets
7.5 Conclusions
8 Ensemble Methods
8.1 Independent Ensemble Methods
8.2 Sequential Application of Algorithms
8.3 Ensemble Anomaly Detection with Adaptive Sampling
8.3.1 AdaBoost
8.3.2 Adaptive Sampling
8.3.3 Minimum Margin Approach
8.4 Weighted Adaptive Sampling
8.4.1 Weighted Adaptive Sampling Algorithm
8.4.2 Comparative Results
8.4.3 Dataset Description
8.4.4 Performance Comparisons
8.4.5 Effect of Model Parameters
8.5 Conclusion
9 Algorithms for Time Series Data
9.1 Problem Definition
9.2 Identification of an Anomalous Time Series
9.2.1 Algorithm Categories
9.2.2 Distances and Transformations
9.3 Abnormal Subsequence Detection
9.4 Outlier Detection Based on Multiple Measures
9.4.1 Measure Selection
9.4.2 Identification of Anomalous Series
9.5 Online Anomaly Detection for Time Series
9.5.1 Online Updating of Distance Measures
9.5.2 Multiple Measure Based Abnormal Subsequence Detection Algorithm (MUASD)
9.5.3 Finding Nearest Neighbor by Early Abandoning
9.5.4 Finding Abnormal Subsequence Based on Ratio of Frequencies (SAXFR)
9.5.4.1 Effect of SAXFR Subsequence Length Parameter
9.5.5 MUASD Algorithm
9.6 Experimental Results
9.6.1 Detection of Anomalous Series in a Dataset
9.6.2 Online Anomaly Detection
9.6.3 Anomalous Subsequence Detection
9.6.4 Computational Effort
9.7 Conclusion
Appendix A Datasets for Evaluation
A.1 A Datasets for Evaluation
A.2 Real Datasets
A.3 KDD and PED
Appendix B Datasets for Time Series Experiments
B.1 Datasets
B.1.1 Synthetic Datasets
B.1.2 Brief Description of Datasets
B.1.2.1 Real Datasets
B.1.3 Datasets for Online Anomalous Time Series Detection
B.1.4 Data Sets for Abnormal Subsequence Detection in a Single Series
B.1.5 Results for Abnormal Subsequence Detection in a Single Series for Various Datasets
References
Index
Publication date (per publisher) | 18 November 2017 |
---|---|
Series | Terrorism, Security, and Computation |
Additional info | XXII, 217 p. 66 illus., 55 illus. in color. |
Place of publication | Cham |
Language | English |
Subject area | Computer Science ► Networks ► Security / Firewall |
Keywords | algorithms • Anomaly Detection • classification • Clustering • Data Mining • Ensemble methods • machine learning • Outlier Detection • Rank Based Approach • security applications • statistical pattern recognition • Time Series • Time Series Anomaly Detection |
ISBN-10 | 3-319-67526-5 / 3319675265 |
ISBN-13 | 978-3-319-67526-8 / 9783319675268 |
Size: 4.2 MB
DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.