Machine Learning and Data Mining for Computer Security (eBook)
XVI, 210 pages
Springer London (publisher)
978-1-84628-253-9 (ISBN)
'Machine Learning and Data Mining for Computer Security' provides an overview of the current state of research in machine learning and data mining as it applies to problems in computer security. This book has a strong focus on information processing and combines and extends results from computer security.
The first part of the book surveys the data sources, the learning and mining methods, evaluation methodologies, and past work relevant for computer security. The second part of the book consists of articles written by the top researchers working in this area. These articles deal with host-based intrusion detection through the analysis of audit trails, command sequences, and system calls, as well as network intrusion detection through the analysis of TCP packets and the detection of malicious executables.
This book fills the great need for a book that collects and frames work on developing and applying methods from machine learning and data mining to problems in computer security.
Foreword
Preface
List of Contributors
Contents
1 Introduction
Part I Survey Contributions
2 An Introduction to Information Assurance
2.1 Introduction
2.2 The Security Process
2.2.1 Protection
2.2.2 Detection
2.2.3 Response
2.3 Information Assurance
2.3.1 Security Properties
2.3.2 Information Location
2.3.3 System Processes
2.4 Attackers and the Threats Posed
2.4.1 Worker with a Backhoe
2.4.2 Ignorant Users
2.4.3 Criminals
2.4.4 Script Kiddies
2.4.5 Automated Agents
2.4.6 Professional System Crackers
2.4.7 Insiders
2.5 Opportunities for Machine Learning Approaches
2.6 Conclusion
3 Some Basic Concepts of Machine Learning and Data Mining
3.1 Introduction
3.2 From Data to Examples
3.3 Representations, Models, and Algorithms
3.3.1 Instance-Based Learning
3.3.2 Naive Bayes
3.3.3 Kernel Density Estimation
3.3.4 Learning Coefficients of a Linear Function
3.3.5 Learning Decision Rules
3.3.6 Learning Decision Trees
3.3.7 Mining Association Rules
3.4 Evaluating Models
3.4.1 Problems with Simple Performance Measures
3.4.2 ROC Analysis
3.4.3 Principled Evaluations and Their Importance
3.5 Ensemble Methods and Sequence Learning
3.5.1 Ensemble Methods
3.5.2 Sequence Learning
3.6 Implementations and Data Sets
3.7 Further Reading
3.8 Concluding Remarks
Part II Research Contributions
4 Learning to Detect Malicious Executables
4.1 Introduction
4.2 Related Work
4.3 Data Collection
4.4 Classification Methodology
4.4.1 Instance-Based Learner
4.4.2 The TFIDF Classifier
4.4.3 Naive Bayes
4.4.4 Support Vector Machines
4.4.5 Decision Trees
4.4.6 Boosted Classifiers
4.5 Experimental Design
4.6 Experimental Results
4.6.1 Pilot Studies
4.6.2 Experiment with a Small Collection
4.6.3 Experiment with a Larger Collection
4.7 Discussion
4.8 Concluding Remarks
5 Data Mining Applied to Intrusion Detection: MITRE Experiences
5.1 Introduction
5.1.1 Related Work
5.1.2 MITRE Intrusion Detection
5.2 Initial Feature Selection, Aggregation, Classification, and Ranking
5.2.1 Feature Selection and Aggregation
5.2.2 HOMER
5.2.3 BART Algorithm and Implementation
5.2.4 Other Anomaly Detection Efforts
5.3 Classifier to Reduce False Alarms
5.3.1 Incremental Classifier Algorithm
5.3.2 Classifier Experiments
5.4 Clustering to Detect Anomalies
5.4.1 Clustering with a Reference Model on KDD Cup Data
5.4.2 Clustering without a Reference Model on MITRE Data
5.5 Conclusion
6 Intrusion Detection Alarm Clustering
6.1 Introduction
6.2 Root Causes and Root Cause Analysis
6.3 The CLARAty Alarm Clustering Method
6.3.1 Motivation
6.3.2 The CLARAty Algorithm
6.3.3 CLARAty Use Case
6.4 Cluster Validation
6.4.1 The Validation Dilemma
6.4.2 Cluster Validation in Brief
6.4.3 Validation of Alarm Clusters
6.5 Cluster Tendency
6.5.1 Test of Cluster Tendency
6.5.2 Experimental Setup and Results
6.5.3 Derivation of Probabilities
6.6 Conclusion
7 Behavioral Features for Network Anomaly Detection
7.1 Introduction
7.2 Inter-Flow versus Intra-Flow Analysis
7.3 Operationally Variable Attributes
7.3.1 Size of Normal Value Space
7.3.2 Data Mining on Operationally Variable Attributes
7.4 Deriving Behavioral Features
7.5 Authentication Using Behavioral Features
7.5.1 The Need for Authentication of Server Flows
7.5.2 Classification of Server Flows
7.5.3 An Empirical Evaluation
7.5.4 Aggregate Server Flow Model
7.5.5 Host-Specific Models
7.5.6 Models from Real Network Traffic
7.5.7 Classification for Intrusion and Misuse Detection
7.6 Related Work
7.7 Conclusion
8 Cost-Sensitive Modeling for Intrusion Detection
8.1 Introduction
8.2 Cost Factors, Models, and Metrics in IDSs
8.2.1 Cost Factors
8.2.2 Cost Models
8.2.3 Cost Metrics
8.3 Cost-Sensitive Modeling
8.3.1 Reducing Operational Cost
8.3.2 Reducing Consequential Cost
8.4 Experiments
8.4.1 Design
8.4.2 Measurements
8.4.3 Results
8.4.4 Comparison with fcs-RIPPER
8.5 Related Work
8.6 Conclusion and Future Work
9 Data Cleaning and Enriched Representations for Anomaly Detection in System Calls
9.1 Introduction
9.2 Related Work
9.3 Data Cleaning
9.3.1 Representation with Motifs and Their Locations
9.3.2 Unsupervised Training with Local Outlier Factor (LOF)
Automating the Parameters
9.4 Anomaly Detection
9.4.1 Representation with Arguments
9.4.2 Supervised Training with LERAD
9.5 Experimental Evaluations
9.5.1 Data Cleaning Evaluation Procedures and Criteria
9.5.2 Anomaly Detection with Arguments Evaluation Procedures and Criteria
9.5.3 Anomaly Detection with Cleaned Data vs. Raw Data Evaluation Procedures and Criteria
9.6 Concluding Remarks
10 A Decision-Theoretic, Semi-Supervised Model for Intrusion Detection
10.1 Introduction
10.2 Related Work
10.3 A New Model of Intrusion Detection
10.3.1 Generative Data Model
10.3.2 Inference and Learning
10.3.3 Action Selection
10.3.4 Relaxing the Cost Function
10.4 Experiments
10.4.1 Data Set
10.4.2 Results
10.5 Conclusions and Future Work
References
Index
2 An Introduction to Information Assurance (p. 7)
Clay Shields
2.1 Introduction
The intuitive function of computer security is to limit access to a computer system. With a perfect security system, information would never be compromised because unauthorized users would never gain access to the system. Unfortunately, it seems beyond our current abilities to build a system that is both perfectly secure and useful.
Instead, the security of information is often compromised through technical flaws and through user actions. The realization that we cannot build a perfect system is important, because it shows that we need more than just protection mechanisms. We should expect the system to fail, and be prepared for failures.
As described in Sect. 2.2, system designers use mechanisms that not only protect against policy violations but also detect violations when they occur and respond to them. This response often includes analyzing why the protection mechanisms failed and improving them to prevent future failures.
It is also important to realize that security systems do not exist just to limit access to a system. The true goal of implementing security is to protect the information on the system, which can be far more valuable than the system itself or access to its computing resources.
Because systems involve human users, protecting information requires more than just technical measures. It also requires that the users be aware of and follow security policies that support protection of information as needed.
This chapter provides a wider view of information security, with the goal of giving machine learning researchers and practitioners an overview of the area and suggesting new areas that might benefit from machine learning approaches. This wider view of security is called information assurance.
It includes the technical aspects of protecting information, as well as defining policies thoroughly and correctly and ensuring proper behavior of human users and operators. I will first describe the security process.
I will then explain the standard model of information assurance and its components, and, finally, will describe common attackers and the threats they pose. I will conclude with some examples of problems that fall outside much of the normal technical considerations of computer security that may be amenable to solution by machine learning methods.
2.2 The Security Process
Human beings are inherently fallible. Because we will make mistakes, our security process must reflect that fact and attempt to account for it. This recognition leads to the cycle of security shown in Fig. 2.1. The cycle is familiar and intuitive, is common in everyday life, and is illustrated here with a running example of securing an automobile.
2.2.1 Protection
Protection mechanisms are used to enforce a particular policy. The goal is to prevent things that are undesirable from occurring. A familiar example is securing an automobile and its contents. A car comes with locks to prevent anyone without a key from gaining access to it, or from starting it without the key. These locks constitute the car’s protection mechanisms.
2.2.2 Detection
Since we anticipate that our protection mechanisms will be imperfect, we attempt to determine when that occurs by adding detection mechanisms.
Publication date (per publisher) | 27.2.2006 |
---|---|
Series | Advanced Information and Knowledge Processing |
Additional info | XVI, 210 p. |
Place of publication | London |
Language | English |
Subject areas | Computer Science ► Databases ► Data Warehouse / Data Mining |
 | Computer Science ► Networks ► Security / Firewall |
 | Computer Science ► Theory / Studies ► Cryptology |
 | Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics |
 | Mathematics / Computer Science ► Computer Science ► Web / Internet |
Keywords | Anomaly Detection • Audit trail analysis • Behavior • Clustering • Computer • Computer forensics • Computer Security • data cleansing • Data Mining • Frames • Intrusion Detection • learning • machine learning • Modeling • security |
ISBN-10 | 1-84628-253-5 / 1846282535 |
ISBN-13 | 978-1-84628-253-9 / 9781846282539 |
Size: 1.6 MB
DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suited to specialist books with columns, tables, and figures. A PDF can be displayed on almost all devices but is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read with (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.