Learning from Data Streams in Evolving Environments (eBook)

Methods and Applications

Moamar Sayed-Mouchaweh (Editor)

eBook Download: PDF
2018 | 1st ed. 2019
VIII, 317 pages
Springer International Publishing (Publisher)
978-3-319-89803-2 (ISBN)


This edited book covers recent advances in techniques, methods and tools for learning from data streams generated by evolving, non-stationary processes. Its goal is to discuss and survey the advanced techniques, methods and tools dedicated to managing, exploiting and interpreting data streams in non-stationary environments. The book provides the notions, definitions and background required to understand the problem of learning from data streams in non-stationary environments. It also synthesizes the state of the art in the domain, discusses advanced aspects and concepts, and presents open problems and future challenges in the field.

  • Provides multiple examples to facilitate the understanding of data streams in non-stationary environments;
  • Presents several application cases to show how the methods solve different real-world problems;
  • Discusses the links between methods to help stimulate new research and application directions.



Moamar Sayed-Mouchaweh received his PhD from the University of Reims, France. He worked as an Associate Professor in Computer Science, Control and Signal Processing at the University of Reims, France, in the Research Centre in Sciences and Technology of the Information and the Communication. In December 2008, he obtained the Habilitation to Direct Research (HDR) in Computer Science, Control and Signal Processing. Since September 2011, he has been a Full Professor at the High National Engineering School of Mines Telecom Lille Douai (France), Department of Computer Science and Automatic Control. He has edited and written several Springer books and has served as a guest editor of several special issues of international journals. He has also served as IPC Chair and Conference Chair of several international workshops and conferences, and he is a member of the editorial boards of several international journals.


Preface 6
Contents 8
Introduction 10
1 Learning from Data Streams 10
2 General Classification of Methods to Learn from Data Streams 12
3 Contents of This Book 13
3.1 Chapter 2 13
3.2 Chapter 3 14
3.3 Chapter 4 14
3.4 Chapter 5 15
3.5 Chapter 6 16
3.6 Chapter 7 16
3.7 Chapter 8 17
3.8 Chapter 9 17
3.9 Chapter 10 18
3.10 Chapter 11 19
3.11 Chapter 12 19
3.12 Chapter 13 20
References 20
Transfer Learning in Non-stationary Environments 22
1 Introduction 22
2 Transfer Learning (TL) 25
2.1 Transductive TL 26
2.2 Inductive TL 27
3 Learning in Non-stationary Environments (NSE) 29
3.1 Chunk-by-Chunk Approaches 31
3.2 Example-by-Example Approaches 32
4 The Relationship Between TL and Learning in NSE 34
4.1 Similarities 34
4.2 Differences 35
5 The Potential of Transfer Learning in NSE 37
5.1 Dynamic Cross-company Mapped Model Learning (Dycom) 39
5.2 Diversity for Dealing with Drifts (DDD) 41
6 Conclusions 43
References 44
A New Combination of Diversity Techniques in Ensemble Classifiers for Handling Complex Concept Drift 47
1 Introduction 48
2 Complex Concept Drift Characteristics and Challenges 49
2.1 Speed 49
2.2 Severity 50
2.3 Complex Concept Drift 51
3 Related Work 51
3.1 Block-Based Technique 52
3.2 Weighting-Data Technique 53
3.3 Filtering-Data Technique 53
4 The Proposed Approach 54
4.1 Drift Monitoring Process in EnsembleEDIST2 54
4.2 EnsembleEDIST2's Diversity by Variable-Sized Block Technique 56
4.3 EnsembleEDIST2's Diversity by New Filtering-Data Criterion 57
4.4 EnsembleEDIST2's Diversity by New Weighting-Data Process 58
5 Experimental Evaluation 61
5.1 Synthetic Datasets 61
5.2 Real Datasets 63
5.3 Evaluation Criteria 64
5.3.1 Parameter Settings 64
6 Comparative Study and Interpretation 64
6.1 Impact of N0 on EnsembleEDIST2 Performance 64
6.2 Impact of Ensemble Size on EnsembleEDIST2 Performance 65
6.3 Accuracy of EnsembleEDIST2 Vs Other Ensembles 65
7 Conclusion 67
References 68
Analyzing and Clustering Pareto-Optimal Objects in Data Streams 70
1 Introduction 70
2 Related Work 72
3 Background 74
3.1 Preference Constructors 74
3.2 PreferenceSQL 75
4 Preference-Based Stream Processing 76
4.1 The Preference-Based Stream Processing Framework 76
4.2 The Preference Continuous Query Language (PCQL) 77
4.3 The Stream-Based Lattice Skyline Algorithm (SLS) 79
4.3.1 Finding the BMO-Set of a Data Stream 79
4.3.2 The SLS Algorithm 80
5 Clustering of Pareto-Optimal Objects 82
5.1 Clustering Background 82
5.2 The Borda Social Choice Voting Rule for Clustering 83
5.2.1 The Borda Social Choice Voting Rule 84
5.2.2 Cluster Allocation 84
5.2.3 Complexity and Convergence 86
6 Application Use Case 87
7 Experiments 88
7.1 Benchmarks for Stream Lattice Skyline Algorithm 89
7.1.1 Experiments on Artificial Data 89
7.1.2 Experiments on Real World Data 91
7.2 Benchmarks for Borda Social Choice Clustering 92
7.2.1 Runtime 92
7.2.2 Iterations 93
8 Conclusion 95
References 95
Error-Bounded Approximation of Data Stream: Methods and Theories 99
1 Introduction 100
2 Preliminary 102
3 OptimalPLR: An Optimal Algorithm to Generate Error-Bounded PLR 104
3.1 Extreme Slopes of Maximal δ-Representative 105
3.1.1 Slope Rotation and Extreme Slopes 105
3.1.2 Slope Evolution and Reduction 108
3.2 Optimization Strategies 110
3.2.1 Computing Extreme Slopes 111
3.2.2 Updating Convex Hulls 112
3.3 Error-Bounded PLR Algorithm 112
3.3.1 Description of OptimalPLR 112
3.3.2 Complexity Analysis 114
3.3.3 Discussions of OptimalPLR 115
4 ParaOptimal: An Optimal Algorithm in Transformed Space 117
4.1 Description of ParaOptimal 117
4.1.1 Theoretical Preparation 117
4.1.2 Initialization 119
4.1.3 Feasible Region Update 119
4.2 Generalization of ParaOptimal 121
5 Theoretical Analysis of the Equivalence 122
5.1 Mapping of Two Spaces 122
5.2 Equivalence Discussion 123
6 Summary 125
References 126
Ensemble Dynamics in Non-stationary Data Stream Classification 129
1 Introduction 130
2 Ensemble Dynamics 132
2.1 Addition 133
2.1.1 Fixed Time of Addition 133
2.1.2 Dynamic Time of Addition 133
2.2 Removal 134
2.3 Update 135
2.4 Ensemble Dynamics Taxonomy 136
3 Formalisation 136
4 Experimental Study 141
4.1 Data Sets 142
4.1.1 Hyperplane Generator 142
4.1.2 SEA Data Stream Generator 143
4.1.3 Forest Cover-Type Data Set 143
4.1.4 Electricity Data Set 143
4.2 Results and Analysis 144
5 Discussion 147
6 Summary 157
References 158
Processing Evolving Social Networks for Change Detection Based on Centrality Measures 160
1 Introduction 160
2 User Preference Dynamics 161
2.1 User Preferences 162
2.2 Preference Changes in Evolving Environments 162
3 Preference Change Detection 163
3.1 Processing Streaming Network 163
3.2 Computing Centralities 164
3.2.1 Degree Centrality 164
3.2.2 Betweenness Centrality 164
3.2.3 Closeness Centrality 165
3.3 Moving Window Average (MWA) 165
3.4 Weighted Moving Window Average (WMWA) 166
3.5 Page–Hinckley Test (PH) 166
3.6 Change Point Scoring Function 167
3.7 Change Point Detection 167
3.8 Assumptions 168
3.9 Evaluation 168
4 Algorithms 168
5 Methodology 171
5.1 Dataset and Evolving Networks 171
5.1.1 Homogeneous Network 171
5.1.2 Bipartite Network 171
5.2 User Preference Change Events 173
6 Experiments 174
6.1 Experimental Environment 174
6.2 Detecting u1 Change-Points 175
6.3 Performance of Proposed Methods 176
6.4 Impact of Parameters 176
7 Related Work 177
8 Conclusion 179
References 180
Large-Scale Learning from Data Streams with Apache SAMOA 182
1 Introduction 182
2 Description 184
3 High Level Architecture 185
4 System Design 186
5 Machine Learning Algorithms 187
6 Vertical Hoeffding Tree 188
6.1 Vertical Parallelism 189
6.2 Algorithm Structure 190
6.3 Evaluation 193
6.3.1 Accuracy and Time of VHT Local vs. MOA 195
6.3.2 Accuracy of VHT Local vs. Distributed 195
6.4 Summary 201
7 Distributed AMRules 201
7.1 Vertical Parallelism 203
7.2 Horizontal Parallelism 204
7.3 Evaluation 205
8 Conclusions 211
References 211
Process Mining for Analyzing Customer Relationship Management Systems: A Case Study 213
1 Introduction 213
2 Related Work 215
3 INE Case Study 216
3.1 What Is INE? 216
3.2 Data and Pre-processing 216
3.3 Questions 216
3.4 Process Discovery 217
3.5 Conformance Checking 220
3.6 Performance Analysis 221
3.7 Building Social Network 222
3.8 Conclusions and Future Study 224
References 224
Detecting Smooth Cluster Changes in Evolving Graph Structures 226
1 Introduction 227
2 Clustering a Graph Sequence 228
2.1 Problem Definition 228
2.2 Preserving Cluster Membership 230
2.3 Drawbacks of PCM 232
3 Detecting Smooth Cluster Changes in a Graph Sequence 233
3.1 Clustering a Graph Sequence Using Smoothness Between Two Successive Graphs 233
3.2 Clustering Using the Forgetting Rate 236
3.3 Connectivities of Graphs 237
4 Experimental Evaluation 239
4.1 Experimental Setup 239
4.2 Results 240
4.2.1 Dependence on the Initial Graph of the Graph Sequence 240
4.2.2 Varying Cluster Numbers 242
4.2.3 Varying Numbers of Vertices 244
4.2.4 Graph Connectivities 245
4.2.5 Real-World Data 247
5 Conclusion 248
References 248
Efficient Estimation of Dynamic Density Functions with Applications in Data Streams 250
1 Introduction 251
2 Related Work 253
2.1 Dynamic Density 253
2.2 Change Detection 254
3 KDE-Track: Dynamic Density Estimation 255
3.1 Theoretical Bases of Density Estimation 255
3.2 KDE-Track Method 258
3.3 KDE-Track Implementation 261
4 Density Estimation Performance Evaluation 266
4.1 Estimation Accuracy on Synthetic Data 266
4.1.1 Datasets 266
4.2 Computational Time Cost and Space Usage 269
5 Applications 271
5.1 Visualizing the Taxi Traffic Data 271
5.2 Online Change Detection 272
6 Summary and Future Work 279
References 280
Incremental SVM Learning: Review 282
1 Introduction 282
2 SVM for Classification 283
3 Incremental SVM Learning 285
3.1 Online Incremental SVM Learning Methods 286
3.2 Semi Online Incremental SVM Learning Methods 289
4 Discussion and Comparison 293
5 Applications of Incremental SVM Learning 294
6 Conclusion 296
References 296
On Social Network-Based Algorithms for Data Stream Clustering 300
1 Introduction 300
2 Data Stream Clustering 301
3 Related Work 302
3.1 CluStream 303
3.2 ClusTree 303
3.3 DenStream 303
3.4 HAStream 304
4 Social Network-Based Approaches 304
4.1 Background on Social Networks Theory 305
4.2 CNDenStream 306
4.3 SNCStream 309
4.4 SNCStream+ 310
5 Evaluation 312
5.1 Evaluation Procedure 312
5.2 Parametrization 313
5.3 Synthetic Data 314
5.4 Real-World Datasets 314
5.5 Results 315
6 Conclusion 317
References 319

Publication date (per publisher) 28 July 2018
Series Studies in Big Data
Additional info VIII, 317 p. 131 illus., 95 illus. in color.
Place of publication Cham
Language English
Subject areas Mathematics / Computer Science > Computer Science > Databases
Technology > Electrical Engineering / Energy Technology
Business > Business Administration / Management
Keywords Artificial Intelligence • Concept drift and concept evolution in data streams • Data streams in non-stationary environments • Machine Learning • Neural Networks and Learning Systems • Quality Control, Reliability, Safety and Risk
ISBN-10 3-319-89803-5 / 3319898035
ISBN-13 978-3-319-89803-2 / 9783319898032
PDF (watermark)
Size: 9.7 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.

