Partitional Clustering Algorithms (eBook)

M. Emre Celebi (Editor)

eBook Download: PDF
2014 | 2015
X, 415 pages
Springer International Publishing (Publisher)
978-3-319-09259-1 (ISBN)

€96.29 incl. VAT
  • Available for immediate download
This book focuses on partitional clustering algorithms, which are commonly used in engineering and computer-science applications. The goal of this volume is to summarize the state of the art in partitional clustering. The book covers topics such as center-based clustering, competitive learning clustering, and density-based clustering. Each chapter is contributed by a leading expert in the field.
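
As a concrete taste of the book's central topic, the prototypical center-based partitional method is k-means, computed with Lloyd's algorithm, whose acceleration is the subject of Chapter 2. The following minimal NumPy sketch of the two alternating steps is our own illustration, not code from the book; the function name `kmeans` and its parameters are hypothetical choices.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means: a minimal illustrative sketch."""
    rng = np.random.default_rng(seed)
    # Initialize centers by sampling k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points,
        # keeping the old center if a cluster has become empty.
        centers_new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(centers_new, centers):
            break  # assignments are stable; the distortion cannot decrease further
        centers = centers_new
    return centers, labels
```

On a 2-D point cloud, `kmeans(X, k=3)` returns three cluster centers and a label per point. The accelerations surveyed in Chapter 2 (e.g., the triangle-inequality bounds of Elkan, Hamerly, and Drake) avoid most of the distance computations in the assignment step without changing the result.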

Dr. Emre Celebi is an Associate Professor in the Department of Computer Science at Louisiana State University in Shreveport.

Preface 6
Contents 10
1 Recent Developments in Model-Based Clustering with Applications 12
1.1 Introduction 12
1.2 Methodological Developments 18
1.2.1 Initialization 18
1.2.2 Spurious Solutions 21
1.2.3 Merging Mixture Components for Clustering 23
1.2.4 Nonparametric Clustering 26
1.2.5 Variable Selection for Clustering 28
1.2.6 Semi-Supervised Clustering 30
1.2.7 Diagnostics and Assessment of Partition Variability 32
1.2.8 Miscellaneous Topics 34
1.3 Modern Applications of Model-Based Clustering 35
1.3.1 Clustering Tree Ring Sequences 35
1.3.2 Identification of Differentially Expressed Genes 36
1.3.3 Analysis of Customer Navigation Patterns 38
1.3.4 Data Clustering on Unit-Hypersphere 39
1.3.5 Analysis of Mass Spectrometry Data 40
1.3.6 Network Clustering 41
References 43
2 Accelerating Lloyd's Algorithm for k-Means Clustering 51
2.1 Introduction 52
2.1.1 Popularity of the k-Means Algorithm 52
2.1.2 The Standard k-Means Algorithm Does a Lot of Unnecessary Work 53
2.1.3 Previous Work on k-Means Acceleration 53
2.1.3.1 Algorithmic Improvements 54
2.1.3.2 Parallelization 55
2.1.3.3 Alternative Heuristic Methods 55
2.2 Cluster Distortion and Lloyd's Algorithm 55
2.2.1 Analysis of Lloyd's Algorithm 57
2.2.2 MacQueen's Algorithm 58
2.3 Tree Structured Approaches 58
2.3.1 Blacklisting and Filtering Centers with k-d Trees 58
2.3.2 Anchors Hierarchy 60
2.4 Triangle Inequality Approaches 60
2.4.1 Using the Triangle Inequality for Center-Center and Center-Point Distances 61
2.4.2 Maintaining Distance Bounds with the Triangle Inequality 62
2.4.3 Elkan's Algorithm: k Lower Bounds, k² Center-Center Distances 64
2.4.4 Hamerly's Algorithm: 1 Lower Bound 64
2.4.5 Drake's Algorithm: 1 < b < k Lower Bounds
2.4.6 Annular Algorithm: Sorting the Centers by Norm 68
2.4.7 Kernelized k-Means with Distance Bounds 70
2.5 Heap-Ordered k-Means: Inverting the Innermost Loops 72
2.5.1 Reducing the Number of Bounds Kept 72
2.5.2 Cost of Combining Distance Bounds 73
2.5.3 Inverting the Loops Over n and k 73
2.5.4 Heap-Structured Bounds 73
2.5.5 Analysis of Heap-Structured k-Means 75
2.6 Parallelization 76
2.7 Experiments and Discussion 77
2.7.1 Testing Platforms 78
2.7.2 Speedup Relative to the Naive Algorithm 78
2.7.3 Parallelism 79
2.7.4 Number of Distance Calculations 81
2.7.5 Memory Use 85
2.8 Conclusion 86
References 87
3 Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm 89
3.1 Introduction 90
3.2 Linear, Deterministic, and Order-Invariant K-Means Initialization Methods 92
3.3 Experimental Setup 97
3.3.1 Data Set Descriptions 97
3.3.2 Attribute Normalization 97
3.3.3 Performance Criteria 98
3.4 Experimental Results and Discussion 99
3.5 Conclusions 103
References 104
4 Nonsmooth Optimization Based Algorithms in Cluster Analysis 109
4.1 Introduction 109
4.2 Optimization Formulations of Clustering Problems 112
4.2.1 Combinatorial Formulation of the Clustering Problem 113
4.2.2 Mixed Integer Nonlinear Programming Formulation of the Clustering Problem 113
4.2.3 Nonsmooth Nonconvex Optimization Formulation of the Clustering Problem 114
4.2.4 Comparison of Different Formulations of the Clustering Problem 115
4.3 The Auxiliary Cluster Problem 115
4.4 An Incremental Clustering Algorithm 116
4.5 Computation of Starting Points for Cluster Centers 117
4.6 The Modified Incremental Clustering Algorithm 121
4.7 Solving Optimization Problems 121
4.7.1 k-Means Type Heuristic Algorithm 122
4.7.2 The Discrete Gradient Method 122
4.7.3 An Algorithm Based on Smoothing Techniques 123
4.7.3.1 Hyperbolic Smoothing of the Cluster Function 124
4.7.3.2 Hyperbolic Smoothing of the Auxiliary Cluster Function 125
4.7.3.3 Hyperbolic Smoothing of L1-Norm 125
4.7.3.4 Hyperbolic Smoothing of L∞-Norm 127
4.7.3.5 Smooth Clustering Problems 128
4.8 Implementation of Incremental Clustering Algorithm 129
4.9 Computational Results: Evaluation of the Incremental Algorithm 130
4.9.1 Results for the Similarity Measure Based on the Squared L2-Norm 131
4.9.2 Results for the Similarity Measure Based on the L1-Norm 135
4.9.3 Results for the Similarity Measure Based on the L∞-Norm 135
4.9.4 Dependence of Number of Distance Function Evaluations and CPU Time on Number of Clusters 135
4.9.5 Results for Purity in Data Sets with Class Labels 139
4.9.6 Visualization of Results 142
4.10 Computational Results: Comparison with Other Clustering Algorithms 143
4.10.1 Comparison of Algorithms Using the Similarity Measure Based on the Squared L2-Norm 145
4.10.2 Comparison of Algorithms Using the Similarity Measure Based on the L1-Norm 149
4.10.3 Comparison of Algorithms Using the Similarity Measure Based on the L∞-Norm 150
4.11 Conclusions 153
References 154
5 Fuzzy Clustering Algorithms and Validity Indices for Distributed Data 157
5.1 Introduction 157
5.2 Fuzzy Clustering Algorithms 161
5.2.1 FCM: Fuzzy c-Means 162
5.2.2 GK: Gustafson-Kessel 163
5.2.3 Other Fuzzy Clustering Algorithms 164
5.2.4 Complexity Analysis: Summary 164
5.3 Clustering Validation 165
5.3.1 XB: Xie-Beni 165
5.3.2 FSS: Fuzzy Simplified Silhouette 166
5.3.3 K: Kwon 166
5.3.4 TSS: Tang-Sun-Sun 167
5.3.5 FS: Fukuyama-Sugeno 167
5.3.6 FHV: Fuzzy Hypervolume 167
5.3.7 APD: Average Partition Density 168
5.3.8 PD: Partition Density 168
5.3.9 SCG 168
5.3.10 PBMF 169
5.3.11 Complexity Analysis: Summary 169
5.3.12 OMR: Ordered Multiple Runs 169
5.4 Distributed Fuzzy Clustering Algorithms 171
5.4.1 DFCM: Distributed Fuzzy c-Means 171
5.4.2 Framework for Distributed Data 174
5.4.3 Other Distributed Fuzzy Clustering Algorithms 176
5.4.4 Complexity Analysis: Communication 177
5.5 Distributed Clustering Validation 178
5.5.1 DXB: Distributed Xie-Beni 179
5.5.2 DFSS: Distributed Fuzzy Simplified Silhouette 179
5.5.3 DK: Distributed Kwon 180
5.5.4 DTSS: Distributed Tang-Sun-Sun 180
5.5.5 DFS: Distributed Fukuyama-Sugeno 181
5.5.6 DFHV: Distributed Fuzzy Hypervolume 182
5.5.7 DAPD: Distributed Average Partition Density 182
5.5.8 DPD: Distributed Partition Density 183
5.5.9 DSCG: Distributed SCG 183
5.5.10 DPBMF: Distributed PBMF 183
5.5.11 Complexity Analysis: Communication 184
5.5.12 DOMR: Distributed Ordered Multiple Runs 185
5.6 Summary of Results 185
5.7 Experimental Evaluation 186
5.8 Final Remarks 190
Appendix: Additional Fuzzy Clustering Algorithms 191
GG: Gath and Geva 191
FCV: Fuzzy c-Varieties 192
FCE: Fuzzy c-Elliptotypes 193
PCM: Possibilistic c-Means 194
PGK: Possibilistic Gustafson-Kessel 196
FPCM: Fuzzy-Possibilistic c-Means 196
PFCM: Possibilistic-Fuzzy c-Means 198
References 199
6 Density Based Clustering: Alternatives to DBSCAN 203
6.1 Introduction 203
6.2 Related Work 206
6.2.1 DBSCAN 206
6.3 Clustering with a Different Notion of Density 210
6.3.1 Black Hole Clustering 210
6.3.2 Protoclustering 212
6.4 Evaluation 215
6.4.1 Black Hole Clustering Evaluation 217
6.4.2 Protoclustering 219
6.5 Conclusion 221
References 222
7 Nonnegative Matrix Factorization for Interactive Topic Modeling and Document Clustering 224
7.1 Introduction to Nonnegative Matrix Factorization 225
7.2 Nonnegative Matrix Factorization for Clustering 226
7.3 Optimization Framework for Nonnegative Matrix Factorization 229
7.3.1 Block Coordinate Descent Framework 229
7.3.1.1 Convergence Property 231
7.3.1.2 Stopping Criterion 232
7.3.2 Extension 1: Sparse NMF 233
7.3.3 Extension 2: Weakly-Supervised NMF 234
7.4 Choosing the Number of Clusters 235
7.5 Experimental Results 237
7.5.1 Data Sets and Algorithms 237
7.5.2 Clustering Quality 239
7.5.3 Convergence Behavior 241
7.5.4 Sparseness 241
7.5.5 Consistency from Multiple Runs 242
7.6 UTOPIAN: User-driven Topic Modeling via Interactive NMF 243
7.6.1 Usage Scenarios 245
7.7 Conclusions and Future Directions 247
References 249
8 Overview of Overlapping Partitional Clustering Methods 253
8.1 Introduction 253
8.2 Classification of Existing Approaches for Overlapping Clustering 254
8.3 Overlapping Partitional Clustering Methods 257
8.3.1 Uncertain Memberships-Based Methods 257
8.3.1.1 Fuzzy c-Means (FCM) and Possibilistic c-Means (PCM) 258
8.3.1.2 Evidential c-Means (ECM) and Belief c-Means (BCM) 260
8.3.2 Hard Memberships-Based Methods 263
8.3.2.1 Additive Methods 263
8.3.2.2 Geometrical Methods 265
8.3.3 Summary of Overlapping Partitional Methods 270
8.4 Evaluation of Overlapping Clustering 272
8.4.1 Label Based Evaluation 272
8.4.2 Pair Based Evaluation 273
8.4.3 BCubed Evaluation 274
8.4.4 Synthesis of Evaluation Methods for Overlapping Clustering 275
8.5 Empirical Evaluation of Overlapping Partitional Clustering Methods 276
8.6 Conclusion 281
References 281
9 On Semi-Supervised Clustering 284
9.1 Introduction 284
9.2 Overview and Taxonomy of Algorithms 287
9.2.1 Types of Supervision (Side-Knowledge) 288
9.2.2 Partitional SSC Algorithms 289
9.2.3 Hierarchical SSC Algorithms 290
9.3 Main Issues and Key Points 291
9.3.1 Evaluation of Clustering Quality: Internal/External Measures 291
9.3.2 Relevant Questions on Supervision 292
9.3.2.1 Equivalence Between Constraint-Based and Label-Based Supervision 292
9.3.2.2 Selecting the Best Query in Active Learning 293
9.3.2.3 Uncertain Supervision 294
9.4 Algorithms for SSC 295
9.4.1 COP-COBWEB and COP-kMeans 295
9.4.2 A Probabilistic Framework for SSC: The HMRF k-Means Method 297
9.4.3 Semi-Supervised Learning of the Distance Measure 301
9.4.4 Seeded k-Means and Constrained k-Means 303
9.4.5 The Zheng-Li Algorithm 304
9.4.6 Active Fuzzy Constrained Clustering 307
9.4.7 Semi-Supervised Graph Clustering: A Kernel Approach 310
9.5 A Glimpse of the Future: Some Research Directions 313
9.6 Conclusions 315
References 315
10 Consensus of Clusterings Based on High-Order Dissimilarities 319
10.1 Introduction 319
10.2 High-Order Dissimilarity: The Dissimilarity Increments Principle 322
10.2.1 Dissimilarity Increments: Definition and Properties 323
10.2.2 Dissimilarity Increments Distribution (DID) 323
10.2.2.1 DID for High-Dimensional Data 324
10.2.2.2 DID for Two-Dimensional Data 325
10.2.2.3 Characterization and Properties of 2-DID 326
10.2.3 DID Models and Data Fitting 328
10.2.3.1 Best Approximation to DID for High-Dimensional Data 328
10.2.3.2 Fitting DID to Non-Gaussian Data 330
10.3 Partitional Clustering 330
10.3.1 Initial Data Partition 331
10.3.2 Merge Criterion 332
10.4 Consensus Clustering with DID 333
10.5 Validation with DID 334
10.6 Experimental Results and Discussion 336
10.6.1 Partitional Clustering 338
10.6.1.1 Known Number of Clusters 338
10.6.1.2 Unknown Number of Clusters 339
10.6.2 Consensus Clustering 345
10.7 Conclusions 350
Appendix 352
References 355
11 Hubness-Based Clustering of High-Dimensional Data 358
11.1 Introduction 359
11.2 High-Dimensional Data Clustering 359
11.3 The Hubness Phenomenon 360
11.3.1 The Emergence of Hubs 361
11.3.2 Relation of Hubs to Data Clusters 363
11.4 Effects of Hubness on Clustering 364
11.5 Interaction Between Kernels and Hubness 368
11.6 Clustering Algorithms Based on Hubness 371
11.6.1 Deterministic Approach 372
11.6.2 Probabilistic Approach 373
11.6.3 A Hybrid Approach: Extending K-Means 375
11.6.4 Using Kernels in Hubness-Based Clustering 375
11.6.5 Scalability of Hubness-Based Approaches 377
11.7 Experimental Comparisons 379
11.8 Application Domains and Other Types of Hubness-Aware Methods 385
11.9 Possible Future Directions 386
References 387
12 Clustering for Monitoring Distributed Data Streams 392
12.1 Introduction 393
12.2 Text Mining Application 395
12.3 Monitoring Threshold Functions Through Clustering: Motivation 396
12.4 Monitoring Threshold Functions Through Clustering: Implementation 399
12.5 Experimental Results 403
12.5.1 Data 403
12.5.2 Monitoring with Incremental Clustering 404
12.6 Conventional Clustering Algorithms 406
12.6.1 PDDP 407
12.6.2 Batch k-Means 408
12.6.3 Incremental k-Means 410
12.6.4 Batch k-Means Followed by Incremental k-Means 410
12.6.5 Node Clustering with Classical Clustering Algorithms 411
12.7 Conclusions 412
Appendix 1: First and Second Moments 413
Appendix 2: Broadcast Count 416
References 418

Publication date (per publisher): 7 Nov 2014
Additional information: X, 415 p., 78 illus., 45 illus. in color.
Place of publication: Cham
Language: English
Subject areas: Mathematics / Computer Science › Computer Science › Web / Internet
Engineering › Electrical Engineering / Energy Technology
Keywords: Center Based Clustering • Flat Clustering • Fuzzy c-means • K-means • Nonhierarchical Clustering • Objective Function Based Clustering • Partitional Clustering • unsupervised classification • Unsupervised Learning
ISBN-10 3-319-09259-6 / 3319092596
ISBN-13 978-3-319-09259-1 / 9783319092591
Format: PDF (watermarked)
Size: 8.4 MB

DRM: digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, the copy can be traced back to its source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to reference books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is only suitable to a limited extent for small screens (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read with (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: online reading
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax reasons, we can only sell eBooks within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.
