Managing and Mining Graph Data (eBook)
XXII, 600 Seiten
Springer US (Verlag)
978-1-4419-6045-0 (ISBN)
Managing and Mining Graph Data is a comprehensive survey book in graph management and mining. It contains extensive surveys on a variety of important graph topics such as graph languages, indexing, clustering, data generation, pattern mining, classification, keyword search, pattern matching, and privacy. It also studies a number of domain-specific scenarios such as stream mining, web graphs, social networks, chemical and biological data. The chapters are written by well known researchers in the field, and provide a broad perspective of the area. This is the first comprehensive survey book in the emerging topic of graph data processing.
Managing and Mining Graph Data is designed for a varied audience composed of professors, researchers and practitioners in industry. This volume is also suitable as a reference book for advanced-level database students in computer science and engineering.
Managing and Mining Graph Data is a comprehensive survey book in graph management and mining. It contains extensive surveys on a variety of important graph topics such as graph languages, indexing, clustering, data generation, pattern mining, classification, keyword search, pattern matching, and privacy. It also studies a number of domain-specific scenarios such as stream mining, web graphs, social networks, chemical and biological data. The chapters are written by well known researchers in the field, and provide a broad perspective of the area. This is the first comprehensive survey book in the emerging topic of graph data processing. Managing and Mining Graph Data is designed for a varied audience composed of professors, researchers and practitioners in industry. This volume is also suitable as a reference book for advanced-level database students in computer science and engineering.
Contents 6
List of Figures 15
List of Tables 21
Preface 22
AN INTRODUCTION TO GRAPH DATA 24
1. Introduction 24
2. Graph Management and Mining Applications 26
3. Summary 31
References 32
GRAPH DATA MANAGEMENT AND MINING: A SURVEYOF ALGORITHMS AND APPLICATIONS 35
1. Introduction 35
2. Graph Data Management Algorithms 38
2.1 Indexing and Query Processing Techniques 38
2.2 Reachability Queries 41
2.3 Graph Matching 43
2.4 Keyword Search 46
2.5 Synopsis Construction of Massive Graphs 49
3. Graph Mining Algorithms 51
3.1 Pattern Mining in Graphs 51
3.2 Clustering Algorithms for Graph Data 54
3.3 Classification Algorithms for Graph Data 59
3.4 The Dynamics of Time-Evolving Graphs 62
4. Graph Applications 65
4.1 Chemical and Biological Applications 65
4.2 Web Applications 67
4.3 Software Bug Localization 73
5. Conclusions and Future Research 77
Notes 77
References 77
GRAPH MINING: LAWS AND GENERATORS 91
1. Introduction 92
2. Graph Patterns 93
2.1 Power Laws and Heavy-Tailed Distributions 94
2.2 Small Diameters 99
2.3 Other Static Graph Patterns 101
2.4 Patterns in Evolving Graphs 104
2.5 The Structure of Specific Graphs 106
3. Graph Generators 108
3.1 Random Graph Models 110
3.2 Preferential Attachment and Variants 114
3.3 Optimization-based generators 123
3.4 Tensor-based 130
3.5 Generators for specific graphs 135
3.6 Graph Generators: A summary 137
4. Conclusions 137
Notes 138
References 139
QUERY LANGUAGE AND ACCESS METHODS FOR GRAPH DATABASES 146
1. Introduction 147
1.1 Graphs-at-a-time Queries 147
1.2 Graph Specific Optimizations 148
1.3 GraphQL 149
2. Operations on Graph Structures 150
2.1 Concatenation 151
2.2 Disjunction 152
2.3 Repetition 152
3. Graph Query Language 153
3.1 Data Model 153
3.2 Graph Patterns 154
3.3 Graph Algebra 155
3.4 FLWR Expressions 158
3.5 Expressive Power 159
4. Implementation of the Selection Operator 161
4.1 Graph Pattern Matching 161
4.2 Local Pruning and Retrieval of Feasible Mates 163
4.3 Joint Reduction of Search Space 165
4.4 Optimization of Search Order 167
5. Experimental Study 169
5.1 Biological Network 169
5.2 Synthetic Graphs 171
6. RelatedWork 173
6.1 Graph Query Languages 173
6.2 Graph Indexing 176
7. Future Research Directions 176
8. Conclusion 177
Acknowledgments 177
Appendix: Query Syntax of GraphQL 177
References 178
GRAPH INDEXING 182
1. Introduction 182
2. Feature-Based Graph Index 183
2.1 Paths 184
2.2 Frequent Structures 185
2.3 Discriminative Structures 187
2.4 Closed Frequent Structures 188
2.5 Trees 188
2.6 Hierarchical Indexing 189
3. Structure Similarity Search 190
3.1 Feature-Based Structural Filtering 191
3.2 Feature Miss Estimation 192
3.3 Frequency Difference 193
3.4 Feature Set Selection 194
3.5 Structures with Gaps 195
4. Reverse Substructure Search 196
5. Conclusions 198
References 199
GRAPH REACHABILITY QUERIES: A SURVEY 202
1. Introduction 202
2. Traversal Approaches 207
2.1 Tree+SSPI 208
2.2 GRIPP 208
3. Dual-Labeling 209
4. Tree Cover 211
5. Chain Cover 212
5.1 Computing the Optimal Chain Cover 214
6. Path-Tree Cover 215
7. 2-HOP Cover 217
7.1 A Heuristic Ranking 218
7.2 A Geometrical-Based Approach 219
7.3 Graph Partitioning Approaches 220
7.4 2-Hop Cover Maintenance 223
8. 3-Hop Cover 225
9. Distance-Aware 2-Hop Cover 226
10. Graph Pattern Matching 228
10.1 A Special Case: 229
10.2 The General Cases 232
11. Conclusions and Summary 233
References 233
EXACT AND INEXACT GRAPH MATCHING: METHODOLOGY AND APPLICATIONS 237
1. Introduction 238
2. Basic Notations 239
3. Exact Graph Matching 241
4. Inexact Graph Matching 246
4.1 Graph Edit Distance 247
4.2 Other Inexact Graph Matching Techniques 249
5. Graph Matching for Data Mining and Information Retrieval 251
6. Vector Space Embeddings of Graphs via Graph Matching 255
7. Conclusions 259
References 260
A SURVEY OF ALGORITHMS FOR KEYWORD SEARCH ON GRAPH DATA 268
1. Introduction 269
2. Keyword Search on XML Data 271
2.1 Query Semantics 272
2.2 Answer Ranking 273
2.3 Algorithms for LCA-based Keyword Search 277
3. Keyword Search on Relational Data 279
3.1 Query Semantics 279
3.2 DBXplorer and DISCOVER 280
4. Keyword Search on Schema-Free Graphs 282
4.1 Query Semantics and Answer Ranking 282
4.2 Graph Exploration by Backward Search 284
4.3 Graph Exploration by Bidirectional Search 285
4.4 Index-based Graph Exploration – the BLINKS Algorithm 286
4.5 The ObjectRank Algorithm 288
5. Conclusions and Future Research 290
References 290
A SURVEY OF CLUSTERING ALGORITHMS FOR GRAPH DATA 293
1. Introduction 293
2. Node Clustering Algorithms 295
2.1 The Minimum Cut Problem 295
2.2 Multi-way Graph Partitioning 299
2.3 Conventional Generalizations and Network Structure Indices 300
2.4 The Girvan-Newman Algorithm 302
2.5 The Spectral Clustering Method 303
2.6 Determining Quasi-Cliques 306
2.7 The Case of Massive Graphs 307
3. Clustering Graphs as Objects 309
3.1 Extending Classical Algorithms to Structural Data 309
3.2 The XProj Approach 311
4. Applications of Graph Clustering Algorithms 313
4.1 Community Detection in Web Applications and Social Networks 314
4.2 Telecommunication Networks 315
4.3 Email Analysis 315
5. Conclusions and Future Research 315
References 317
A SURVEY OF ALGORITHMS FOR DENSE SUBGRAPH DISCOVERY 320
1. Introduction 321
2. Types of Dense Components 322
2.1 Absolute vs. Relative Density 322
2.2 Graph Terminology 323
2.3 Definitions of Dense Components 324
2.4 Dense Component Selection 325
2.5 Relationship between Clusters and Dense Components 326
3. Algorithms for Detecting Dense Components in a Single Graph 328
3.1 Exact Enumeration Approach 328
3.2 Heuristic Approach 331
3.3 Exact and Approximation Algorithms for Discovering Densest Components 339
4. Frequent Dense Components 344
4.1 Frequent Patterns with Density Constraints 344
4.2 Dense Components with Frequency Constraint 345
4.3 Enumerating Cross-Graph Quasi-Cliques 345
5. Applications of Dense Component Analysis 346
6. Conclusions and Future Research 348
References 350
GRAPH CLASSIFICATION 354
1. Introduction 354
2. Graph Kernels 357
2.1 RandomWalks on Graphs 358
2.2 Label Sequence Kernel 359
2.3 Efficient Computation of Label Sequence Kernels 360
2.4 Extensions 366
3. Graph Boosting 366
3.1 Formulation of Graph Boosting 368
3.2 Optimal Pattern Search 370
3.3 Computational Experiments 371
3.4 RelatedWork 372
4. Applications of Graph Classification 375
5. Label Propagation 375
6. Concluding Remarks 376
References 376
MINING GRAPH PATTERNS 381
1. Introduction 382
2. Frequent Subgraph Mining 382
2.1 Problem Definition 382
2.2 Apriori-based Approach 383
2.3 Pattern-Growth Approach 384
2.4 Closed and Maximal Subgraphs 385
2.5 Mining Subgraphs in a Single Graph 386
2.6 The Computational Bottleneck 387
3. Mining Significant Graph Patterns 388
3.1 Problem Definition 388
3.2 gboost: A Branch-and-Bound Approach 389
3.3 gPLS: A Partial Least Squares Regression Approach 391
3.4 LEAP: A Structural Leap Search Approach 394
3.5 GraphSig: A Feature Representation Approach 398
4. Mining Representative Orthogonal Graphs 401
4.1 Problem Definition 402
4.2 Randomized Maximal Subgraph Mining 403
4.3 Orthogonal Representative Set Generation 404
5. Conclusions 405
References 405
A SURVEY ON STREAMING ALGORITHMS FOR MASSIVE GRAPHS 409
1. Introduction 409
2. Streaming Model for Massive Graphs 411
3. Statistics and Counting Triangles 413
4. Graph Matching 416
4.1 Unweighted Matching 416
4.2 Weighted Matching 419
5. Graph Distance 421
5.1 Distance Approximation using Multiple Passes 422
5.2 Distance Approximation in One Pass 427
6. Random Walks on Graphs 428
7. Conclusions 432
References 433
A SURVEY OF PRIVACY-PRESERVATION OF GRAPHS AND SOCIAL NETWORKS 437
1. Introduction 438
1.1 Privacy in Publishing Social Networks 438
1.2 Background Knowledge 439
1.3 Utility Preservation 440
1.4 Anonymization Approaches 440
1.5 Notations 441
2. Privacy Attacks on Naive Anonymized Networks 442
2.1 Active Attacks and Passive Attacks 442
2.2 Structural Queries 443
2.3 Other Attacks 444
3. K-Anonymity Privacy Preservation via Edge Modification 444
3.1 K-Degree Generalization 445
3.2 K-Neighborhood Anonymity 446
3.3 K-Automorphism Anonymity 447
4. Privacy Preservation via Randomization 449
4.1 Resilience to Structural Attacks 450
4.2 Link Disclosure Analysis 451
4.3 Reconstruction 453
4.4 Feature Preserving Randomization 454
5. Privacy Preservation via Generalization 456
6. Anonymizing Rich Graphs 457
6.1 Link Protection in Rich Graphs 458
6.2 Anonymizing Bipartite Graphs 459
6.3 Anonymizing Rich Interaction Graphs 460
6.4 Anonymizing Edge-Weighted Graphs 461
7. Other Privacy Issues in Online Social Networks 462
7.1 Deriving Link Structure of the Entire Network 462
7.2 Deriving Personal Identifying Information from Social Networking Sites 464
8. Conclusion and Future Work 464
Acknowledgments 465
References 465
A SURVEY OF GRAPH MINING FOR WEB APPLICATIONS 470
1. Introduction 471
2. Preliminaries 472
2.1 Link Analysis Ranking Algorithms 474
3. Mining High-Quality Items 476
3.1 Prediction of Successful Items in a Co-citation Network 478
3.2 Finding High-Quality Content in Question-Answering Portals 480
4. Mining Query Logs 484
4.1 Description of Query Logs 485
4.2 Query Log Graphs 485
4.3 Query Recommendations 492
5. Conclusions 495
References 496
GRAPH MINING APPLICATIONS TO SOCIAL NETWORK ANALYSIS 501
1. Introduction 501
2. Graph Patterns in Large-Scale Networks 503
2.1 Scale-Free Networks 503
2.2 Small-World Effect 505
2.3 Community Structures 506
2.4 Graph Generators 508
3. Community Detection 508
3.1 Node-Centric Community Detection 509
3.2 Group-Centric Community Detection 512
3.3 Network-Centric Community Detection 513
3.4 Hierarchy-Centric Community Detection 518
4. Community Structure Evaluation 519
5. Research Issues 521
References 522
SOFTWARE-BUG LOCALIZATION WITH GRAPH MINING 528
1. Introduction 529
2. Basics of Call Graph Based Bug Localization 530
2.1 Dynamic Call Graphs 530
2.2 Bugs in Software 531
2.3 Bug Localization with Call Graphs 532
2.4 Graph and Tree Mining 533
3. RelatedWork 534
4. Call-Graph Reduction 538
4.1 Total Reduction 538
4.2 Iterations 539
4.3 Temporal Order 541
4.4 Recursion 542
4.5 Comparison 544
5. Call Graph Based Bug Localization 545
5.1 Structural Approaches 545
5.2 Frequency-based Approach 548
5.3 Combined Approaches 551
5.4 Comparison 552
6. Conclusions and Future Directions 555
Acknowledgments 556
References 556
A SURVEY OF GRAPH MINING TECHNIQUES FOR BIOLOGICAL DATASETS 560
1. Introduction 561
2. Mining Trees 562
2.1 Frequent Subtree Mining 563
2.2 Tree Alignment and Comparison 565
2.3 Statistical Models 567
3. Mining Graphs for the Discovery of Frequent Substructures 568
3.1 Frequent Subgraph Mining 568
3.2 Motif Discovery in Biological Networks 573
4. Mining Graphs for the Discovery of Modules 575
4.1 Extracting Communities 577
4.2 Clustering 579
5. Discussion 582
References 584
TRENDS IN CHEMICAL GRAPH DATA MINING 594
1. Introduction 595
2. Topological Descriptors for Chemical Compounds 596
2.1 Hashed Fingerprints (FP) 597
2.2 Maccs Keys (MK) 597
2.3 Extended Connectivity Fingerprints (ECFP) 597
2.4 Frequent Subgraphs (FS) 598
2.5 Bounded-Size Graph Fragments (GF) 598
2.6 Comparison of Descriptors 598
3. Classification Algorithms for Chemical Compounds 601
3.1 Approaches based on Descriptors 601
3.2 Approaches based on Graph Kernels 602
4. Searching Compound Libraries 603
4.1 Methods Based on Direct Similarity 604
4.2 Methods Based on Indirect Similarity 605
4.3 Performance of Indirect Similarity Methods 607
5. Identifying Potential Targets for Compounds 608
5.1 Model-based Methods For Target Fishing 609
5.2 Performance of Target Fishing Strategies 613
6. Future Research Directions 613
References 615
Index 620
Erscheint lt. Verlag | 2.2.2010 |
---|---|
Reihe/Serie | Advances in Database Systems | Advances in Database Systems |
Zusatzinfo | XXII, 600 p. 20 illus. |
Verlagsort | New York |
Sprache | englisch |
Themenwelt | Informatik ► Datenbanken ► Data Warehouse / Data Mining |
Mathematik / Informatik ► Informatik ► Grafik / Design | |
Mathematik / Informatik ► Informatik ► Software Entwicklung | |
Schlagworte | algorithm • biological datasets • chemical graph data • classification • Clustering • clustering algorithms • currentjm • Data Mining • Graph • graph data • graph databases • graph mining • graph reachability • Information Retrieval • Managing • Mining • privacy-preservation • query language • Social Networks |
ISBN-10 | 1-4419-6045-7 / 1441960457 |
ISBN-13 | 978-1-4419-6045-0 / 9781441960450 |
Haben Sie eine Frage zum Produkt? |
Größe: 8,7 MB
DRM: Digitales Wasserzeichen
Dieses eBook enthält ein digitales Wasserzeichen und ist damit für Sie personalisiert. Bei einer missbräuchlichen Weitergabe des eBooks an Dritte ist eine Rückverfolgung an die Quelle möglich.
Dateiformat: PDF (Portable Document Format)
Mit einem festen Seitenlayout eignet sich die PDF besonders für Fachbücher mit Spalten, Tabellen und Abbildungen. Eine PDF kann auf fast allen Geräten angezeigt werden, ist aber für kleine Displays (Smartphone, eReader) nur eingeschränkt geeignet.
Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen dafür einen PDF-Viewer - z.B. den Adobe Reader oder Adobe Digital Editions.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen dafür einen PDF-Viewer - z.B. die kostenlose Adobe Digital Editions-App.
Zusätzliches Feature: Online Lesen
Dieses eBook können Sie zusätzlich zum Download auch online im Webbrowser lesen.
Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.
aus dem Bereich