Association Rule Hiding for Data Mining (eBook)
XX, 138 Seiten
Springer US (Verlag)
978-1-4419-6569-1 (ISBN)
Privacy and security risks arising from the application of different data mining techniques to large institutional data repositories have been solely investigated by a new research domain, the so-called privacy preserving data mining. Association rule hiding is a new technique in data mining, which studies the problem of hiding sensitive association rules from within the data.
Association Rule Hiding for Data Mining addresses the problem of 'hiding' sensitive association rules, and introduces a number of heuristic solutions. Exact solutions of increased time complexity that have been proposed recently are presented, as well as a number of computationally efficient (parallel) approaches that alleviate time complexity problems, along with a thorough discussion regarding closely related problems (inverse frequent item set mining, data reconstruction approaches, etc.). Unsolved problems, future directions and specific examples are provided throughout this book to help the reader study, assimilate and appreciate the important aspects of this challenging problem.
Association Rule Hiding for Data Mining is designed for researchers, professors and advanced-level students in computer science studying privacy preserving data mining, association rule mining, and data mining. This book is also suitable for practitioners working in this industry.
Privacy and security risks arising from the application of different data miningtechniques to large institutional data repositories have been solely investigated by anew research domain, the so-called privacy preserving data mining. Association rulehiding is a new technique on data mining, which studies the problem of hiding sensitiveassociation rules from within the data. Association Rule Hiding for Data Mining addresses the optimization problem of"e;hiding"e; sensitive association rules which due to its combinatorial nature admitsa number of heuristic solutions that will be proposed and presented in this book. Exact solutions of increased time complexity that have been proposed recently arealso presented as well as a number of computationally efficient (parallel) approachesthat alleviate time complexity problems, along with a discussion regarding unsolvedproblems and future directions. Specific examples are provided throughout this bookto help the reader study, assimilate and appreciate the important aspects of this challengingproblem. Association Rule Hiding for Data Mining is designed for researchers, professorsand advanced-level students in computer science studying privacy preserving datamining, association rule mining, and data mining. This book is also suitable forpractitioners working in this industry.
Preface 8
Book Organization 9
Intended Audience 10
How to Study This Book 10
Acknowledgments 11
Contents 12
List of Tables 16
List of Figures 18
List of Algorithms 20
Part I Fundamental Concepts 22
Chapter 1 Introduction 23
1.1 Privacy Preserving Data Mining 25
1.2 Association Rule Hiding 25
1.3 Research Challenges 27
1.4 Summary 28
Chapter 2 Background 29
2.1 Terminology and Preliminaries 29
2.2 Problem Formulation and Statement 32
2.2.1 Goals of Association Rule Hiding Methodologies 33
2.2.2 Problem Statement 34
Chapter 3 Classes of Association Rule Hiding Methodologies 36
3.1 Classification Dimensions 36
3.2 Classes of Association Rule Hiding Algorithms 38
Chapter 4 Other Knowledge Hiding Methodologies 40
4.1 Classification Rule Hiding 40
4.2 Privacy Preserving Clustering 41
4.3 Sequence Hiding 42
Chapter 5 Summary 44
Part II Heuristic Approaches 45
Chapter 6 Distortion Schemes 46
Chapter 7 Blocking Schemes 51
Chapter 8 Summary 53
Part III Border Based Approaches 54
Chapter 9 Border Revision for Knowledge Hiding 55
Chapter 10 BBA Algorithm 61
10.1 Hiding Goals 62
10.2 Hiding Process 63
10.2.1 Weighing Border Itemsets 64
10.2.2 Hiding a Sensitive Itemset 65
10.2.3 Order of Hiding Itemsets 65
10.3 Algorithm 66
Chapter 11 Max–Min Algorithms 67
11.1 Hiding a Sensitive Itemset 68
11.2 Order of Hiding Itemsets 69
11.3 Max–Min 1 Algorithm 69
11.4 Max–Min 2 Algorithm 70
Chapter 12 Summary 73
Part IV Exact Hiding Approaches 74
Chapter 13 Menon’s Algorithm 75
13.1 Exact Part 76
13.1.1 CSP Formulation 76
13.1.2 CSP Decomposition 77
13.2 Heuristic Part 81
Chapter 14 Inline Algorithm 83
14.1 Privacy Model 84
14.2 Solution Methodology 86
14.2.1 Problem Size Minimization 86
14.2.2 Reduction to CSP and the BIP solution 87
14.2.3 An Example 89
14.3 Experiments and Results 91
14.3.1 Datasets 92
14.3.2 Evaluation Methodology 92
14.3.3 Experimental Results 93
14.3.4 Discussion on the Efficiency of the Inline Algorithm 93
Chapter 15 Two–Phase Iterative Algorithm 95
15.1 Background 95
15.2 Removal of Constraints from Infeasible CSPs 97
15.3 An Example 98
15.4 Experimental Evaluation 101
Chapter 16 Hybrid Algorithm 105
16.1 Knowledge Hiding Formulation 106
16.1.1 Hybrid Hiding Methodology 106
16.1.2 A Running Example 108
16.2 Main Issues Pertaining to the Hiding Methodology 108
16.2.1 Size of Database Extension 108
16.2.2 Optimal Solutions in the Hybrid Methodology 110
16.2.3 Revision of the Borders 111
16.2.4 Problem Size Reduction 112
16.2.5 Handling Suboptimality 113
16.3 Hybrid Solution Methodology 114
16.3.1 Problem Size Minimization 114
16.3.2 Adjusting the Size of the Extension 116
16.3.3 Formulation and Solution of the CSP 117
16.3.4 Minimum Extension and Validity of Transactions 117
16.3.5 Treatment of Suboptimality in Hiding Solutions 119
16.3.6 Continuing the Running Example 120
16.4 A Partitioning Approach to Improve Scalability 122
16.4.1 Partitioning Methodology 122
16.4.2 An Example 123
16.5 Experimental Evaluation 124
Chapter 17 Parallelization Framework for Exact Hiding 131
17.1 The Parallelization Framework 131
17.1.1 Structural Decomposition of the CSP 132
17.1.2 Decomposition of Large Independent Components 135
17.1.2.1 Decomposition Using Articulation Points 135
17.1.2.2 Decomposition UsingWeighted Graph Partitioning 137
17.1.3 Parallel Solving of the Produced CSPs 138
17.2 Computational Experiments and Results 140
Chapter 18 Quantifying the Privacy of Exact Hiding Algorithms 143
Chapter 19 Summary 147
Part V Epilogue 148
Chapter 20 Conclusions 149
Chapter 21 Roadmap to FutureWork 152
References 154
Index 158
"Chapter 10 BBA Algorithm (S. 47-48)
Sun & Yu [66, 67] in 2005 proposed the first frequent itemset hiding methodology that relies on the notion of the border [46] of the nonsensitive frequent itemsets to track the impact of altering transactions in the original database. By evaluating the impact of each candidate item modification to the itemsets of the revised positive border, the algorithm greedily selects to apply those modifications (item deletions) that cause the least impact to the border itemsets. As already covered in the previous chapter, the border itemsets implicitly dictate the status (i.e., frequent vs. infrequent) of every itemset in the database. Consequently, the quality of the borders directly affects the quality of the sanitized database that is produced by the hiding algorithm.
The heuristic strategy that was proposed in [66, 67], assigns a weight to each itemset of the revised positive border (which is the original positive border after it has been shaped up with the removal of the sensitive itemsets) in an attempt to quantify its vulnerability of being affected by an item deletion. The assigned weights are dynamically computed during the sanitization process as a function of the current support of the corresponding itemsets in the database.
To hide a sensitive itemset, the algorithm calculates the expected impact of each candidate item deletion to the itemsets of the revised positive border, by computing the sum of the weights of the revised positive border itemsets that will be affected. Then, the algorithm determines the optimal deletion candidate item, which is the item whose deletion has minimal impact on the revised positive border, and deletes this item from a set of carefully selected transactions.
The proposed strategy aims to minimize the number of nonsensitive frequent itemsets that are affected from the hiding of the sensitive knowledge, as well as it attempts to maintain the relative support of the nonsensitive frequent itemsets in the sanitized database. The rest of this chapter is organized as follows. In Section 10.1, we explicitly state the objectives that drive the hiding process of the BBA algorithm. Following that, Section 10.2 provides the strategy that is employed for the hiding of a sensitive itemset in a way that bears minimal impact on the revised positive border, as well as it presents the order in which the sensitive itemsets are selected to be hidden. Finally, Section 10.3 delivers the pseudocode of the algorithm.
10.1 Hiding Goals
In Chapter 2 (Section 2.2.1) we presented the main goals of association rule hiding methodologies. In the context of frequent itemset hiding, these goals can be restated as follows: (i) all the sensitive itemsets that are mined from the original database should be hidden so that they cannot be mined from its sanitized counterpart under the same (or a higher) threshold of support, (ii) all the nonsensitive frequent itemsets of the original database should be preserved in the sanitized database so that they can be mined under the same threshold of support, and (iii) no ghost itemsets are introduced to the sanitized database, i.e. no itemset that was not among the nonsensitive frequent ones mined from the original database, can be mined from the sanitized database under the same or a higher threshold of support."
Erscheint lt. Verlag | 17.5.2010 |
---|---|
Reihe/Serie | Advances in Database Systems | Advances in Database Systems |
Zusatzinfo | XX, 138 p. 60 illus. |
Verlagsort | New York |
Sprache | englisch |
Themenwelt | Informatik ► Datenbanken ► Data Warehouse / Data Mining |
Mathematik / Informatik ► Informatik ► Programmiersprachen / -werkzeuge | |
Informatik ► Theorie / Studium ► Künstliche Intelligenz / Robotik | |
Wirtschaft ► Betriebswirtschaft / Management ► Wirtschaftsinformatik | |
Schlagworte | Algorithm analysis and problem complexity • association • Complexity • Computer • Computer Science • currentjm • Data Mining • Gkoulalas • Knowledge • Optimization • privacy • Rule Hiding |
ISBN-10 | 1-4419-6569-6 / 1441965696 |
ISBN-13 | 978-1-4419-6569-1 / 9781441965691 |
Haben Sie eine Frage zum Produkt? |
Größe: 2,9 MB
DRM: Digitales Wasserzeichen
Dieses eBook enthält ein digitales Wasserzeichen und ist damit für Sie personalisiert. Bei einer missbräuchlichen Weitergabe des eBooks an Dritte ist eine Rückverfolgung an die Quelle möglich.
Dateiformat: PDF (Portable Document Format)
Mit einem festen Seitenlayout eignet sich die PDF besonders für Fachbücher mit Spalten, Tabellen und Abbildungen. Eine PDF kann auf fast allen Geräten angezeigt werden, ist aber für kleine Displays (Smartphone, eReader) nur eingeschränkt geeignet.
Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen dafür einen PDF-Viewer - z.B. den Adobe Reader oder Adobe Digital Editions.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen dafür einen PDF-Viewer - z.B. die kostenlose Adobe Digital Editions-App.
Zusätzliches Feature: Online Lesen
Dieses eBook können Sie zusätzlich zum Download auch online im Webbrowser lesen.
Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.
aus dem Bereich