Data Mining, Southeast Asia Edition - Jiawei Han, Micheline Kamber, Jian Pei

Data Mining, Southeast Asia Edition (eBook)

eBook Download: PDF
2006 | 2nd edition
800 pages
Elsevier Science (publisher)
978-0-08-047558-5 (ISBN)
14,18 incl. VAT
  • Download available immediately
Our ability to generate and collect data has been increasing rapidly. Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has created an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge.

Like the first edition, voted the most popular data mining book by KDnuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. Since the publication of the first edition, however, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first, with new chapters that address recent developments in mining complex types of data, including stream data, sequence data, graph-structured data, social network data, and multi-relational data.

Whether you are a seasoned professional or a new student of data mining, this book has much to offer you:
* A comprehensive, practical look at the concepts and techniques you need to know to get the most out of real business data.
* Updates that incorporate input from readers, changes in the field, and more material on statistics and machine learning.
* Dozens of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Complete classroom support for instructors at the www.mkp.com/datamining2e companion site.

Front cover
Title page
Copyright page
Dedication
Table of contents
Foreword
Preface
Organization of the Book
To the Instructor
To the Student
To the Professional
Book Websites with Resources
Acknowledgments for the First Edition of the Book
Acknowledgments for the Second Edition of the Book
1 Introduction
1.1 What Motivated Data Mining? Why Is It Important?
1.2 So, What Is Data Mining?
1.3 Data Mining—On What Kind of Data?
1.4 Data Mining Functionalities—What Kinds of Patterns Can Be Mined?
1.5 Are All of the Patterns Interesting?
1.6 Classification of Data Mining Systems
1.7 Data Mining Task Primitives
1.8 Integration of a Data Mining System with a Database or Data Warehouse System
1.9 Major Issues in Data Mining
1.10 Summary
Exercises
Bibliographic Notes
2 Data Preprocessing
2.1 Why Preprocess the Data?
2.2 Descriptive Data Summarization
2.3 Data Cleaning
2.4 Data Integration and Transformation
2.5 Data Reduction
2.6 Data Discretization and Concept Hierarchy Generation
2.7 Summary
Exercises
Bibliographic Notes
3 Data Warehouse and OLAP Technology: An Overview
3.1 What Is a Data Warehouse?
3.2 A Multidimensional Data Model
3.3 Data Warehouse Architecture
3.4 Data Warehouse Implementation
3.5 From Data Warehousing to Data Mining
3.6 Summary
Exercises
Bibliographic Notes
4 Data Cube Computation and Data Generalization
4.1 Efficient Methods for Data Cube Computation
4.2 Further Development of Data Cube and OLAP Technology
4.3 Attribute-Oriented Induction—An Alternative Method for Data Generalization and Concept Description
4.4 Summary
Exercises
Bibliographic Notes
5 Mining Frequent Patterns, Associations, and Correlations
5.1 Basic Concepts and a Road Map
5.2 Efficient and Scalable Frequent Itemset Mining Methods
5.3 Mining Various Kinds of Association Rules
5.4 From Association Mining to Correlation Analysis
5.5 Constraint-Based Association Mining
5.6 Summary
Exercises
Bibliographic Notes
6 Classification and Prediction
6.1 What Is Classification? What Is Prediction?
6.2 Issues Regarding Classification and Prediction
6.3 Classification by Decision Tree Induction
6.4 Bayesian Classification
6.5 Rule-Based Classification
6.6 Classification by Backpropagation
6.7 Support Vector Machines
6.8 Associative Classification: Classification by Association Rule Analysis
6.9 Lazy Learners (or Learning from Your Neighbors)
6.10 Other Classification Methods
6.11 Prediction
6.12 Accuracy and Error Measures
6.13 Evaluating the Accuracy of a Classifier or Predictor
6.14 Ensemble Methods—Increasing the Accuracy
6.15 Model Selection
6.16 Summary
Exercises
Bibliographic Notes
7 Cluster Analysis
7.1 What Is Cluster Analysis?
7.2 Types of Data in Cluster Analysis
7.3 A Categorization of Major Clustering Methods
7.4 Partitioning Methods
7.5 Hierarchical Methods
7.6 Density-Based Methods
7.7 Grid-Based Methods
7.8 Model-Based Clustering Methods
7.9 Clustering High-Dimensional Data
7.10 Constraint-Based Cluster Analysis
7.11 Outlier Analysis
7.12 Summary
Exercises
Bibliographic Notes
8 Mining Stream, Time-Series, and Sequence Data
8.1 Mining Data Streams
8.2 Mining Time-Series Data
8.3 Mining Sequence Patterns in Transactional Databases
8.4 Mining Sequence Patterns in Biological Data
8.5 Summary
Exercises
Bibliographic Notes
9 Graph Mining, Social Network Analysis, and Multirelational Data Mining
9.1 Graph Mining
9.2 Social Network Analysis
9.3 Multirelational Data Mining
9.4 Summary
Exercises
Bibliographic Notes
10 Mining Object, Spatial, Multimedia, Text, and Web Data
10.1 Multidimensional Analysis and Descriptive Mining of Complex Data Objects
10.2 Spatial Data Mining
10.3 Multimedia Data Mining
10.4 Text Mining
10.5 Mining the World Wide Web
10.6 Summary
Exercises
Bibliographic Notes
11 Applications and Trends in Data Mining
11.1 Data Mining Applications
11.2 Data Mining System Products and Research Prototypes
11.3 Additional Themes on Data Mining
11.4 Social Impacts of Data Mining
11.5 Trends in Data Mining
11.6 Summary
Exercises
Bibliographic Notes
Appendix: An Introduction to Microsoft’s OLE DB for Data Mining
A.1 Model Creation
A.2 Model Training
A.3 Model Prediction and Browsing
Bibliography

PDF (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read it only on devices that are also registered to your Adobe ID.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is of only limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the Adobe Digital Editions software (free). We advise against using the OverDrive Media Console; in our experience, it frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
