Vijay Kotu is Vice President of Analytics at ServiceNow. He leads the implementation of large-scale data platforms and services to support the company's enterprise business. He has led analytics organizations for over a decade, with a focus on data strategy, business intelligence, machine learning, experimentation, engineering, enterprise adoption, and building analytics talent. Prior to joining ServiceNow, he was Vice President of Analytics at Yahoo. He worked at Life Technologies and Adteractive, where he led marketing analytics, created algorithms to optimize online purchasing behavior, and developed data platforms to manage marketing campaigns. He is a member of the Association for Computing Machinery and a member of the Advisory Board at RapidMiner.
Put Predictive Analytics into Action

Learn the basics of Predictive Analysis and Data Mining through an easy to understand conceptual framework and immediately practice the concepts learned using the open source RapidMiner tool. Whether you are brand new to Data Mining or working on your tenth project, this book will show you how to analyze data, uncover hidden patterns and relationships to aid important decisions and predictions. Data Mining has become an essential tool for any enterprise that collects, stores and processes data as part of its operations. This book is ideal for business users, data analysts, business analysts, business intelligence and data warehousing professionals and for anyone who wants to learn Data Mining.

You'll be able to:
1. Gain the necessary knowledge of different data mining techniques, so that you can select the right technique for a given data problem and create a general purpose analytics process.
2. Get up and running fast with more than two dozen commonly used powerful algorithms for predictive analytics using practical use cases.
3. Implement a simple step-by-step process for predicting an outcome or discovering hidden relationships from the data using RapidMiner, an open source GUI based data mining tool.

Predictive analytics and Data Mining techniques covered: Exploratory Data Analysis, Visualization, Decision trees, Rule induction, k-Nearest Neighbors, Naive Bayesian, Artificial Neural Networks, Support Vector machines, Ensemble models, Bagging, Boosting, Random Forests, Linear regression, Logistic regression, Association analysis using Apriori and FP Growth, K-Means clustering, Density based clustering, Self Organizing Maps, Text Mining, Time series forecasting, Anomaly detection and Feature selection.

Implementation files can be downloaded from the book companion site at www.LearnPredictiveAnalytics.com

- Demystifies data mining concepts with easy to understand language
- Shows how to get up and running fast with 20 commonly used powerful techniques for predictive analysis
- Explains the process of using open source RapidMiner tools
- Discusses a simple 5 step process for implementing algorithms that can be used for performing predictive analytics
- Includes practical use cases and examples
Front Cover
Predictive Analytics and Data Mining
Copyright
Dedication
Contents
Foreword
Preface
WHY THIS BOOK?
WHO CAN USE THIS BOOK?
Acknowledgments
Chapter 1 - Introduction
1.1 WHAT DATA MINING IS
1.2 WHAT DATA MINING IS NOT
1.3 THE CASE FOR DATA MINING
1.4 TYPES OF DATA MINING
1.5 DATA MINING ALGORITHMS
1.6 ROADMAP FOR UPCOMING CHAPTERS
REFERENCES
Chapter 2 - Data Mining Process
2.1 PRIOR KNOWLEDGE
2.2 DATA PREPARATION
2.3 MODELING
2.4 APPLICATION
2.5 KNOWLEDGE
WHAT’S NEXT?
REFERENCES
Chapter 3 - Data Exploration
3.1 OBJECTIVES OF DATA EXPLORATION
3.2 DATA SETS
3.3 DESCRIPTIVE STATISTICS
3.4 DATA VISUALIZATION
3.5 ROADMAP FOR DATA EXPLORATION
REFERENCES
Chapter 4 - Classification
4.1 DECISION TREES
4.2 RULE INDUCTION
4.3 K-NEAREST NEIGHBORS
4.4 NAÏVE BAYESIAN
4.5 ARTIFICIAL NEURAL NETWORKS
4.6 SUPPORT VECTOR MACHINES
4.7 ENSEMBLE LEARNERS
REFERENCES
Chapter 5 - Regression Methods
5.1 LINEAR REGRESSION
5.2 LOGISTIC REGRESSION
CONCLUSION
REFERENCES
Chapter 6 - Association Analysis
6.1 CONCEPTS OF MINING ASSOCIATION RULES
6.2 APRIORI ALGORITHM
6.3 FP-GROWTH ALGORITHM
CONCLUSION
REFERENCES
Chapter 7 - Clustering
CLUSTERING TO DESCRIBE THE DATA
CLUSTERING FOR PREPROCESSING
7.1 TYPES OF CLUSTERING TECHNIQUES
7.2 K-MEANS CLUSTERING
7.3 DBSCAN CLUSTERING
7.4 SELF-ORGANIZING MAPS
REFERENCES
Chapter 8 - Model Evaluation
8.1 CONFUSION MATRIX (OR TRUTH TABLE)
8.2 RECEIVER OPERATOR CHARACTERISTIC (ROC) CURVES AND AREA UNDER THE CURVE (AUC)
8.3 LIFT CURVES
8.4 EVALUATING THE PREDICTIONS: IMPLEMENTATION
CONCLUSION
REFERENCES
Chapter 9 - Text Mining
9.1 HOW TEXT MINING WORKS
9.2 IMPLEMENTING TEXT MINING WITH CLUSTERING AND CLASSIFICATION
CONCLUSION
REFERENCES
Chapter 10 - Time Series Forecasting
10.1 DATA-DRIVEN APPROACHES
10.2 MODEL-DRIVEN FORECASTING METHODS
CONCLUSION
REFERENCES
Chapter 11 - Anomaly Detection
11.1 ANOMALY DETECTION CONCEPTS
11.3 DENSITY-BASED OUTLIER DETECTION
11.4 LOCAL OUTLIER FACTOR
CONCLUSION
REFERENCES
Chapter 12 - Feature Selection
12.1 CLASSIFYING FEATURE SELECTION METHODS
12.2 PRINCIPAL COMPONENT ANALYSIS
12.3 INFORMATION THEORY–BASED FILTERING FOR NUMERIC DATA
CATEGORICAL DATA
12.5 WRAPPER-TYPE FEATURE SELECTION
CONCLUSION
REFERENCES
Chapter 13 - Getting Started with RapidMiner
13.1 USER INTERFACE AND TERMINOLOGY
13.2 DATA IMPORTING AND EXPORTING TOOLS
13.3 DATA VISUALIZATION TOOLS
13.4 DATA TRANSFORMATION TOOLS
13.5 SAMPLING AND MISSING VALUE TOOLS
CONCLUSION
REFERENCES
Comparison of Data Mining Algorithms
Index
About the Authors
Data Mining Process
Abstract
Successfully uncovering patterns using data mining is an iterative process. Chapter 2 provides a framework for solving a data mining problem. The five-step process outlined in this chapter provides guidelines on gathering subject matter expertise; exploring the data with statistics and visualization; building a model using data mining algorithms; testing the model and deploying it in a production environment; and finally reflecting on the new knowledge gained in the cycle. As data mining practices have evolved, various academic and commercial bodies have put forward different frameworks for the data mining process, such as the Cross Industry Standard Process for Data Mining (CRISP) and knowledge discovery in databases (KDD). These frameworks share common characteristics, so we will use a generic framework closely resembling the CRISP process.
Keywords
CRISP; KDD; data mining process; prior knowledge; modeling; data preparation; evaluation; application
Figure 2.1 CRISP data mining framework.
Figure 2.2 Data mining process.
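As a rough illustration of the five steps named in the chapter outline (prior knowledge, data preparation, modeling, application, knowledge), the process could be sketched in plain Python as follows. This is not taken from the book, which works in RapidMiner; the toy records, the credit-score objective, and all function names are hypothetical and chosen only to make the flow concrete.

```python
from statistics import mean

# Step 1, prior knowledge (assumed objective): estimate an interest rate
# from a credit score. Hypothetical toy records: (credit_score, rate).
# None marks a missing value.
RAW = [(500, 7.31), (600, 6.70), (700, 5.95), (None, 6.40), (800, 5.40), (650, 6.50)]


def prepare(records):
    """Step 2, data preparation: drop records with missing values."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(train):
    """Step 3, modeling: ordinary least squares fit of y = a + b * x."""
    xs, ys = zip(*train)
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - b * x_bar, b


def apply_model(model, x):
    """Step 4, application: score an unseen record with the fitted model."""
    a, b = model
    return a + b * x


def run_process(records):
    clean = prepare(records)
    train, test = clean[:-1], clean[-1:]  # hold out the last record for testing
    model = fit_line(train)
    errors = [abs(apply_model(model, x) - y) for x, y in test]
    # Step 5, knowledge: summarize what was learned so the cycle can repeat.
    return {"model": model, "mean_abs_error": mean(errors), "records_used": len(clean)}


result = run_process(RAW)
print(result)
```

In practice the cycle would be repeated: the error reported in the last step feeds back into the objective and data preparation choices of the next iteration.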
2.1. Prior Knowledge
2.1.1. Objective
2.1.2. Subject Area
Publication date (per publisher) | 27 November 2014 |
---|---|
Language | English |
Subject area | Mathematics / Computer Science ► Computer Science ► Operating Systems / Servers |
 | Computer Science ► Databases ► Data Warehouse / Data Mining |
 | Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics |
ISBN-10 | 0-12-801650-7 / 0128016507 |
ISBN-13 | 978-0-12-801650-3 / 9780128016503 |