Making Sense of Data

A Practical Guide to Exploratory Data Analysis and Data Mining

Glenn J. Myatt (Author)

Book | Softcover

292 pages

2006
Wiley-Blackwell (Publisher)
978-0-470-07471-8 (ISBN)

Title will appear in a new edition

A later edition of this title is available

Making Sense of Data I

Glenn J. Myatt, Wayne P. Johnson

2014

Book | Softcover

83.41 €

Making Sense of Data covers a series of technical topics that address how to analyze data. The book is divided into five parts: Introduction; Tables, Graphs, and Statistics; Grouping; Predictions; and Conclusions.

A practical, step-by-step approach to making sense out of data

Making Sense of Data educates readers on the steps and issues that need to be considered in order to successfully complete a data analysis or data mining project. The author provides clear explanations that guide the reader to make timely and accurate decisions from data in almost every field of study. A step-by-step approach aids professionals in carefully analyzing data and implementing results, leading to the development of smarter business decisions. Drawing on a comprehensive collection of methods from both the data analysis and data mining disciplines, this book describes the issues that need to be considered and the steps that need to be taken, and treats technical topics appropriately, so that readers can make effective decisions from data.
Readers are given a solid foundation in the procedures associated with complex data analysis or data mining projects and are provided with concrete discussions of the most universal tasks and technical solutions related to the analysis of data, including problem definitions, data preparation, data visualization, data mining, statistics, grouping methods, predictive modeling, and deployment issues and applications.

Throughout the book, the author examines why these multiple approaches are needed and how these methods will solve different problems. Processes, along with methods, are carefully and meticulously outlined for use in any data analysis or data mining project. From summarizing and interpreting data, to identifying non-trivial facts, patterns, and relationships in the data, to making predictions from the data, Making Sense of Data addresses the many issues that need to be considered as well as the steps that need to be taken to master data analysis and mining.

GLENN J. MYATT, PhD, is cofounder of Leadscope, Inc., a data mining company providing solutions to the pharmaceutical and chemical industry. He has also acted as a part-time lecturer in chemoinformatics at The Ohio State University and has held a series of industrial and academic research positions. Dr. Myatt is the author of numerous journal articles.

Preface.
1. Introduction. 1.1 Overview. 1.2 Problem definition. 1.3 Data preparation. 1.4 Implementation of the analysis. 1.5 Deployment of the results. 1.6 Book outline. 1.7 Summary. 1.8 Further reading.
2. Definition. 2.1 Overview. 2.2 Objectives. 2.3 Deliverables. 2.4 Roles and responsibilities. 2.5 Project plan. 2.6 Case study. 2.6.1 Overview. 2.6.2 Problem. 2.6.3 Deliverables. 2.6.4 Roles and responsibilities. 2.6.5 Current situation. 2.6.6 Timetable and budget. 2.6.7 Cost/benefit analysis. 2.7 Summary. 2.8 Further reading.
3. Preparation. 3.1 Overview. 3.2 Data sources. 3.3 Data understanding. 3.3.1 Data tables. 3.3.2 Continuous and discrete variables. 3.3.3 Scales of measurement. 3.3.4 Roles in analysis. 3.3.5 Frequency distribution. 3.4 Data preparation. 3.4.1 Overview. 3.4.2 Cleaning the data. 3.4.3 Removing variables. 3.4.4 Data transformations. 3.4.5 Segmentation. 3.5 Summary. 3.6 Exercises. 3.7 Further reading.
4. Tables and graphs. 4.1 Introduction. 4.2 Tables. 4.2.1 Data tables. 4.2.2 Contingency tables. 4.2.3 Summary tables. 4.3 Graphs. 4.3.1 Overview. 4.3.2 Frequency polygrams and histograms. 4.3.3 Scatterplots. 4.3.4 Box plots. 4.3.5 Multiple graphs. 4.4 Summary. 4.5 Exercises. 4.6 Further reading.
5. Statistics. 5.1 Overview. 5.2 Descriptive statistics. 5.2.1 Overview. 5.2.2 Central tendency. 5.2.3 Variation. 5.2.4 Shape. 5.2.5 Example. 5.3 Inferential statistics. 5.3.1 Overview. 5.3.2 Confidence intervals. 5.3.3 Hypothesis tests. 5.3.4 Chi-square. 5.3.5 One-way analysis of variance. 5.4 Comparative statistics. 5.4.1 Overview. 5.4.2 Visualizing relationships. 5.4.3 Correlation coefficient (r). 5.4.4 Correlation analysis for more than two variables. 5.5 Summary. 5.6 Exercises. 5.7 Further reading.
6. Grouping. 6.1 Introduction. 6.1.1 Overview. 6.1.2 Grouping by values or ranges. 6.1.3 Similarity measures. 6.1.4 Grouping approaches. 6.2 Clustering. 6.2.1 Overview. 6.2.2 Hierarchical agglomerative clustering. 6.2.3 K-means clustering. 6.3 Associative rules. 6.3.1 Overview. 6.3.2 Grouping by value combinations. 6.3.3 Extracting rules from groups. 6.3.4 Example. 6.4 Decision trees. 6.4.1 Overview. 6.4.2 Tree generation. 6.4.3 Splitting criteria. 6.4.4 Example. 6.5 Summary. 6.6 Exercises. 6.7 Further reading.
7. Prediction. 7.1 Introduction. 7.1.1 Overview. 7.1.2 Classification. 7.1.3 Regression. 7.1.4 Building a prediction model. 7.1.5 Applying a prediction model. 7.2 Simple regression models. 7.2.1 Overview. 7.2.2 Simple linear regression. 7.2.3 Simple nonlinear regression. 7.3 K-nearest neighbors. 7.3.1 Overview. 7.3.2 Learning. 7.3.3 Prediction. 7.4 Classification and regression trees. 7.4.1 Overview. 7.4.2 Predicting using decision trees. 7.4.3 Example. 7.5 Neural networks. 7.5.1 Overview. 7.5.2 Neural network layers. 7.5.3 Node calculations. 7.5.4 Neural network predictions. 7.5.5 Learning process. 7.5.6 Backpropagation. 7.5.7 Using neural networks. 7.5.8 Example. 7.6 Other methods. 7.7 Summary. 7.8 Exercises. 7.9 Further reading.
8. Deployment. 8.1 Overview. 8.2 Deliverables. 8.3 Activities. 8.4 Deployment scenarios. 8.5 Summary. 8.6 Further reading.
9. Conclusions. 9.1 Summary of process. 9.2 Example. 9.2.1 Problem overview. 9.2.2 Problem definition. 9.2.3 Data preparation. 9.2.4 Implementation of the analysis. 9.2.5 Deployment of the results. 9.3 Advanced data mining. 9.3.1 Overview. 9.3.2 Text data mining. 9.3.3 Time series data mining. 9.3.4 Sequence data mining. 9.4 Further reading.
Appendix A Statistical tables. A.1 Normal distribution. A.2 Student's t-distribution. A.3 Chi-square distribution. A.4 F-distribution.
Appendix B Answers to exercises.
Glossary. Bibliography. Index.

Publication date (per publisher)	15 December 2006
Additional info	Illustrations
Place of publication	Hoboken
Language	English
Dimensions	161 x 234 mm
Weight	430 g
Binding	Paperback
Subject area	Mathematics / Computer Science ► Mathematics ► Applied Mathematics
Subject area	Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics
ISBN-10	0-470-07471-X / 047007471X
ISBN-13	978-0-470-07471-8 / 9780470074718
Condition	New