Data Warehousing - Reema Thareja

Data Warehousing

Reema Thareja (Author)

Book | Softcover
456 pages
2009
OUP India (Publisher)
978-0-19-569961-6 (ISBN)
33.65 incl. VAT
Data Warehousing is designed as a textbook for an introductory course on Data Warehousing. It is aimed at students of Computer Science & Engineering (BE/BTech), Computer Applications (BCA/MCA) and Computer Science (B.Sc.). It provides a thorough grounding in the fundamentals of Data Warehousing and aims to give readers sound knowledge of how to create and manage a Data Warehouse.

The book first introduces the features and architecture of a Data Warehouse, followed by a detailed study of Business Requirements and Dimensional Modelling. It then discusses the components of a Data Warehouse, leading into the core of the subject: building and maintaining a Data Warehouse. Next comes an overview of planning and project management, testing and growth. The book finishes with Data Warehouse solutions and the latest trends in the field, and closes with a broad overview of the related field of Data Mining.

The text is ably supported by plenty of examples that illustrate the concepts. It also contains review questions and other end-of-chapter exercises to test students' understanding. A running case study brings out the practical aspects of the subject, helping students master the basics and apply them to real-life scenarios.

Reema Thareja was until recently an IT Lecturer at the Institute of Information and Technology, an affiliate of GGS Indraprastha University, New Delhi. She completed her MCA at the same university and specializes in Programming Languages, Operating Systems, DBMS, Multimedia and Web Technologies.

1. The Compelling Need for Data Warehousing
  Learning Objective
  Case Study
  1.1 A Short Historical Note
  1.2 Need for Data Warehousing
  1.2.1 Increasing Demand for Strategic Information
  1.2.2 The Information Crisis
  1.2.3 Inability of Past Decision Support Systems
  1.2.4 Presence of Better Technology
  1.2.5 Expectations from the New Kind of Decision Support System
  1.2.6 Operational Vs Decisional Support System
  1.3 Data Warehouse Defined
  1.3.1 What Can a Data Warehouse Do?
  1.3.2 What Can a Data Warehouse Not Do?
  1.3.3 What Is a Data Warehouse: an Environment or a Product?
  1.3.4 A Blend of Many Technologies
  1.4 Data Warehouse Users
  1.4.1 Why Do They Want Information?
  1.5 Benefits of Data Warehousing
  1.5.1 Tangible Benefits
  1.6 Concerns in Data Warehousing
  1.6.1 Nothing Is for Free
  Summary
  Review Questions

2. Data Warehouse: Defining Features
  Learning Objectives
  Case Study
  2.1 Introduction
  2.2 Features of a Data Warehouse
  2.2.1 Subject Oriented Data
  2.2.2 Integrated Data
  2.2.2.1 Data Cleansing
  2.2.2.2 Data Transformation
  2.2.2.3 Non-Volatile Data
  2.2.2.4 Time Variant Data
  2.3 Data Granularity
  2.3.1 Benefits of Data Granularity
  2.3.2 Data Granularity: Pros and Cons
  2.3.3 Dual Levels of Data Granularity
  2.4 The Information Flow Mechanism
  2.5 Metadata
  2.5.1 Role of Metadata
  2.5.2 Classification of Metadata
  2.5.3 Metadata Is the Nerve Centre of the Data Warehouse
  2.5.4 Metadata Management
  2.6 Two Classes of Data
  2.7 Life Cycle of Data
  2.7.1 What Is Data Velocity?
  2.7.2 Moving Data from One Medium to Another
  2.7.3 Inverted Data Warehouse
  2.8 Can Data Move from the Data Warehouse to the Operational Systems?
  2.8.1 Direct Access Mode
  2.8.2 Indirect Access Mode
  Summary
  Review Questions

3. Physical Architecture of a Data Warehouse and Data Mart Issues
  Learning Objectives
  Case Study
  3.1 Introduction
  3.2 Distinguishing Characteristics of Data Warehouse Architecture
  3.3 Data Warehouse Architectural Goals
  3.4 Data Warehouse Architecture
  3.4.1 Pros and Cons of Data Warehouse Architecture
  3.4.2 The Two Tier Architecture
  3.4.3 The Three Tier Architecture
  3.4.4 The Four Tier Architecture
  3.4.5 Three Tier Versus Two Tier Architecture
  3.4.6 Architecture Considerations and Challenges
  3.4.7 Interfacing
  3.5 Data Warehouse and Data Marts
  3.6 Issues in Building Data Marts
  3.6.1 A Change of Approaches
  3.6.2 How Are Data Warehouses Different from Data Marts?
  3.6.3 Reasons for Creating Data Marts
  3.6.4 Advantages of Building a Data Mart
  3.6.5 Limitations of Building a Data Mart
  3.7 Building Data Marts
  3.8 Other Data Mart Issues
  3.8.1 Types of Data Marts Based on Underlying DBMS
  3.8.2 Loading of Data Marts
  3.8.2.1 The Types of Data Marts to Load
  3.8.2.2 Loading Temporal Data Marts
  3.8.2.3 Loading of Non-Temporal Data Marts
  3.8.3 Metadata for a Data Mart
  3.8.4 Maintenance of a Data Mart
  3.8.5 Nature of Data in a Data Mart
  3.8.6 Software Components of a Data Mart
  3.8.7 Performance Issues
  3.8.8 Monitoring Requirements for a Data Mart
  3.8.9 Security in a Data Mart
  3.8.10 Structure of a Data Mart
  3.9 Reasons for Increased Popularity of Data Marts
  3.10 Can We Have the Data Warehouse and Data Marts on the Same Processor?
  3.11 Pushing and Pulling Data
  Summary
  Review Questions

4. Gathering the Business Requirements
  Learning Objective
  Case Study
  4.1 Introduction
  4.2 Determining the End User Requirements
  4.2.1 Business Objectives
  4.2.2 Business Queries
  4.2.3 Determining the Functional Requirements
  4.2.4 Information Infrastructure Environment
  4.2.5 The Data Quality Levels
  4.3 Requirements Gathering Methods
  4.3.1 Interviews
  4.3.2 JAD Methodology
  4.3.3 Review of Existing Documentation
  4.3.4 Brainstorming
  4.3.5 Questionnaires
  4.3.6 Where to Stop?
  4.4 Requirements Analysis
  4.4.1 Requirements Definition Document
  4.5 Gathering Requirements for a Data Warehouse Project
  4.6 Dimensional Analysis
  4.6.1 Business Dimensions
  4.6.2 Dimension Hierarchies/Categories
  4.6.3 Facts or Metrics
  4.6.4 Example
  4.7 Information Package Diagram
  4.7.1 What Information Does an IPD Contain?
  4.7.2 Example
  4.7.3 Reason for Forming IPD
  Summary
  Review Questions

5. Planning and Project Management in a Data Warehouse
  Learning Objective
  Case Study
  5.1 The Project Management Principles
  5.1.1 Key Considerations
  5.1.2 The Ideal Approach
  5.2 Data Warehouse Readiness Assessment
  5.2.1 Bad Performance Indicators
  5.2.2 Indications for a Successful Data Warehouse Project
  5.3 The Data Warehouse Project Team
  5.3.1 Key Roles
  5.3.2 User Involvement
  5.4 Planning for the Data Warehouse
  5.4.1 Gathering the Business Requirements
  5.4.2 Gaining Support for the Project
  5.5 The Data Warehouse Project Plan
  5.6 Economic Feasibility Analysis
  5.6.1 Costs and Benefits of the System
  5.6.2 Economic Feasibility Measures
  5.6.3 Justifying the New System
  5.7 Planning for a Data Warehouse Server
  5.7.1 SMP
  5.7.2 Clusters
  5.7.3 MPP
  5.7.4 ccNUMA
  5.8 Capacity Planning
  5.8.1 Estimating the Load
  5.8.2 Estimating the CPU Bandwidth
  5.8.3 Estimating the Memory
  5.8.4 Estimating the Disk
  5.9 Selecting the Operating System for the Data Warehouse
  5.10 Selecting the Database Software
  5.10.1 Difference between General DBMS and Data Warehouse DBMS
  5.10.2 How to Choose?
  5.11 Selection of Tools
  5.11.1 Information Delivery Tools
  5.11.1.1 The Tool Selection Technique
  5.11.1.2 Criteria for Selecting the Information Delivery Tool
  5.11.2 Query Tools
  5.11.3 Browser Tools
  5.11.4 Metadata Tools
  5.11.5 Data Quality Tools
  Summary
  Review Questions

6. Data Warehouse Schema
  6.1 Introduction
  6.2 Building the Fact Tables and Dimension Tables
  6.2.1 The Traditional Approach
  6.3 Dimensional Modeling
  6.3.1 Data Warehouse Modeling Vs Operational Database Modeling
  6.3.2 Dimensional Model Vs ER Model
  6.3.3 The Need for Dimension Model
  6.3.4 Features of a Good Dimensional Model
  6.4 The Star Schema
  6.4.1 How Does a Query Execute?
  6.4.2 Example
  6.4.3 Pros and Cons of the Star Schema
  6.5 The Snowflake Schema
  6.5.1 The Technique
  6.5.2 Example
  6.5.3 Is Snowflaking Really Helpful?
  6.5.4 Pros and Cons of the Snowflake Schema
  6.6 Aggregate Tables
  6.6.1 Need for Building Aggregate Fact Tables
  6.6.2 Limitations of Aggregate Tables
  6.7 Fact Constellation Schema or Families of Stars
  6.7.1 Prerequisite for a Fact Constellation Schema
  6.7.2 Pros and Cons of Fact Constellation Schema
  6.8 Strengths of Dimensional Modeling
  6.9 Data Warehouse and the Data Model
  Summary
  Review Questions

7. Fact Tables and Dimension Tables: Miscellaneous Issues
  Learning Objective
  Case Study
  7.1 Characteristics of a Dimension Table
  7.2 Characteristics of a Fact Table
  7.3 The Factless Fact Table
  7.4 Updates to Dimension Tables
  7.4.1 Slowly Changing Dimensions
  7.4.1.1 Type 1 Changes
  7.4.1.2 Type 2 Changes
  7.4.1.3 Type 3 Changes
  7.4.1.4 Example
  7.5 Cyclicity of Data: Wrinkle of Time
  7.6 Other Types of Dimension Tables
  7.6.1 Large Dimension Tables
  7.6.2 Rapidly Changing or Large Slowly Changing Dimensions
  7.6.3 Junk Dimensions
  7.7 Keys in the Data Warehouse Schema
  7.7.1 Primary Keys
  7.7.2 Surrogate Keys
  7.7.3 Foreign Keys
  7.8 Enhancing the Data Warehouse Performance
  7.8.1 Table Compression
  7.8.2 Parallel Execution
  7.8.3 Table Partitioning
  7.8.3.1 The Partitioning Technique
  7.8.3.2 Advantages of Partitioning
  7.8.4 Data Clustering
  7.8.5 Data Summarization
  7.8.6 Bypassing the Referential Integrity Checks
  7.8.7 Indexing the Data Warehouse
  7.9 Data Warehousing and the Technology
  Summary
  Review Questions

8. The ETL Process
  Learning Objective
  Case Study
  8.1 Introduction
  8.1.1 Challenges in ETL Functions
  8.2 Data Extraction
  8.2.1 Identification of Data Sources
  8.2.2 Extracting Data for Data Warehouse Refreshing
  8.2.2.1 Immediate Data Extraction Technique
  8.2.2.2 Deferred Data Extraction Technique
  8.2.2.3 Evaluation of Extraction Techniques
  8.2.3 Managing Reference Tables in a Data Warehouse
  8.3 Data Transformation
  8.3.1 Tasks Involved in Data Transformation
  8.3.2 Role of Data Transformation Process
  8.4 Data Loading
  8.4.1 Techniques of Data Loading
  8.4.2 When Should We Go for Data Update Rather than Data Refresh?
  8.4.3 Loading the Fact Tables and Dimension Tables
  8.5 Data Quality
  8.5.1 The Need for Data Quality
  8.5.2 Categories of Errors Which Affect Data Quality
  8.5.2.1 Incomplete Errors
  8.5.2.2 Incorrect Errors
  8.5.2.3 Incomprehensibility Errors
  8.5.2.4 Inconsistency Errors
  8.5.3 Issues in Data Cleansing
  8.5.4 Conclusion about Data Quality
  Summary
  Review Questions

9. Testing, Growth and Maintenance of Data Warehouse
  Learning Objective
  Case Study
  9.1 Data Warehouse Design Review
  9.1.1 Contents of a Typical Design Review
  9.2 Developing the Data Warehouse Iteratively
  9.3 Testing
  9.3.1 Testing the Data Warehouse
  9.3.2 Developing the Test Plan
  9.3.3 Testing the Backup and Recovery Processes
  9.3.4 Testing the Data Warehouse Environment
  9.3.5 Testing the Database
  9.3.6 Logging of Test Results
  9.4 Monitoring the Data Warehouse
  9.4.1 Why Are Statistics Monitored?
  9.5 Tuning the Data Warehouse
  9.5.1 Tuning the Data Load
  9.5.2 Tuning Queries
  9.6 The Feedback Loop
  Summary
  Review Questions

10. OLAP in the Data Warehouse
  Learning Objective
  Case Study
  10.1 Need for Online Analytical Processing
  10.1.1 Multi Dimensional Analysis
  10.1.2 Fast Access and Powerful Calculations
  10.2 OLAP
  10.2.1 OLAP Defined
  10.2.2 OLAP Is a Data Warehouse Tool
  10.3 OLAP and Multidimensional Analysis
  10.3.1 The Multi-Dimensional Logical Data Model
  10.3.2 Multi Dimensional Model's Users
  10.3.3 The Multi Dimensional Structure
  10.3.4 Multi-Dimensional Operations
  10.3.5 The Business Need
  10.4 OLAP Functions
  10.4.1 Dimensional Analysis
  10.4.2 Hypercubes
  10.4.3 OLAP Operations in Multidimensional Data Model
  10.5 OLAP Applications
  10.5.1 Integrating OLAP with GIS
  10.6 OLAP Models
  10.6.1 MOLAP
  10.6.2 ROLAP
  10.6.3 HOLAP
  10.6.4 DOLAP
  10.6.5 OLAP Survey
  10.6.6 OLAP Trends
  10.7 OLAP Design Considerations
  10.8 OLAP Tools and Products
  10.8.1 Report Scheduling and Sharing
  10.8.2 Ad hoc Reporting
  10.8.3 OLAP Customization
  10.8.4 The Human Angle
  10.9 Existing OLAP Tools
  10.9.1 Spreadsheet OLAP Clients
  10.9.2 Other OLAP Clients
  10.9.3 Embedded OLAP
  10.10 Data Design
  10.10 Administration and Performance
  10.11 OLAP Platforms
  Summary
  Review Questions

11. Overview of Building and Maintaining a Data Warehouse
  Learning Objective
  Case Study
  11.1 Problem Definition
  11.2 Critical Success Factors
  11.3 Requirement Analysis
  11.4 Planning for the Data Warehouse
  11.4.1 Project Staff
  11.4.2 Project Plan
  11.4.3 Outsourcing Vs Custom Planning
  11.4.4 Detailed Project Plan
  11.5 Data Warehouse Design Stage
  11.5.1 Design the Dimensional Model
  11.5.2 Develop the Architecture
  11.5.3 Design for Update and Expansion
  11.5.4 Design the Relational Database and OLAP Cubes
  11.5.5 Decisions in Design
  11.5.6 Detail Design
  11.5.7 Other Design Considerations
  11.6 Building and Implementing Data Marts
  11.7 Building Data Warehouse
  11.7.1 Test and Deploy the System
  11.7.2 Transition to Production
  11.7.3 User Training and Support
  11.7.3.1 The Success Factors of a Training Program
  11.7.3.2 Issues in User Support
  11.8 Backup and Recovery
  11.9 Establish the Data Quality Framework
  11.9.1 Data Purification Process
  11.10 Security Issues in a Data Warehouse
  11.11 Operating the Data Warehouse
  11.11.1 Day-to-Day Operations of the Data Warehouse
  11.11.2 Administering the Data Warehouse
  11.11.3 Overnight Processing
  11.12 Recipe for a Successful Data Warehouse
  11.13 Data Warehouse Pitfalls
  Summary
  Review Questions

12. Data Mining Basics
  Learning Objective
  Case Study
  12.1 Introduction
  12.1.1 What Is Data Mining?
  12.1.2 Foundation of Data Mining
  12.1.3 An Analogy
  12.1.4 What Can Be Discovered?
  12.1.5 What Type of Data Can Be Mined?
  12.2 Architecture of Data Mining System
  12.3 The KDD Process
  12.4 Integrating Data Mining and the Data Warehouse
  12.4.1 KDD versus Data Mining
  12.4.2 DBMS versus Data Mining
  12.4.3 OLAP versus Data Mining
  12.5 Related Areas of Data Mining
  12.6 Data Mining Techniques
  12.6.1 Association Rule Mining
  12.6.2 Decision Trees
  12.6.3 Clustering Analysis
  12.6.4 Memory Based Reasoning
  12.6.5 Genetic Algorithm
  12.6.6 Neural Networks
  12.6.7 Outlier Analysis
  Summary
  Review Questions

13. Moving into Data Mining
  Learning Objective
  Case Study
  13.1 Introduction
  13.2 How Do We Categorize Data Mining Systems?
  13.3 Is All That Is Discovered Interesting and Useful?
  13.4 Applications of Data Mining
  13.4.1 Benefits of Data Mining
  13.4.2 Data Mining for Retail Industry
  13.4.3 Data Mining for Telecommunication Industry
  13.4.4 Data Mining for Banking and Finance
  13.4.5 Data Mining for Biomedical and DNA Data Analysis
  13.4.6 Data Mining for Customer Retention
  13.4.7 Data Mining for Targeted Marketing
  13.4.8 Data Mining for Customer Relationship Management
  13.5 Other Data Mining Application Areas
  13.6 Advantages and Disadvantages of Data Mining
  13.7 Web Mining
  13.7.1 Web Content Mining
  13.7.2 Web Structure Mining
  13.7.3 Web Usage Mining
  13.8 Text Mining
  13.9 Temporal Data Mining
  13.10 Sequence Mining
  13.11 Time Series Analysis
  13.12 Spatial Data Mining
  13.13 Issues and Challenges in Data Mining
  13.14 Current Trends Affecting Data Mining
  Summary
  Review Questions

14. Trends in Data Warehousing
  Learning Objective
  Case Study
  14.1 Introduction
  14.2 Data Warehouse Solutions
  14.2.1 Data Warehouse Implementation Alternatives
  14.2.2 Host-Based Data Warehouses
  14.2.2.1 Single Host-Based Data Warehouses
  14.2.2.2 Host-Based Single Stage (LAN)-Based Data Warehouses
  14.2.3 LAN-Based Workgroup Data Warehouses
  14.2.4 Multistage Data Warehouses
  14.2.5 Stationary Data Warehouses
  14.3 Web Enabled Data Warehouse
  14.3.1 Using the Web for Information Delivery
  14.3.2 Expectations from the Web as an Information Delivery Medium
  14.3.3 Super Growth Problem
  14.3.4 Data Webhouse Prominent Features
  14.3.5 The Need for Data Webhouse
  14.3.6 The Data Webhouse Architecture
  14.3.7 Similarities with Traditional Data Warehouses
  14.3.8 Building Clickstream Data Webhouse
  14.3.9 The Granularity Manager
  14.3.10 Challenges in the Clickstream Data Webhouse Lifecycle
  14.4 Distributed Data Warehouses
  14.4.1 Advantages of Distributed Data Warehousing
  14.4.2 Distributed versus Centralized Warehouse
  14.5 The Virtual Data Warehouse
  14.5.1 Why Go for a Virtual Data Warehouse?
  14.5.2 Problems with a Virtual Data Warehouse
  14.5.3 Advantages of Using a Virtual Data Warehouse
  14.6 Data Warehouse and the ODS
  14.7 Integration of Data Warehousing with Other Technologies
  14.7.1 Data Warehousing and ERP
  14.7.1.1 Integrating ERP and Data Warehouse
  14.7.1.2 Issues in Integrating ERP with Data Warehousing
  14.7.1.3 Common Misconceptions about DW and ERP
  14.7.1.4 Conclusion
  14.7.2 Data Warehousing and Knowledge Management
  14.7.3 Data Warehousing and EIS
  14.7.3.1 Executive Information System
  14.7.3.2 Data Warehouse as a Basis for EIS
  14.7.4 Data Warehousing and CRM
  14.7.4.1 Active Data Warehousing
  14.8 Trends in Data Warehousing
  14.8.1 Multiple Data Types
  14.8.2 Data Visualization
  14.8.3 Parallel Processing
  14.8.4 Agent Technology
  14.9 Data Warehouse Futures
  Summary
  Review Questions

Appendix
Glossary

Additional information: 160 line drawings
Place of publication: New Delhi
Language: English
Dimensions: 185 x 243 mm
Weight: 713 g
ISBN-10 0-19-569961-0 / 0195699610
ISBN-13 978-0-19-569961-6 / 9780195699616
Condition: New