Oracle 10g Data Warehousing - Susan Hillson, Lilian Hobbs, Shilpa Lawande, Pete Smith

Oracle 10g Data Warehousing (eBook)

Susan Hillson, Lilian Hobbs, Shilpa Lawande, Pete Smith (Autoren)

eBook Download: PDF

2011 | 1. Auflage
872 Seiten
Elsevier Science (Verlag)
978-0-08-051328-7 (ISBN)

Oracle 10g Data Warehousing is a guide to using the Data Warehouse features in the latest version of Oracle -Oracle Database 10g. Written by people on the Oracle development team that designed and implemented the code and by people with industry experience implementing warehouses using Oracle technology, this thoroughly updated and extended edition provides an insider's view of how the Oracle Database 10g software is best used for your application.

It provides a detailed look at the new features of Oracle Database 10g and other Oracle products and how these are used in the data warehouse. This book will show you how to deploy the Oracle database and correctly use the new Oracle Database 10g features for your data warehouse. It contains walkthroughs and examples on how to use tools such as Oracle Discoverer and Reports to query the warehouse and generate reports that can be deployed over the web and gain better insight into your business.

This how-to guide provides step by step instructions including screen captures to make it easier to design, build and optimize performance of the data warehouse or data mart. It is a 'must have' reference for database developers, administrators and IT professionals who want to get to work now with all of the newest features of Oracle Database 10g.

It provides a detailed look at the new features of Oracle Database 10g and other Oracle products and how these are used in the data warehouse, including:
* How to use the Summary Management features, including Materialized Views and query rewrite, to best effect to radically improve query performance
* How to deploy business intelligence to the Web to satisfy today's changing and demanding business requirements
* Using Oracle OLAP and Data Mining options
* How to understand the warehouse hardware environment and how it is used by new features in the database including how to implement a high availability warehouse
environment
* Using the new management infrastructure in Oracle Database 10g and how this helps you to manage your warehouse environment

Oracle 10g Data Warehousing is a guide to using the Data Warehouse features in the latest version of Oracle -Oracle Database 10g. Written by people on the Oracle development team that designed and implemented the code and by people with industry experience implementing warehouses using Oracle technology, this thoroughly updated and extended edition provides an insider's view of how the Oracle Database 10g software is best used for your application.It provides a detailed look at the new features of Oracle Database 10g and other Oracle products and how these are used in the data warehouse. This book will show you how to deploy the Oracle database and correctly use the new Oracle Database 10g features for your data warehouse. It contains walkthroughs and examples on how to use tools such as Oracle Discoverer and Reports to query the warehouse and generate reports that can be deployed over the web and gain better insight into your business.This how-to guide provides step by step instructions including screen captures to make it easier to design, build and optimize performance of the data warehouse or data mart. It is a 'must have' reference for database developers, administrators and IT professionals who want to get to work now with all of the newest features of Oracle Database 10g.It provides a detailed look at the new features of Oracle Database 10g and other Oracle products and how these are used in the data warehouse- How to use the Summary Management features, including Materialized Views and query rewrite, to best effect to radically improve query performance- How to deploy business intelligence to the Web to satisfy today's changing and demanding business requirements- Using Oracle OLAP and Data Mining options- How to understand the warehouse hardware environment and how it is used by new features in the database including how to implement a high availability warehouse environment- Using the new management infrastructure in Oracle Database 10g and how this helps you to manage your warehouse environment

Front Cover 1
Oracle Database 10 g Data Warehousing 4
Copyright Page 5
Contents 8
Foreword 20
Preface 22
Acknowledgments 26
Chapter 1. Data Warehousing 30
1.1 An Introduction to Oracle Database 10g 30
1.2 What Is a Data Warehouse? 33
1.3 A Historical Perspective 34
1.4 Data Warehousing Features in the Oracle Database 10g 45
1.5 Building a Data Warehouse Poses Many Challenges 46
1.6 The Future of Data Warehousing 50
1.7 Summary 51
Chapter 2. Designing a Warehouse 52
2.1 Designing a Warehouse 52
2.2 Other Design Considerations 60
2.3 Implementing the Design 63
2.4 Testing the Design 99
2.5 The Schema for Easy Shopping Inc. 100
Chapter 3. Architecture of a Data Warehouse 102
3.1 Introduction 102
3.2 Hardware Configurations for a Warehouse 103
3.3 Hardware Components 114
3.4 Automatic Storage Management 123
3.5 File Management in Oracle 137
3.6 Summary 140
Chapter 4. Physical Design of the Data Warehouse 142
4.1 Introduction 142
4.2 Data Partitioning 142
4.3 Indexing 160
4.4 Index-Organized Tables 177
4.5 Data Compression 180
4.6 Summary 183
Chapter 5. Loading Data into the Warehouse 184
5.1 The ETL Process 184
5.2 Extracting Data from the Operational Systems 186
5.3 Transforming the Data into a Common Representation 205
5.4 Loading the Warehouse 210
5.5 Transformations inside the Oracle Database 261
5.6 Postload Operations 280
5.7 Using Tools for the ETL process 282
5.8 Summary 282
Chapter 6. Querying the Data Warehouse 284
6.1 Introduction 284
6.2 The Query Optimizer 284
6.3 Parallel Execution 296
6.4 SQL Features for Querying the Data Warehouse 300
6.5 Summary 349
Chapter 7. Summary Management 350
7.1 Summary Tables 350
7.2 Creating a Materialized View 354
7.3 Refresh 365
7.4 EXPLAIN_MVIEW Utility 385
7.5 TUNE_MVIEW Utility 391
7.6 Summary 394
Chapter 8. Dimensions 396
8.1 Concepts 396
8.2 Creating a Dimension 398
8.3 Describing a Dimension 408
8.4 Validating a Dimension 408
8.5 Summary 411
Chapter 9. Query Rewrite 412
9.1 Setting up Query Rewrite 412
9.2 Types of Query Rewrite 414
9.3 Query Rewrite Integrity Modes 427
9.4 Query Rewrite and Partition Change Tracking 432
9.5 Troubleshooting Query Rewrite with EXPLAIN_REWRITE 436
9.6 Advanced Query Rewrite Techniques 438
9.7 Summary 456
Chapter 10. Tuning Query Performance 458
10.1 Monitoring Performance 458
10.2 Advisor Central 464
10.3 SQL Access Advisor 465
10.4 SQL Tuning Advisor 481
10.5 Memory Advisor 488
10.6 Troubleshooting Parallel Execution 498
10.7 Plan Stability 503
10.8 Summary 505
Chapter 11. Managing the Warehouse 506
11.1 What Has to Be Managed 506
11.2 Managing Using Oracle Enterprise Manager 506
11.3 Monitoring the Warehouse 534
11.4 Reorganizing the Warehouse 544
11.5 Refreshing the Warehouse 570
11.6 Gathering Optimizer Statistics 571
11.7 Parallel Management Tasks 575
11.8 Maintaining Security 575
11.9 Monitoring Space Usage 580
11.10 Other Management Issues 585
11.11 Summary 587
Chapter 12. Backup and Recovery 588
12.1 Strategy 588
12.2 Backup 596
12.3 The Recovery Catalog 610
12.4 Restore and Recover 613
12.5 Summary 619
Chapter 13. Oracle Warehousing Tools 620
13.1 Which Tool 620
13.2 Oracle Warehouse Builder 620
13.3 Oracle Discoverer 643
13.4 Oracle Reports 10g 673
13.5 Summary 681
Chapter 14. Data Warehousing and the Web 682
14.1 Overview 682
14.2 Oracle Application Server 10g 685
14.3 Publishing Data on the Web 693
14.4 Oracle Personalization 697
14.5 The Data Warehouse and E-Business Intelligence 698
Chapter 15. OLAP 700
15.1 Why Do We Need the Oracle OLAP Option? 700
15.2 Oracle OLAP Architecture 704
15.3 Analytic Workspaces 707
15.4 The OLAP Catalog 712
15.5 The Analytic Workspace Manager 722
15.6 Querying Analytic Workspaces 736
15.7 Summary 753
Chapter 16. Oracle Data Mining 754
16.1 Oracle Database 10g Data Mining Option 755
16.2 Oracle Data Mining Techniques 756
16.3 Preparing Data for Oracle Data Mining 767
16.4 Using Oracle Data Mining Interfaces 770
16.5 Summary 784
Chapter 17. High Availability and a Data Warehouse 786
17.1 Introduction 786
17.2 What Is a Highly Available System? 787
17.3 Overview of Oracle Database 10g High Availability Features 789
17.4 Protecting against Hardware/Software Failures 790
17.5 Protecting against Data Loss 793
17.6 Managing Planned Downtime 817
17.7 Information Lifecycle Management 820
17.8 Summary 822
A The Schema for Easy Shopping Inc. 824
A.1 Creating the Tablespaces and Data Files 824
A.2 Creating the Tables, Constraints, and Indexes 826
A.3 Defining Security 830
A.4 Final Steps 831
B Product Information 832
B.1 Product Information 832
Index 834

Designing a Warehouse

2.1 Designing a Warehouse

Readers of this chapter probably fall into one of three categories. They have either:

1. Never designed a database before

2. Designed a database for a transaction processing–type system

3. Built a data warehouse system

In the latter case, you could skip this chapter or use it as a refresher, especially if your last database used Oracle. Therefore, this chapter is aimed at readers who fall into categories one or two, which may surprise the person who has previously designed a non-data warehouse database. Why? Because the skills and techniques used to create a database for a data warehouse will be different from those required for a transaction processing–type (OLTP) system. Consequently, though you will have a head start because some of the techniques are the same, it is very important to say to yourself: I am designing a different type of database.

So what is different about designing a database in a data warehouse? In a transaction-processing system, the designer’s goal is to make the transaction complete very, very quickly, and the designer also has the benefit of hopefully knowing how the business will interrogate and use the data. Typically, the data changed is just the specific individual records for the transaction, and reports only look at the current day, month, or week. Contrast that with a data warehouse, where, although queries must complete as quickly as possible, they could still take hours. In the data warehouse, a much larger volume of data, both current and historical, is typically scanned in order to fulfill the normal business intelligence types of queries.

Another major problem is determining what information should be held in the warehouse and at what level of granularity it should be retained. This book will not discuss the techniques that can be used to determine what should be included in the warehouse or how to go about collecting that data, because there are already many books available that discuss this topic extensively.

However, the importance of trying to determine what should be included in the data warehouse cannot be stressed enough. It is so important because it may not be until a year after the warehouse is in production use that you suddenly discover that the information is either not available or held at an inappropriate level, and this will limit or prohibit the types of queries that you can run on your warehouse

For example, a telephone company decides not to hold every call in its database, but instead holds a total of what the customer spent by day. Then someone in the company decides that he or she would like to offer customers a discount when certain numbers are called. Now, if the warehouse had contained every single telephone call made by its customers, the company would be able to find out exactly what this scheme would have cost if it had been implemented over the last 12 months. Instead, it has no data available and would either have to guess what the cost might be or postpone the planned new system until sufficient data is available to accurately determine the true cost to the company.

One of the difficult decisions for the designer is to determine at what level data will be stored in the warehouse. Often, storing every transaction, such as in our telephone example, may seem rather excessive, and, because it could easily mean the warehouse grows to many terabytes, there is a temptation to consolidate the data. Managing a terabyte warehouse requires careful and stringently controlled procedures that must be followed. The bigger the database becomes, the harder it is to manage and query it. However, with the easier availability of cheap storage devices, keeping vast quantities of data at the detailed level is becoming much more feasible and worthy of serious consideration.

Since aggregation is a major design decision, the designer would be wise to seek approval from the users of the warehouse before adopting such a strategy. It should also be clearly explained to these users the limitations that are likely to occur due to aggregating the data. With disks declining in price, hopefully most sites will store all of the data that they require.

2.1.1 Don′t Use Entity Relationship (E-R) Modeling

The typical approach used to construct a transaction-processing system is to construct an entity-relationship (E-R) diagram of the business. It is then ultimately used as the basis for creating the physical database design, because many of the entities in our model become tables in the database. If you have never designed a data warehouse before but are experienced in designing transaction-processing systems, then you will probably think that a data warehouse is no different from any other database and that you can use the same approach.

Unfortunately, that is not the case, and warehouse designers will quickly discover that the entity-relationship model is not really suitable for designing a data warehouse. Leading authorities on the subject, such as Ralph Kimball, advocate using the dimensional model, and we have found this approach to be ideal for a data warehouse.

An entity-relationship diagram can show us, in considerable detail, the interaction between the numerous entities in our system, removing redundancy in the system whenever possible. The result is a very flat view of the enterprise, where hundreds of entities are described along with their relationships to other entities. While this approach is fine in the transaction-processing world, where we require this level of detail, it is far too complex for the data warehouse. If you ask a database administrator (DBA) if he or she has an entity-relationship diagram, the DBA will probably respond that he or she did once, when the system was first designed. But due to its size and the numerous changes that have occurred in the system during its lifetime, the entity-relationship diagram hasn′t been updated, and it is now only partially accurate.

If we use a different approach for the data warehouse, one that results in a much simpler picture, then it should be very easy to keep it up-to-date and also to give it to end users, to help them understand the data warehouse. Another factor to consider is that entity-relationship diagrams tend to result in a normalized database design, whereas in a data warehouse, a denormalized design is often used.

2.1.2 Dimensional Modeling

An alternative to using the entity-relationship model is the dimensional model, which views and models the data from a different perspective. Instead of considering an entity, which represents a thing such as a product or a place and the relationships between those entities, a dimensional model describes data using dimensions and facts, which become actual tables in the database and which we will describe in more detail in the next two sections.

Dimensional models, as illustrated in Figure 2.1, despite sometimes looking quite simple, provide a very effective way of holding historical and current data in a form that makes it accessible to business users and that enables them to make the right business decisions. A dimensional data warehouse can be viewed as containing data that:

Figure 2.1 Dimensional Modeling

Has been validated (i.e., no invalid product codes)

Is historical (i.e., the last 36 months)

Is integrated—therefore the same key is used by all systems

Is easily accessible

2.1.3 Fact Table

The fact table, of which there could be more than one, contains factual information, and it is usually the largest table in the data warehouse and is often fast growing. The fact tables are typically where all of the detail data that you want to keep in the data warehouse is stored, such as all of the telephone calls made by a customer or the orders placed by your customer, as shown in Figure 2.1.

Therefore, if a customer made 20 telephone calls, then it is likely that 20 rows will be stored in the fact table for this customer. Consequently, the fact tables will be by far the largest tables in the database, possibly containing hundreds of millions of rows in a large data warehouse. If you are unsure as to whether data is factual, it is often numeric, and sometimes a value that can be computed, such as the value of an order or the number of items purchased.

The information contained in the fact table doesn′t have to be at the finest level of detail; it could be summarized data, such as total telephone calls made by a customer today. The level at which data is held in the fact table is known as the granularity and is one of the important decisions the warehouse designer must make. In the example described here, the difference in the number of records stored over a 24-month period would be huge. Contrast the storage requirements between a record stored for every telephone call a customer made in a single day compared with a record for every telephone call a customer makes.

When designing a data warehouse, depending on your business, you may find that there are different types of fact tables, such as, transaction level,...

Erscheint lt. Verlag	18.4.2011
Sprache	englisch
Themenwelt	Sachbuch/Ratgeber
	Informatik ► Datenbanken ► Data Warehouse / Data Mining
	Informatik ► Datenbanken ► Oracle
	Mathematik / Informatik ► Informatik ► Software Entwicklung
ISBN-10	0-08-051328-X / 008051328X
ISBN-13	978-0-08-051328-7 / 9780080513287

Informationen gemäß Produktsicherheitsverordnung (GPSR)
Haben Sie eine Frage zum Produkt?

PDF (Adobe DRM)

Kopierschutz: Adobe-DRM
Adobe-DRM ist ein Kopierschutz, der das eBook vor Mißbrauch schützen soll. Dabei wird das eBook bereits beim Download auf Ihre persönliche Adobe-ID autorisiert. Lesen können Sie das eBook dann nur auf den Geräten, welche ebenfalls auf Ihre Adobe-ID registriert sind.
Details zum Adobe-DRM

Dateiformat: PDF (Portable Document Format)
Mit einem festen Seitenlayout eignet sich die PDF besonders für Fachbücher mit Spalten, Tabellen und Abbildungen. Eine PDF kann auf fast allen Geräten angezeigt werden, ist aber für kleine Displays (Smartphone, eReader) nur eingeschränkt geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen eine Adobe-ID und die Software Adobe Digital Editions (kostenlos). Von der Benutzung der OverDrive Media Console raten wir Ihnen ab. Erfahrungsgemäß treten hier gehäuft Probleme mit dem Adobe DRM auf.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen eine Adobe-ID sowie eine kostenlose App.
Geräteliste und zusätzliche Hinweise

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.

Print-Ausgabe

Buch | Softcover

82,25 €