Semistructured Database Design (eBook)
XVI, 178 pages
Springer US (publisher)
978-0-387-23568-4 (ISBN)
Semistructured Database Design provides an essential reference for anyone interested in the effective management of semistructured data. Since many new and advanced web applications consume a huge amount of such data, there is a growing need to properly design efficient databases.
This volume responds to that need by describing a semantically rich data model for semistructured data, called the Object-Relationship-Attribute model for Semistructured data (ORA-SS). Focusing on this new model, the book discusses problems and presents solutions for a number of topics, including schema extraction, the design of non-redundant storage organizations for semistructured data, and physical semistructured database design, among others.
Semistructured Database Design presents researchers and professionals with the most complete and up-to-date research in this fast-growing field.
Contents
List of Figures
List of Tables
Preface
1 INTRODUCTION
1.1 Chapter Overview
2 DATA MODELS FOR SEMISTRUCTURED DATA
2.1 Document Type Definition
2.2 DOM, OEM and DataGuide
2.3 S3-graph
2.4 CM Hypergraph and Scheme Tree
2.5 EER and XGrammar
2.6 AL-DTD and XML Tree
2.7 ORA-SS
2.8 Discussion
3 ORA-SS
3.1 ORA-SS Schema Diagram
3.2 ORA-SS Data Instance Diagram
3.3 ORA-SS Functional Dependency Diagram
3.4 ORA-SS Inheritance Hierarchy Diagram
3.5 Discussion
4 SCHEMA EXTRACTION
4.1 Basic Extraction Rules
4.2 Schema Extraction Algorithm
4.3 Example
4.4 Discussion
4.5 Summary
5 NORMALIZATION
5.1 Motivating Example
5.2 Background
5.3 A Normal Form For Semistructured Schemas
5.4 Converting Schemas into the Normal Form
5.5 Discussion
6 VIEWS
6.1 Motivating Example
6.2 The Select Operator
6.3 The Drop Operator
6.4 The Join Operator
6.5 The Swap Operator
6.6 Design Rules for IDentifier Dependency Relationship
6.7 Example of Designing View
6.8 Related Work
6.9 Summary
7 PHYSICAL DATABASE DESIGN
7.1 Relational Database Physical Design
7.2 IMS Database Physical Design
7.3 Redundancy in ORA-SS Schema Diagram
7.4 Replicated NF in ORA-SS
7.5 Controlled Pairing in ORA-SS Schema Diagrams
7.6 Measure of Data Replication
7.7 Guidelines for Physical Semistructured Database Design
7.8 Storage of Documents in an Object Relational Database
7.9 Summary
8 CONCLUSION
Appendix
References
Index
About the Authors
Chapter 1 INTRODUCTION (p. 1-2)
Today, many computer systems produce and consume large amounts of data. Consider a library catalogue system that stores the details of the holdings in a library and allows users to query information and perhaps even request books, or an accounting system that reads data from files, transforms it and prints reports. In the past, much of this data was stored in relational database systems, and the designers of the computer systems paid special attention to the organization or structure of the data. We have since moved to the age of the World Wide Web (or web), where many new technologies and applications have emerged.
Many of the applications built today are web based, and the corresponding technologies that are used have been specifically designed for the web. Let us consider how data was stored before the advent of the web. Data was stored in files or in databases. For the former, the entire file is read from and written to disk when data is needed. This works well for applications that do not use large amounts of data, that is, applications that can read the entire file into memory, manipulate the data and write the file back out to disk. However, this approach is inadequate for systems that require more data than can fit in main memory. For these kinds of applications, a database is required.
The use of databases leads to new problems, including how to maintain the consistency of the data with respect to real world constraints. For example, suppose we have a database that stores details of students. Is it possible to ensure that a student’s address appears in the database only once? If the address appears multiple times, then how can we guarantee the consistency of the repeated data? It is necessary to model the constraints in the database if we want the database system to enforce these constraints. Some constraints can be enforced by the organization or structure of the data, while others must be programmed as general constraints.
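As a rough illustration of a constraint enforced by the structure of the data, the sketch below stores each student's address in exactly one row by making the student identifier a primary key. It is a minimal sketch, not taken from the book: the use of SQLite, the table name `student`, its columns, and the helper names `build_db` and `add_student` are all assumptions made for this example.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical student table."""
    conn = sqlite3.connect(":memory:")
    # The primary key is a structural constraint: one row, and therefore
    # one address, per student_id. Table and column names are illustrative.
    conn.execute(
        """
        CREATE TABLE student (
            student_id TEXT PRIMARY KEY,
            name       TEXT NOT NULL,
            address    TEXT NOT NULL
        )
        """
    )
    return conn


def add_student(conn, student_id, name, address):
    """Insert a student; return False if the database rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO student VALUES (?, ?, ?)",
                (student_id, name, address),
            )
        return True
    except sqlite3.IntegrityError:
        # A second row for the same student_id would repeat the address.
        return False


conn = build_db()
first = add_student(conn, "s1", "Student One", "1 Example Street")
second = add_student(conn, "s1", "Student One", "2 Other Street")
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Here the database itself refuses the second insert, so the address cannot become inconsistent through repetition. A rule that cannot be expressed through keys or table structure would instead have to be written as a general constraint, for example a check or application-level code.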
Yet another problem that arises from the use of database systems is how the constraints from the real world should be captured during the design process. Typically they are recorded in a conceptual model such as an Entity-Relationship diagram. Such constraints contain semantic information, that is, they provide some meaning to the underlying data. It is important that these constraints are enforced by the database. When data is manipulated, the database system checks that none of the constraints are violated. In other words, the semantics from the real world still hold in the result of the manipulation.
Traditional relational databases, which assume that data is structured, are no longer suitable for the new Web applications because the data on which the Web applications are based lacks structure and may be incomplete. Thus, many of the techniques that were previously used may not be applicable. This less structured data, also known as semistructured data, is usually represented as a tree of elements, where the children are sub-elements of their parent element. Elements can in turn have attributes. Queries over the trees are represented as path expressions.
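The tree-and-path view described above can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustrative sketch: the sample document, its element and attribute names, and the helper names `titles` and `books_missing` are assumptions made here, not material from the book. The second book deliberately lacks a `year` sub-element to show incomplete data.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue document; all names and values are illustrative.
DOC = """
<library>
  <book id="b1">
    <title>First Title</title>
    <year>2005</year>
  </book>
  <book id="b2">
    <title>Second Title</title>
  </book>
</library>
""".strip()

root = ET.fromstring(DOC)


def titles(tree_root):
    """Evaluate the path expression ./book/title from the root element."""
    return [t.text for t in tree_root.findall("./book/title")]


def books_missing(tree_root, child_tag):
    """Return the id attribute of each book that lacks the given sub-element."""
    return [
        book.get("id")
        for book in tree_root.findall("./book")
        if book.find(child_tag) is None
    ]


all_titles = titles(root)
no_year = books_missing(root, "year")
```

The path `./book/title` walks from the root through each `book` child to its `title` sub-element, and `book.get("id")` reads an attribute. Nothing in the document format itself requires every `book` to carry a `year`, which is the kind of irregularity a fixed relational schema does not accommodate directly.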
The eXtensible Markup Language (XML) [Bray et al., 2000] is a language that is used to express semistructured data. XML is self-describing since each element has a tag which gives a name for the content. Recently, however, various schema languages have been defined to specify the structure of the underlying XML data and the constraints that are expected to hold in instances of the XML data. The schemas are descriptive rather than prescriptive. Like traditional data, XML data may be stored in files or in a database. The database can have an underlying relational engine or it can be specifically designed for XML data. The former are called XML-enabled databases and the latter are called native XML databases. Like the Entity-Relationship diagram for relational databases, a diagrammatic representation that reflects real world constraints could be used for requirements gathering, and for the design of schemas for semistructured documents.
Publication date (per publisher) | 30.3.2006 |
---|---|
Series | Web Information Systems Engineering and Internet Technologies Book Series |
Additional info | XVI, 178 p. |
Place of publication | New York |
Language | English |
Subject areas | Computer Science ► Databases ► Data Warehouse / Data Mining; Computer Science ► Theory / Studies ► Algorithms; Natural Sciences |
Keywords | Database • Database Design • Notation |
ISBN-10 | 0-387-23568-X / 038723568X |
ISBN-13 | 978-0-387-23568-4 / 9780387235684 |
Size: 8.8 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized to you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device but is of only limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Read online
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.