Usage-Driven Database Design (eBook)
XXIV, 374 Seiten
Apress (Verlag)
978-1-4842-2722-0 (ISBN)
Key to the framework is a method for taking the logical data model that is a static look at the definition of the data, and merging that static look with the process models describing how the data will be used in actual practice once a given system is implemented. The approach solves the disconnect between the static definition of data in the logical data model and the dynamic flow of the data in the logical process models.
The design framework in this book can be used to create operational databases for transaction processing systems, or for data warehouses in support of decision support systems. The information manager can be a flat file, Oracle Database, IMS, NoSQL, Cassandra, Hadoop, or any other DBMS.
Usage-Driven Database Design emphasizes practical aspects of design, and speaks to what works, what doesn't work, and what to avoid at all costs. Included in the book are lessons learned by the author over his 30+ years in the corporate trenches. Everything in the book is grounded on good theory, yet demonstrates a professional and pragmatic approach to design that can come only from decades of experience.
- Presents an end-to-end framework from logical data modeling through physical schema definition.
- Includes lessons learned, techniques, and tricks that can turn a database disaster into a success.
- Applies to all types of database management systems, including NoSQL such as Cassandra and Hadoop, and mainstream SQL databases such as Oracle and SQL Server
What You'll Learn
- Create logical data models that accurately reflect the real world of the user
- Create usage scenarios reflecting how applications will use a new database
- Merge static data models with dynamic process models to create resilient yet flexible database designs
- Support application requirements by creating responsive database schemas in any database architecture
- Cope with big data and unstructured data for transaction processing and decision support systems
- Recognize when relational approaches won't work, and when to turn toward NoSQL solutions such as Cassandra or Hadoop
System developers, including business analysts, database designers, database administrators, and application designers and developers who must design or interact with database systems
George Tillmann is a retired Booz, Allen Hamilton partner; a former programmer, analyst, management consultant; and CIO who managed Booz Allen's global IT organization. He brings more than 30 years experience as a database administrator, database consultant, and database product designer. He has written two books, was a Computerworld columnist, and has articles published in CIO, Infoworld, Techworld, Data Base, The Standard, Database Programming & Design and is a former member of the ANSI/X3/SPARC Data Base Management Systems Study Group.
Contents at a Glance 5
Contents 7
About the Author 18
Preface 19
Part I: Introduction 23
Chapter 1: Introduction to Usage-Driven Database Design 24
Database Design Principle 1: Separation Principle 26
Database Design Principle 2: Distinction Principle 27
The Difference Between Separation and Distinction 29
Database Design Principle 3: Convergence Principle 29
The Separation, Distinction, and Convergence Principles 30
Database Design Principle 4: Minimal Regression Principle 30
Usage-Driven Database Design 30
Logical Data Modeling 31
Physical Schema Definition 32
The Terminology Trap 32
Notes 33
Part II: Logical Data Modeling 34
Chapter 2: The E-R Approach 35
A Little Data Modeling History 37
Some Important Definitions 38
Logical Data Modeling Objects 39
Entities 39
Type-Instance Distinction 40
Relationships 40
Attributes 41
Notes 42
Chapter 3: More About the E-R Approach 43
More About Relationships 43
Membership Class 43
Cardinality 44
Modality 45
Degree 47
Binary Relationship 47
N-ary Relationships 48
Unary or Recursive Relationships 48
Relationship Constraints 49
Inclusion 49
Exclusion 50
Conjunction 50
Simple Conjunction 50
Conditional Conjunction 51
Recursive Modality Constraints 52
More About Entities 55
Attributive Entity 55
Associative Entities 56
Supertype and Subtype Entities (Generalization and Specialization) 56
More About Attributes 58
Attribute Domain 58
Attribute Source: Primitive and Derived 59
Attribute Descriptor and Unique Identifier 59
Compound or Concatenated Unique Identifiers 60
Attribute Complexity: Simple and Group 60
Attribute Valuation: Single Value and Multivalue 61
Attribute Complexity and Valuation 61
Chapter 4: Building the Logical Data Model 64
The Interview Process 65
Gather Information and Review 66
1. Identify the Users Who Are Authorities or Experts on the Subject 66
2. Meet and Interview the Experts and Identify the Subject (Application) Entities 66
Preparation 66
The First Interview 67
3. Identify Relationships Between the Entities 67
4. Identify the Properties or Attributes of the Entities and Relationships 67
Analyze Information 67
Construct Model 68
Repeat as Necessary 68
Making Sense of the Interview 68
Modeling Rules 70
Verifying What You Have Heard 72
Immediate Interview Feedback 72
Formal Walk-Throughs 72
Increasing E-R Diagram Comprehension 74
Subject Areas 74
Entity Fragments 75
Neighborhood Diagrams 76
Relationship Bridges and Stubs 77
Some Model Building Best Practices 78
Getting Started 78
Don’t Lose Control of the Project to Users 79
Don’t Lose Control of the Project to Technical Staff 79
Don’t Become Dependent on Tools or Techniques 80
Don’t Get Bogged Down in Endless Analysis 80
The Players…and the Rules of Engagement 81
Deliverables 82
Examples of Deliverables 83
Sample Data Dictionary, Data Object Definitions 84
Notes 86
Chapter 5: LDM Best Practices 87
Abbreviations 88
Almost Unique Identifiers 89
Clarity 90
Compound Unique Identifiers 90
Conceptual Integrity 91
Conjunctive Relationships 93
Duplicate Super-Subtypes “Type” Data 93
Exclusive and Nonexclusive Generalization 94
Required and Nonrequired Participation 96
Exclusive Relationships 96
Group Attributes 97
Level of Abstraction 97
Many-to-Many Relationships 98
N-ary Relationships 100
N-ary Relationships and Membership Class 101
Naming Objects 104
Multiple Names 104
Naming Conventions 105
Name Uniqueness 105
Naming Convention Goals 106
Null Attributes 109
There Be Blanks in Them Thar Nulls 109
Optional Relationships (Optional-Optional Relationships) 110
Subject Areas 110
Supertypes and Subtypes 111
Unique Identifiers 113
Note 115
Chapter 6: LDM Pitfalls 116
Circular Relationships 116
Data Values 117
Data Value–Differentiated Entities and Attributes 118
Derived Data 119
Three Poor Arguments Against Modeling Derived Data 119
Derived Data as Process 120
Derived Data and Physical Database Design 121
Discrete Attributes 122
Embedded Attributes 123
Uniqueness 123
Group Attributes 123
The Problem with Embedded Attributes 123
The Solution 124
The Moral of the Story 124
Entity Fragmentation 124
Foreign Keys 125
Junction Entities 126
Normalization 126
Presentation Data 127
Primary Keys 127
Process Data 128
Repeating Groups 129
Multivalue Attribute 129
Group Attribute 130
Single-Attribute Entities 130
Code 131
Multivalue Attribute (Repeating Group) 131
Associative Entities 131
Substitution Data 132
Substitution Tables 132
Transient Data 132
Location-Dependent Data 133
Chapter 7: LDM Perils to Watch For 135
Associatives Related to Other Associatives 135
Diagrammable Objects 136
Disassociated Entity Clusters (“Islands”) 137
Duplicate Unique Identifiers 138
One Entity, Two or More Identifiers 138
One Identifier, Two or More Entities 138
Multiple Relationships 139
One Relationship, Multiple Views 139
Multiple Different but Similar Relationships 139
One-of-a-Kind (OOAK) Entities 140
One-to-One Relationships 140
Rare Entity Relationships 141
Recursive Modality Constraints 142
Updating the Constraints 143
Spiderwebs 144
Too Many Blanks or Nulls 145
Too Many Recursives 146
Next U3D Phase: Physical Schema Definition 147
Notes 147
Part III: Physical Schema Definition 148
Chapter 8: Introduction to Physical Database Design 149
A Short Incondite History of Automated Information Management (or, a Sequential Look at Random Access) 150
Information Management Era 1: Sequential Processing 150
Information Management Era 2: The First Random Access DBMS 151
A Small Digression: A Couple of Words About Database Access 156
Hashing 156
Inverted Indices 158
Database Pages 159
B-Trees 160
Bitmaps 162
Associative Arrays 163
Information Management Era 3: Inverted File Systems 164
Information Management Era 4: The Age of Relational 164
Problems with Relational 166
First—Performance Issues 166
Second—Not So Simple Simplicity 166
Data Types 166
Procedural Code 166
Groups 167
Third—Communication and Language 167
Fourth—Relational: Theory or DBMS? 169
Fifth—Where Are You Relational Model? 170
Just Because It Has Failings Doesn’t Mean It’s a Failure 170
Information Management Era 5: Object Technology 171
Object-Oriented Programming Led to Object-Oriented Analysis and Design, Which Eventually Led to the Object-Oriented Database Management Systems (OODBMSs) 171
A Small Digression (Again): The ACID Test 172
Information Management Era 6: NoSQL 173
Key-Value 173
Graph 174
Document Management 174
Multimodal 174
Is That an ACID or a BASE? 174
And the Winner Is… 176
What’s to Come 177
References 177
Notes 177
Chapter 9: Introduction to Physical Schema Definition 179
Usage-Driven Database Design: Physical Schema Definition 181
Step 1: Transformation 182
Task 1.1: Translation 183
Task 1.2: Expansion 184
Step 2: Utilization 185
Task 2.1: Usage Analysis 185
Task 2.2: Path Rationalization 187
Step 3: Formalization 188
Task 3.1: Environment Designation 189
Task 3.2: Constraint Compliance 191
Step 4: Customization 193
Task 4.1: Resource Analysis 194
Task 4.2: Performance Enhancement 196
Summary 197
Chapter 10: Transformation: Creating the Physical Data Model 199
Task 1.1: Translation 200
Activity 1.1.1: Transform LDM Objects to PDM Objects 200
Entities to Record Types 201
Relationships to Linkages 204
Linkage Membership Class 204
Linkage Degree 205
Linkage Constraints 205
Attributes to Data Items 206
Data Item Domain 207
Data Item Source: Primitive and Derived 207
Primitive Data Item: Unique Identifiers and Descriptors 207
Data Item Complexity: Simple and Group 207
Data Item Valuation: Single Value and Multivalue 208
Other Data Item Information 208
Activity 1.1.2: Diagram the Objects 208
Task 1.2: Expansion 209
Activity 1.2.1: Assign Keys 209
Activity 1.2.2: Normalize the Model 211
Adjustments Needed for Normalization: Keys—Foreign and Domestic? 213
Zero Normal Form 214
First Normal Form 214
Before Getting to Second Normal Form, a Slight Digression 215
Second Normal Form 215
Third Normal Form 216
Post-Normalization—Retreat of Sally Forth? 217
Issues with Normalization 217
Tranformation Notes 219
Deliverables 220
Examples of Deliverables 220
Chapter 11: Utilization: Merging Data and Process 224
Task 2.1: Usage Analysis 225
Process Modeling 225
Logical Process Modeling 225
Natural-Language Logical Process Modeling Techniques 226
Plain English 226
Oy Vey, There Has Got to Be a Better English Translation 226
Structured English 226
Graphical Logical Process Modeling Techniques 228
Physical Process Modeling 230
Natural-Language Physical Process Modeling Techniques 230
Plain English 230
Structured English 230
Pseudocode 230
Graphic Physical Process Modeling Techniques 231
Flow Charts 231
Structure Charts 232
Activity 2.1.1: Create Usage Scenarios 233
Clearing the Decks for Action 234
Putting a Usage Scenario Together 237
An Example 237
Activity 2.1.2: Map Usage Scenarios to the PDM 239
Task 2.2: Path Rationalization 240
Activity 2.2.1: Reduce to Simplest Paths 240
Activity 2.2.2: Simplify Model 241
Utilization Notes 242
Deliverables 242
Example of Deliverables 243
Further Reading 243
Structured English 243
Data Flow Diagramming 244
Flow Charts 244
Pseudocode 244
Structure Charts 244
Chapter 12: Formalization: Creating a Schema 245
Task 3.1: Environment Designation 246
Hierarchical Systems 249
Network Systems 249
Relational Systems 249
Object-Oriented 250
NoSQL 250
DBMS Product and Version Selection 250
Task 3.2: Constraint Compliance 251
Pseudocode…Again 251
Activity 3.2.1: Map Rationalized Physical Data Model to the Data Architecture 253
Record Types 253
Proper 253
Associative 253
Attributive 254
S-Type 255
Links 255
Membership Class: Cardinality 255
One-to-One 255
One-to-Many 256
Many-to-Many 256
Membership Class: Modality 256
Mandatory 257
Optional 257
Degree 257
Unary 257
Binary 258
N-ary 258
Constraints 259
Inclusion 259
Exclusion 259
Conjunction 259
Data Items 260
Domains 260
Source: Primitive and Derived 260
Primitive Data Items 260
Derived Data Items 261
Complexity: Simple and Group 261
Valuation: Single Value and Multivalue 262
Activity 3.2.2: Create a DBMS Product/Version-Specific Functional Physical Database Design 263
Subschema Creation 265
Formalization Notes 267
Deliverables 267
Example of Deliverables 268
Chapter 13: Customization: Enhancing Performance 269
Task 4.1: Resource Analysis 270
The Trade-Off Triangle 271
Task 4.2: Performance Enhancements 274
Activity 4.2.1: Customize Hardware 275
A Few Words About Secondary Storage 275
Add Disk 277
Faster Disk 277
Main Memory 277
Activity 4.2.2: Customize Software 278
Indices (B-Tree, Hash, Bitmap) 278
Clustering 278
Example Using Indices and Clusters 281
Question 1: Should Order Be Indexed? 283
Question 2: Should Product Be Indexed? 283
Question 3: Should Line Item Be Indexed? 284
Question 4: Should Line Item Be Clustered? 284
Question 5: Should Line Item Be Clustered Around Order or Product? 284
Alternative 1: Line Item Clustered Around Order 284
Alternative Two: Line Item Clustered Around Product 285
Partitioning 286
Derived and Duplicate Data 287
Denormalization 288
Get Rid of ACID 288
Big Data, Big Problems, Big Solution 289
To Plunge or Not to Plunge 290
NoSQL 291
Modeling Big Data U3D Style 292
Customization Notes 294
Deliverables 294
Examples of Deliverables 295
Chapter 14: The Data Warehouse 298
The Data Warehouse 299
Data Warehouse Architecture 301
Using U3D to Develop a Data Warehouse 302
Step 1: Transformation 302
Step 2: Utilization 303
The Time Dimension 305
Step 3: Formalization 306
Step 4: Customization 306
Streamlining 306
Duplication 306
Denormalization 306
Indices and Hashing 307
Bitmaps 307
Bulk Loading 308
Clustering 309
Partitioning 309
Distributed Processing 309
All Together Now… 310
Oops… 310
Customization Notes 311
Note 311
Chapter 15: The Big Data Decision Support System 312
Structured, Unstructured, and Semistructured Data—Another Small Digression 313
DSS and Big Data 316
Using U3D to Develop a Big Data Decision Support System 317
Step 1: Transformation 318
Step 2: Utililization 318
Step 3: Formalization 319
Step 4: Customization 321
A Little About Hadoop 321
Putting It All Together 324
Deliverables 324
Part IV: Where from Here? 326
Chapter 16: A Look Ahead 327
We Need to Ask the Awkward Questions 328
Tools Need to Take Usage into Account 330
The One and Only DBMS 330
Better Training 334
Notes 336
Part V: Appendixes 337
Appendix A: Glossary 338
Appendix B: Logical Data Modeling Definitions 357
Entity 357
Relationship 358
Attribute 358
Domain 359
Appendix C: Physical Schema Definition Object Definitions 360
Record 360
Linkage 361
Data Item 362
Domain 362
Cluster 363
Partition 363
Index 364
Appendix D: Formulas Used in This Book 365
Appendix E: List of U3D Deliverables 367
Index 370
Erscheint lt. Verlag | 7.4.2017 |
---|---|
Zusatzinfo | XXIV, 374 p. 143 illus., 16 illus. in color. |
Verlagsort | Berkeley |
Sprache | englisch |
Themenwelt | Mathematik / Informatik ► Informatik ► Datenbanken |
Schlagworte | Database Design • Data Modeling • Data Warehouse • Entity-relationship • logical data model • Logical Design • normalization • Physical Database Design • physical design • Process model • tradeoff triangle • usage maps |
ISBN-10 | 1-4842-2722-0 / 1484227220 |
ISBN-13 | 978-1-4842-2722-0 / 9781484227220 |
Informationen gemäß Produktsicherheitsverordnung (GPSR) | |
Haben Sie eine Frage zum Produkt? |
Größe: 7,9 MB
DRM: Digitales Wasserzeichen
Dieses eBook enthält ein digitales Wasserzeichen und ist damit für Sie personalisiert. Bei einer missbräuchlichen Weitergabe des eBooks an Dritte ist eine Rückverfolgung an die Quelle möglich.
Dateiformat: PDF (Portable Document Format)
Mit einem festen Seitenlayout eignet sich die PDF besonders für Fachbücher mit Spalten, Tabellen und Abbildungen. Eine PDF kann auf fast allen Geräten angezeigt werden, ist aber für kleine Displays (Smartphone, eReader) nur eingeschränkt geeignet.
Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen dafür einen PDF-Viewer - z.B. den Adobe Reader oder Adobe Digital Editions.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen dafür einen PDF-Viewer - z.B. die kostenlose Adobe Digital Editions-App.
Zusätzliches Feature: Online Lesen
Dieses eBook können Sie zusätzlich zum Download auch online im Webbrowser lesen.
Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.
aus dem Bereich