NoSQL for Mere Mortals
Addison-Wesley Educational Publishers Inc (Publisher)
978-0-13-402321-2 (ISBN)
- Out of print; no new edition planned
The Mere Mortals® tutorials have earned worldwide praise as the clearest, simplest way to master essential database technologies. Now, there’s one for today’s exciting new NoSQL databases. NoSQL for Mere Mortals guides you through solving real problems with NoSQL and achieving unprecedented scalability, cost efficiency, flexibility, and availability.
Drawing on 20+ years of cutting-edge database experience, Dan Sullivan explains the advantages, use cases, and terminology associated with all four main categories of NoSQL databases: key-value, document, column family, and graph databases. For each, he introduces pragmatic best practices for building high-value applications. Through step-by-step examples, you’ll discover how to choose the right database for each task, and use it the right way.
Coverage includes:
--Getting started: What NoSQL databases are, how they differ from relational databases, when to use them, and when not to
--Data management principles and design criteria: Essential knowledge for creating any database solution, NoSQL or relational
--Key-value databases: Gaining more utility from data structures
--Document databases: Schemaless databases, normalization and denormalization, mutable documents, indexing, and design patterns
--Column family databases: Google’s BigTable design, table design, indexing, partitioning, and Big Data
--Graph databases: Graph/network modeling, design tips, query methods, and traps to avoid
Whether you’re a database developer, data modeler, database user, or student, learning NoSQL can open up immense new opportunities. As thousands of database professionals already know, For Mere Mortals is the fastest, easiest route to mastery.
Dan Sullivan is a data architect and data scientist with more than 20 years of experience in business intelligence, machine learning, data mining, text mining, Big Data, data modeling, and application design. Dan’s project work has ranged from analyzing complex genomics and proteomics data to designing and implementing numerous database applications. His most recent work has focused on NoSQL database modeling, data analysis, cloud computing, text mining, and data integration in life sciences. Dan has extensive experience in relational database design and works regularly with NoSQL databases. Dan has presented and written extensively on NoSQL, cloud computing, analytics, data warehousing, and business intelligence. He has worked in many industries, including life sciences, financial services, oil and gas, manufacturing, health care, insurance, retail, power systems, telecommunications, pharmaceuticals, and publishing.
Preface xxi
Introduction xxv
PART I: INTRODUCTION 1
Chapter 1 Different Databases for Different Requirements 3
Relational Database Design 4
E-commerce Application 5
Early Database Management Systems 6
Flat File Data Management Systems 7
Organization of Flat File Data Management Systems 7
Random Access of Data 9
Limitations of Flat File Data Management Systems 9
Hierarchical Data Model Systems 12
Organization of Hierarchical Data Management Systems 12
Limitations of Hierarchical Data Management Systems 14
Network Data Management Systems 14
Organization of Network Data Management Systems 15
Limitations of Network Data Management Systems 17
Summary of Early Database Management Systems 17
The Relational Database Revolution 19
Relational Database Management Systems 19
Organization of Relational Database Management Systems 20
Organization of Applications Using Relational Database Management Systems 26
Limitations of Relational Databases 27
Motivations for Not Just/No SQL (NoSQL) Databases 29
Scalability 29
Cost 31
Flexibility 31
Availability 32
Summary 34
Case Study 35
Review Questions 36
References 37
Bibliography 37
Chapter 2 Variety of NoSQL Databases 39
Data Management with Distributed Databases 41
Store Data Persistently 41
Maintain Data Consistency 42
Ensure Data Availability 44
Consistency of Database Transactions 47
Availability and Consistency in Distributed Databases 48
Balancing Response Times, Consistency, and Durability 49
Consistency, Availability, and Partitioning: The CAP Theorem 51
ACID and BASE 54
ACID: Atomicity, Consistency, Isolation, and Durability 54
BASE: Basically Available, Soft State, Eventually Consistent 56
Types of Eventual Consistency 57
Causal Consistency 57
Read-Your-Writes Consistency 57
Session Consistency 58
Monotonic Read Consistency 58
Monotonic Write Consistency 58
Four Types of NoSQL Databases 59
Key-Value Pair Databases 60
Keys 60
Values 64
Differences Between Key-Value and Relational Databases 65
Document Databases 66
Documents 66
Querying Documents 67
Differences Between Document and Relational Databases 68
Column Family Databases 69
Columns and Column Families 69
Differences Between Column Family and Relational Databases 70
Graph Databases 71
Nodes and Relationships 72
Differences Between Graph and Relational Databases 73
Summary 75
Review Questions 76
References 77
Bibliography 77
PART II: KEY-VALUE DATABASES 79
Chapter 3 Introduction to Key-Value Databases 81
From Arrays to Key-Value Databases 82
Arrays: Key-Value Stores with Training Wheels 82
Associative Arrays: Taking Off the Training Wheels 84
Caches: Adding Gears to the Bike 85
In-Memory and On-Disk Key-Value Database: From Bikes to Motorized Vehicles 89
Essential Features of Key-Value Databases 91
Simplicity: Who Needs Complicated Data Models Anyway? 91
Speed: There Is No Such Thing as Too Fast 93
Scalability: Keeping Up with the Rush 95
Scaling with Master-Slave Replication 95
Scaling with Masterless Replication 98
Keys: More Than Meaningless Identifiers 103
How to Construct a Key 103
Using Keys to Locate Values 105
Hash Functions: From Keys to Locations 106
Keys Help Avoid Write Problems 107
Values: Storing Just About Any Data You Want 110
Values Do Not Require Strong Typing 110
Limitations on Searching for Values 112
Summary 114
Review Questions 115
References 116
Bibliography 116
Chapter 4 Key-Value Database Terminology 117
Key-Value Database Data Modeling Terms 118
Key 121
Value 123
Namespace 124
Partition 126
Partition Key 129
Schemaless 129
Key-Value Architecture Terms 131
Cluster 131
Ring 133
Replication 135
Key-Value Implementation Terms 137
Hash Function 137
Collision 138
Compression 139
Summary 141
Review Questions 141
References 142
Chapter 5 Designing for Key-Value Databases 143
Key Design and Partitioning 144
Keys Should Follow a Naming Convention 145
Well-Designed Keys Save Code 145
Dealing with Ranges of Values 147
Keys Must Take into Account Implementation Limitations 149
How Keys Are Used in Partitioning 150
Designing Structured Values 151
Structured Data Types Help Reduce Latency 152
Large Values Can Lead to Inefficient Read and Write Operations 155
Limitations of Key-Value Databases 159
Look Up Values by Key Only 160
Key-Value Databases Do Not Support Range Queries 161
No Standard Query Language Comparable to SQL for Relational Databases 161
Design Patterns for Key-Value Databases 162
Time to Live (TTL) Keys 163
Emulating Tables 165
Aggregates 166
Atomic Aggregates 169
Enumerable Keys 170
Indexes 171
Summary 173
Case Study: Key-Value Databases for Mobile Application Configuration 174
Review Questions 177
References 178
PART III: DOCUMENT DATABASES 179
Chapter 6 Introduction to Document Databases 181
What Is a Document? 182
Documents Are Not So Simple After All 182
Documents and Key-Value Pairs 187
Managing Multiple Documents in Collections 188
Getting Started with Collections 188
Tips on Designing Collections 191
Avoid Explicit Schema Definitions 199
Basic Operations on Document Databases 201
Inserting Documents into a Collection 202
Deleting Documents from a Collection 204
Updating Documents in a Collection 206
Retrieving Documents from a Collection 208
Summary 210
Review Questions 210
References 211
Chapter 7 Document Database Terminology 213
Document and Collection Terms 214
Document 215
Documents: Ordered Sets of Key-Value Pairs 215
Key and Value Data Types 216
Collection 217
Embedded Document 218
Schemaless 220
Schemaless Means More Flexibility 221
Schemaless Means More Responsibility 222
Polymorphic Schema 223
Types of Partitions 224
Vertical Partitioning 225
Horizontal Partitioning or Sharding 227
Separating Data with Shard Keys 229
Distributing Data with a Partitioning Algorithm 230
Data Modeling and Query Processing 232
Normalization 233
Denormalization 235
Query Processor 235
Summary 237
Review Questions 237
References 238
Chapter 8 Designing for Document Databases 239
Normalization, Denormalization, and the Search for Proper Balance 241
One-to-Many Relations 242
Many-to-Many Relations 243
The Need for Joins 243
Executing Joins: The Heavy Lifting of Relational Databases 245
Executing Joins Example 247
What Would a Document Database Modeler Do? 248
The Joy of Denormalization 249
Avoid Overusing Denormalization 251
Just Say No to Joins, Sometimes 253
Planning for Mutable Documents 255
Avoid Moving Oversized Documents 258
The Goldilocks Zone of Indexes 258
Read-Heavy Applications 259
Write-Heavy Applications 260
Modeling Common Relations 261
One-to-Many Relations in Document Databases 262
Many-to-Many Relations in Document Databases 263
Modeling Hierarchies in Document Databases 265
Parent or Child References 265
Listing All Ancestors 266
Summary 267
Case Study: Customer Manifests 269
Embed or Not Embed? 271
Choosing Indexes 271
Separate Collections by Type? 272
Review Questions 273
References 273
PART IV: COLUMN FAMILY DATABASES 275
Chapter 9 Introduction to Column Family Databases 277
In the Beginning, There Was Google BigTable 279
Utilizing Dynamic Control over Columns 280
Indexing by Row, Column Name, and Time Stamp 281
Controlling Location of Data 282
Reading and Writing Atomic Rows 283
Maintaining Rows in Sorted Order 284
Differences and Similarities to Key-Value and Document Databases 286
Column Family Database Features 286
Column Family Database Similarities to and Differences from Document Databases 287
Column Family Database Versus Relational Databases 289
Avoiding Multirow Transactions 290
Avoiding Subqueries 291
Architectures Used in Column Family Databases 293
HBase Architecture: Variety of Nodes 293
Cassandra Architecture: Peer-to-Peer 295
Getting the Word Around: Gossip Protocol 296
Thermodynamics and Distributed Databases: Why We Need Anti-Entropy 299
Hold This for Me: Hinted Handoff 300
When to Use Column Family Databases 303
Summary 304
Review Questions 304
References 305
Chapter 10 Column Family Database Terminology 307
Basic Components of Column Family Databases 308
Keyspace 309
Row Key 309
Column 310
Column Families 312
Structures and Processes: Implementing Column Family Databases 313
Internal Structures and Configuration Parameters of Column Family Databases 313
Old Friends: Clusters and Partitions 314
Cluster 314
Partition 316
Taking a Look Under the Hood: More Column Family Database Components 317
Commit Log 317
Bloom Filter 319
Consistency Level 321
Processes and Protocols 322
Replication 322
Anti-Entropy 323
Gossip Protocol 324
Hinted Handoff 325
Summary 326
Review Questions 327
References 327
Chapter 11 Designing for Column Family Databases 329
Guidelines for Designing Tables 332
Denormalize Instead of Join 333
Make Use of Valueless Columns 334
Use Both Column Names and Column Values to Store Data 334
Model an Entity with a Single Row 335
Avoid Hotspotting in Row Keys 337
Keep an Appropriate Number of Column Value Versions 338
Avoid Complex Data Structures in Column Values 339
Guidelines for Indexing 340
When to Use Secondary Indexes Managed by the Column Family Database System 341
When to Create and Manage Secondary Indexes Using Tables 345
Tools for Working with Big Data 348
Extracting, Transforming, and Loading Big Data 350
Analyzing Big Data 351
Describing and Predicting with Statistics 351
Finding Patterns with Machine Learning 353
Tools for Analyzing Big Data 354
Tools for Monitoring Big Data 355
Summary 356
Case Study: Customer Data Analysis 357
Understanding User Needs 357
Review Questions 359
References 360
PART V: GRAPH DATABASES 361
Chapter 12 Introduction to Graph Databases 363
What Is a Graph? 363
Graphs and Network Modeling 365
Modeling Geographic Locations 365
Modeling Infectious Diseases 366
Modeling Abstract and Concrete Entities 369
Modeling Social Media 370
Advantages of Graph Databases 372
Query Faster by Avoiding Joins 372
Simplified Modeling 375
Multiple Relations Between Entities 375
Summary 376
Review Questions 376
References 377
Chapter 13 Graph Database Terminology 379
Elements of Graphs 380
Vertex 380
Edge 381
Path 383
Loop 384
Operations on Graphs 385
Union of Graphs 385
Intersection of Graphs 386
Graph Traversal 387
Properties of Graphs and Nodes 388
Isomorphism 388
Order and Size 389
Degree 390
Closeness 390
Betweenness 391
Types of Graphs 392
Undirected and Directed Graphs 392
Flow Network 393
Bipartite Graph 394
Multigraph 395
Weighted Graph 395
Summary 396
Review Questions 397
References 397
Chapter 14 Designing for Graph Databases 399
Getting Started with Graph Design 400
Designing a Social Network Graph Database 401
Queries Drive Design (Again) 405
Querying a Graph 408
Cypher: Declarative Querying 408
Gremlin: Query by Graph Traversal 410
Basic Graph Traversal 410
Traversing a Graph with Depth-First and Breadth-First Searches 412
Tips and Traps of Graph Database Design 415
Use Indexes to Improve Retrieval Time 415
Use Appropriate Types of Edges 416
Watch for Cycles When Traversing Graphs 417
Consider the Scalability of Your Graph Database 418
Summary 420
Case Study: Optimizing Transportation Routes 420
Understanding User Needs 420
Designing a Graph Analysis Solution 421
Review Questions 423
References 423
PART VI: CHOOSING A DATABASE FOR YOUR APPLICATION 425
Chapter 15 Guidelines for Selecting a Database 427
Choosing a NoSQL Database 428
Criteria for Selecting Key-Value Databases 429
Use Cases and Criteria for Selecting Document Databases 430
Use Cases and Criteria for Selecting Column Family Databases 431
Use Cases and Criteria for Selecting Graph Databases 433
Using NoSQL and Relational Databases Together 434
Summary 436
Review Questions 436
References 437
PART VII: APPENDICES 441
Appendix A Answers to Chapter Review Questions 443
Appendix B List of NoSQL Databases 477
Glossary 481
9780134023212 TOC 3/27/2015
| Place of publication | New Jersey |
|---|---|
| Language | English |
| Dimensions | 179 x 232 mm |
| Weight | 890 g |
| Subject area | Computer Science ► Databases ► Data Warehouse / Data Mining |
| | Mathematics / Computer Science ► Computer Science ► Graphics / Design |
| | Mathematics / Computer Science ► Computer Science ► Software Development |
| ISBN-10 | 0-13-402321-8 / 0134023218 |
| ISBN-13 | 978-0-13-402321-2 / 9780134023212 |
| Condition | New |