Joe Celko's SQL for Smarties - Joe Celko

Joe Celko's SQL for Smarties (eBook)

Advanced SQL Programming

Joe Celko (Autor)

eBook Download: EPUB

2010 | 3. Auflage
840 Seiten
Elsevier Science (Verlag)
978-0-08-046004-8 (ISBN)

SQL for Smarties was hailed as the first book devoted explicitly to the advanced techniques needed to transform an experienced SQL programmer into an expert. Now, 10 years later and in the third edition, this classic still reigns supreme as the book written by an SQL master that teaches future SQL masters. These are not just tips and techniques, Joe also offers the best solutions to old and new challenges and conveys the way you need to think in order to get the most out of SQL programming efforts for both correctness and performance.

In the third edition, Joe features new examples and updates to SQL-99, expanded sections of Query techniques, and a new section on schema design, with the same war-story teaching style that made the first and second editions of this book classics.

* Expert advice from a noted SQL authority and award-winning columnist, who has given ten years of service to the ANSI SQL standards committee and many more years of dependable help to readers of online forums.
* Teaches scores of advanced techniques that can be used with any product, in any SQL environment, whether it is an SQL-92 or SQL-99 environment.
* Offers tips for working around system deficiencies.
* Continues to use war stories--updated!--that give insights into real-world SQL programming challenges.

SQL for Smarties was hailed as the first book devoted explicitly to the advanced techniques needed to transform an experienced SQL programmer into an expert. Now, 10 years later and in the third edition, this classic still reigns supreme as the book written by an SQL master that teaches future SQL masters. These are not just tips and techniques; Joe also offers the best solutions to old and new challenges and conveys the way you need to think in order to get the most out of SQL programming efforts for both correctness and performance. In the third edition, Joe features new examples and updates to SQL-99, expanded sections of Query techniques, and a new section on schema design, with the same war-story teaching style that made the first and second editions of this book classics. - Expert advice from a noted SQL authority and award-winning columnist, who has given ten years of service to the ANSI SQL standards committee and many more years of dependable help to readers of online forums. - Teaches scores of advanced techniques that can be used with any product, in any SQL environment, whether it is an SQL-92 or SQL-99 environment. - Offers tips for working around system deficiencies. - Continues to use war stories--updated!--that give insights into real-world SQL programming challenges.

front cover 1
copyright 7
table of contents 10
front matter 26
Introduction to the Third Edition 26
1.1 What Changed in Ten Years 26
1.2 What Is New in This Edition 28
1.3 Corrections and Additions 29
body 30
1 Database Design 30
1.1 Schema and Table Creation 31
1.1.1 CREATE SCHEMA Statement 32
1.1.2 Manipulating Tables 34
1.1.3 Column Constraints 39
1.1.4 UNIQUE Constraints versus UNIQUE Indexes 46
1.1.5 Nested UNIQUE Constraints 47
1.1.6 Overlapping Keys 51
1.1.7 CREATE ASSERTION Constraints 54
1.1.8 Using VIEWs for Schema Level Constraints 54
1.1.9 Using PRIMARY KEYs and ASSERTIONs for Constraints 58
1.1.10 Avoiding Attribute Splitting 60
1.1.11 Modeling Class Hierarchies in DDL 63
1.2 Generating Unique Sequential Numbers for Keys 65
1.2.1 IDENTITY Columns 66
1.2.2 ROWID and Physical Disk Addresses 68
1.2.3 Sequential Numbering in Pure SQL 68
1.2.4 GUIDs 70
1.2.5 Sequence Generator Functions 71
1.2.6 Unique Value Generators 71
1.2.7 Preallocated Values 73
1.2.8 Random Order Values 74
1.3 A Remark on Duplicate Rows 77
1.4 Other Schema Objects 79
1.4.1 Schema Tables 79
1.4.2 Temporary Tables 80
1.4.3 CREATE DOMAIN Statement 80
1.4.4 CREATE TRIGGER Statement 81
1.4.5 CREATE PROCEDURE Statement 82
1.4.6 DECLARE CURSOR Statement 82
2 Normalization 90
2.1 Functional and Multivalued Dependencies 93
2.2 First Normal Form (1NF) 93
2.2.1 Note on Repeated Groups 95
2.3 Second Normal Form (2NF) 99
2.4 Third Normal Form (3NF) 100
2.5 Elementary Key Normal Form (EKNF) 101
2.6 Boyce-Codd Normal Form (BCNF) 102
2.7 Fourth Normal Form (4NF) 104
2.8 Fifth Normal Form (5NF) 105
2.9 Domain-Key Normal Form (DKNF) 107
2.10 Practical Hints for Normalization 116
2.11 Key Types 117
2.11.1 Natural Keys 118
2.11.2 Artificial Keys 118
2.11.3 Exposed Physical Locators 118
2.11.4 Practical Hints for Denormalization 120
2.11.5 Row Sorting 122
3 Numeric Data in SQL 130
3.1 Numeric Types 130
3.1.1 BIT, BYTE, and BOOLEAN Data Types 133
3.2 Numeric Type Conversion 134
3.2.1 Rounding and Truncating 134
3.2.2 CAST() Function 136
3.3 Four-Function Arithmetic 137
3.4 Arithmetic and NULLs 138
3.5 Converting Values to and from NULL 139
3.5.1 NULLIF() Function 139
3.5.2 COALESCE() Function 140
3.6 Vendor Math Functions 142
3.6.1 Number Theory Operators 142
3.6.2 Exponential Functions 145
3.6.3 Scaling Functions 145
3.6.4 Converting Numbers to Words 146
4 Temporal Data Types in SQL 148
4.1 Notes on Calendar Standards 148
4.2 SQL Temporal Data Types 152
4.2.1 Tips for Handling Dates, Timestamps, and Times 153
4.2.2 Date Format Standards 153
4.2.3 Handling Timestamps 154
4.2.4 Handling Times 156
4.3 Queries with Date Arithmetic 157
4.4 The Nature of Temporal Data Models 158
4.4.1 Temporal Duplicates 158
4.4.2 Temporal Databases 164
4.4.3 Temporal Projection and Selection 166
4.4.4 Temporal Joins 168
4.4.5 Modifying Valid-Time State Tables 174
4.4.6 Current Modifications 175
4.4.7 Sequenced Modifications 179
4.4.8 Nonsequenced Modifications 184
4.4.9 Transaction-Time State Tables 185
4.4.10 Maintaining the Audit Log 187
4.4.11 Querying the Audit Log 189
4.4.12 Modifying the Audit Log 193
4.4.13 Bitemporal Tables 193
4.4.14 Temporal Support in Standard SQL 196
5 Character Data Types in SQL 198
5.1 Problems with SQL Strings 199
5.1.1 Problems of String Equality 199
5.1.2 Problems of String Ordering 200
5.1.3 Problems of String Grouping 201
5.2 Standard String Functions 201
5.3 Common Vendor Extensions 203
5.3.1 Phonetic Matching 204
5.4 Cutter Tables 211
6 NULLs: Missing Data in SQL 214
6.1 Empty and Missing Tables 216
6.2 Missing Values in Columns 216
6.3 Context and Missing Values 218
6.4 Comparing NULLs 219
6.5 NULLs and Logic 219
6.5.1 NULLS in Subquery Predicates 220
6.5.2 Standard SQL Solutions 222
6.6 Math and NULLs 222
6.7 Functions and NULLs 222
6.8 NULLs and Host Languages 223
6.9 Design Advice for NULLs 224
6.9.1 Avoiding NULLs from the Host Programs 226
6.10 A Note on Multiple NULL Values 227
7 Multiple Column Data Elements 230
7.1 Distance Functions 230
7.2 Storing an IP Address in SQL 231
7.2.1 A Single VARCHAR(15) Column 232
7.2.2 One INTEGER Column 232
7.2.3 Four SMALLINT Columns 234
7.3 Currency and Other Unit Conversions 234
7.4 Social Security Numbers 235
7.5 Rational Numbers 238
8 Table Operations 240
8.1 DELETE FROM Statement 240
8.1.1 The DELETE FROM Clause 241
8.1.2 The WHERE Clause 241
8.1.3 Deleting Based on Data in a Second Table 245
8.1.4 Deleting within the Same Table 245
8.1.5 Deleting in Multiple Tables without Referential Integrity 249
8.2 INSERT INTO Statement 250
8.2.1 INSERT INTO Clause 250
8.2.2 The Nature of Inserts 251
8.2.3 Bulk Load and Unload Utilities 252
8.3 The UPDATE Statement 252
8.3.1 The UPDATE Clause 252
8.3.2 The WHERE Clause 253
8.3.3 The SET Clause 254
8.3.4 Updating with a Second Table 255
8.3.5 Using the CASE Expression in UPDATEs 257
8.4 A Note on Flaws in a Common Vendor Extension 260
8.5 MERGE Statement 261
9 Comparison or Theta Operators 264
9.1 Converting Data Types 265
9.2 Row Comparisons in SQL 267
10 Valued Predicates 270
10.1 IS NULL Predicate 270
10.1.1 Sources of NULLs 271
10.2 IS [NOT]{TRUE | FALSE | UNKNOWN} Predicate 271
10.3 IS [NOT] NORMALIZED Predicate 273
11 CASE Expressions 276
11.1 The CASE Expression 276
11.1.1 The COALESCE() and NULLIF() Functions 280
11.1.2 CASE Expressions with GROUP BY 281
11.1.3 CASE, CHECK() Clauses and Logical Implication 282
11.1.4 Subquery Expressions and Constants 286
11.2 Rozenshtein Characteristic Functions 287
12 LIKE Predicate 290
12.1 Tricks with Patterns 291
12.2 Results with NULL Values and Empty Strings 293
12.3 LIKE Is Not Equality 293
12.4 Avoiding the LIKE Predicate with a Join 293
12.5 CASE Expressions and LIKE Predicates 295
12.6 SIMILAR TO Predicates 296
12.7 Tricks with Strings 298
12.7.1 String Character Content 298
12.7.2 Searching versus Declaring a String 299
12.7.3 Creating an Index on a String 299
13 BETWEEN and OVERLAPS Predicates 302
13.1 The BETWEEN Predicate 302
13.1.1 Results with NULL Values 303
13.1.2 Results with Empty Sets 303
13.1.3 Programming Tips 303
13.2 OVERLAPS Predicate 304
13.2.1 Time Periods and OVERLAPS Predicate 304
14 The [NOT] IN() Predicate 316
14.1 Optimizing the IN() Predicate 317
14.2 Replacing ORs with the IN() Predicate 321
14.3 NULLs and the IN() Predicate 322
14.4 IN() Predicate and Referential Constraints 324
14.5 IN() Predicate and Scalar Queries 326
15 EXISTS() Predicate 328
15.1 EXISTS and NULLs 329
15.2 EXISTS and INNER JOINs 331
15.3 NOT EXISTS and OUTER JOINs 332
15.4 EXISTS() and Quantifiers 333
15.5 EXISTS() and Referential Constraints 334
15.6 EXISTS and Three-Valued Logic 335
16 Quantified Subquery Predicates 338
16.1 Scalar Subquery Comparisons 339
16.2 Quantifiers and Missing Data 340
16.3 The ALL Predicate and Extrema Functions 342
16.4 The UNIQUE Predicate 343
17 The SELECT Statement 346
17.1 SELECT and JOINs 346
17.1.1 One-Level SELECT Statement 346
17.1.2 Correlated Subqueries in a SELECT Statement 353
17.1.3 SELECT Statement Syntax 355
17.1.4 The ORDER BY Clause 357
17.2 OUTER JOINs 365
17.2.1 Syntax for OUTER JOINs 366
17.2.2 NULLs and OUTER JOINs 371
17.2.3 NATURAL versus Searched OUTER JOINs 373
17.2.4 Self OUTER JOINs 374
17.2.5 Two or More OUTER JOINs 375
17.2.6 OUTER JOINs and Aggregate Functions 377
17.2.7 FULL OUTER JOIN 378
17.2.8 WHERE Clause OUTER JOIN Operators 379
17.3 Old versus New JOIN Syntax 380
17.4 Scope of Derived Table Names 382
17.5 JOINs by Function Calls 383
17.6 The UNION JOIN 385
17.7 Packing Joins 387
17.8 Dr. Codd's T-Join 388
17.8.1 The Croatian Solution 392
17.8.2 The Swedish Solution 393
17.8.3 The Colombian Solution 393
18 VIEWs, Derived Tables, Materialized Tables, and Temporary Tables 398
18.1 VIEWs in Queries 399
18.2 Updatable and Read-Only VIEWs 400
18.3 Types of VIEWs 402
18.3.1 Single-Table Projection and Restriction 402
18.3.2 Calculated Columns 402
18.3.3 Translated Columns 402
18.3.4 Grouped VIEWs 403
18.3.5 UNIONed VIEWs 404
18.3.6 JOINs in VIEWs 406
18.3.7 Nested VIEWs 406
18.4 How VIEWs Are Handled in the Database System 408
18.4.1 View Column List 408
18.4.2 VIEW Materialization 408
18.4.3 In-Line Text Expansion 409
18.4.4 Pointer Structures 411
18.4.5 Indexing and Views 412
18.5 WITH CHECK OPTION Clause 412
18.5.1 WITH CHECK OPTION as CHECK() Clause 417
18.6 Dropping VIEWs 418
18.7 TEMPORARY TABLE Declarations 419
18.8 Hints on Using VIEWs and TEMPORARY TABLEs 420
18.8.1 Using VIEWs 421
18.8.2 Using TEMPORARY TABLEs 421
18.8.3 Flattening a Table with a VIEW 422
18.9 Using Derived Tables 424
18.9.1 Derived Tables in the FROM clause 424
18.9.2 Derived Tables with a VALUES Constructor 426
18.10 Derived Tables in the WITH Clause 426
19 Partitioning Data in Queries 430
19.1 Coverings and Partitions 430
19.1.1 Partitioning by Ranges 431
19.1.2 Partition by Functions 432
19.1.3 Partition by Sequences 433
19.2 Relational Division 435
19.2.1 Division with a Remainder 437
19.2.2 Exact Division 438
19.2.3 Note on Performance 439
19.2.4 Todd's Division 439
19.2.5 Division with JOINs 442
19.2.6 Division with Set Operators 442
19.3 Romley's Division 443
19.4 Boolean Expressions in an RDBMS 447
19.5 FIFO and LIFO Subsets 449
20 Grouping Operations 454
20.1 GROUP BY Clause 454
20.1.1 NULLs and Groups 456
20.2 GROUP BY and HAVING 456
20.2.1 Group Characteristics and the HAVING Clause 458
20.3 Multiple Aggregation Levels 460
20.3.1 Grouped VIEWs for Multiple Aggregation Levels 461
20.3.2 Subquery Expressions for Multiple Aggregation Levels 462
20.3.3 CASE Expressions for Multiple Aggregation Levels 463
20.4 Grouping on Computed Columns 464
20.5 Grouping into Pairs 465
20.6 Sorting and GROUP BY 466
21 Aggregate Functions 468
21.1 COUNT() Functions 469
21.2 SUM() Functions 472
21.3 AVG() Functions 473
21.3.1 Averages with Empty Groups 475
21.3.2 Averages across Columns 477
21.4 Extrema Functions 478
21.4.1 Simple Extrema Functions 478
21.4.2 Generalized Extrema Functions 480
21.4.3 Multiple Criteria Extrema Functions 489
21.4.4 GREATEST() and LEAST() Functions 491
21.5 The LIST() Aggregate Function 494
21.5.1 The LIST() Function with a Procedure 495
21.5.2 The LIST() Function by Crosstabs 496
21.6 The PRD() Aggregate Function 497
21.6.1 PRD() Function by Expressions 498
21.6.2 The PRD() Aggregate Function by Logarithms 499
21.7 Bitwise Aggregate Functions 502
21.7.1 Bitwise OR Aggregate Function 503
21.7.2 Bitwise AND Aggregate Function 504
22 Auxiliary Tables 506
22.1 The Sequence Table 506
22.1.1 Enumerating a List 508
22.1.2 Mapping a Sequence into a Cycle 510
22.1.3 Replacing an Iterative Loop 512
22.2 Lookup Auxiliary Tables 514
22.2.1 Simple Translation Auxiliary Tables 516
22.2.2 Multiple Translation Auxiliary Tables 516
22.2.3 Multiple Parameter Auxiliary Tables 517
22.2.4 Range Auxiliary Tables 518
22.2.5 Hierarchical Auxiliary Tables 519
22.2.6 One True Lookup Table 520
22.3 Auxiliary Function Tables 522
22.3.1 Inverse Functions with Auxiliary Tables 524
22.3.2 Interpolation with Auxiliary Function Tables 533
22.4 Global Constants Tables 535
23 Statistics in SQL 538
23.1 The Mode 539
23.2 The AVG() Function 541
23.3 The Median 541
23.3.1 Date's First Median 542
23.3.2 Celko's First Median 543
23.3.3 Date's Second Median 545
23.3.4 Murchison's Median 545
23.3.5 Celko's Second Median 546
23.3.6 Vaughan's Median with VIEWs 548
23.3.7 Median with Characteristic Function 549
23.3.8 Celko's Third Median 551
23.3.9 Ken Henderson's Median 555
23.4 Variance and Standard Deviation 556
23.5 Average Deviation 557
23.6 Cumulative Statistics 557
23.6.1 Running Totals 558
23.6.2 Running Differences 559
23.6.3 Cumulative Percentages 560
23.6.4 Rankings and Related Statistics 562
23.6.5 Quintiles and Related Statistics 566
23.7 Cross Tabulations 567
23.7.1 Crosstabs by Cross Join 571
23.7.2 Crosstabs by Outer Joins 572
23.7.3 Crosstabs by Subquery 573
23.7.4 Crosstabs by CASE Expression 574
23.8 Harmonic Mean and Geometric Mean 574
23.9 Multivariable Descriptive Statistics in SQL 575
23.9.1 Covariance 575
23.9.2 Pearson's r 576
23.9.3 NULLs in Multivariable Descriptive Statistics 577
24 Regions, Runs, Gaps, Sequences, and Series 578
24.1 Finding Subregions of Size (n) 579
24.2 Numbering Regions 580
24.3 Finding Regions of Maximum Size 581
24.4 Bound Queries 586
24.5 Run and Sequence Queries 586
24.5.1 Filling in Sequence Numbers 589
24.6 Summation of a Series 591
24.7 Swapping and Sliding Values in a List 594
24.8 Condensing a List of Numbers 596
24.9 Folding a List of Numbers 596
24.10 Coverings 597
25 Arrays in SQL 604
25.1 Arrays via Named Columns 605
25.2 Arrays via Subscript Columns 609
25.3 Matrix Operations in SQL 610
25.3.1 Matrix Equality 611
25.3.2 Matrix Addition 611
25.3.3 Matrix Multiplication 612
25.3.4 Other Matrix Operations 614
25.4 Flattening a Table into an Array 614
25.5 Comparing Arrays in Table Format 616
26 Set Operations 620
26.1 UNION and UNION ALL 621
26.1.1 Order of Execution 623
26.1.2 Mixed UNION and UNION ALL Operators 624
26.1.3 UNION of Columns from the Same Table 624
26.2 INTERSECT and EXCEPT 625
26.2.1 INTERSECT and EXCEPT without NULLs and Duplicates 628
26.2.2 INTERSECT and EXCEPT with NULLs and Duplicates 629
26.3 A Note on ALL and SELECT DISTINCT 630
26.4 Equality and Proper Subsets 631
27 Subsets 634
27.1 Every nth Item in a Table 634
27.2 Picking Random Rows from a Table 636
27.3 The CONTAINS Operators 641
27.3.1 Proper Subset Operators 641
27.3.2 Table Equality 642
27.4 Picking a Representative Subset 647
28 Trees and Hierarchies in SQL 652
28.1 Adjacency List Model 653
28.1.1 Complex Constraints 654
28.1.2 Procedural Traversal for Queries 656
28.1.3 Altering the Table 657
28.2 The Path Enumeration Model 657
28.2.1 Finding Subtrees and Nodes 658
28.2.2 Finding Levels and Subordinates 659
28.2.3 Deleting Nodes and Subtrees 659
28.2.4 Integrity Constraints 660
28.3 Nested Set Model of Hierarchies 660
28.3.1 The Counting Property 662
28.3.2 The Containment Property 663
28.3.3 Subordinates 664
28.3.4 Hierarchical Aggregations 665
28.3.5 Deleting Nodes and Subtrees 665
28.3.6 Converting Adjacency List to Nested Set Model 666
28.4 Other Models for Trees and Hierarchies 668
29 Temporal Queries 670
29.1 Temporal Math 671
29.2 Personal Calendars 672
29.3 Time Series 674
29.3.1 Gaps in a Time Series 674
29.3.2 Continuous Time Periods 677
29.3.3 Missing Times in Contiguous Events 681
29.3.4 Locating Dates 685
29.3.5 Temporal Starting and Ending Points 687
29.3.6 Average Wait Times 689
29.4 Julian Dates 690
29.5 Date and Time Extraction Functions 694
29.6 Other Temporal Functions 695
29.7 Weeks 696
29.7.1 Sorting by Weekday Names 698
29.8 Modeling Time in Tables 699
29.8.1 Using Duration Pairs 701
29.9 Calendar Auxiliary Table 702
29.10 Problems with the Year 2000 704
29.10.1 The Zeros 704
29.10.2 Leap Year 705
29.10.3 The Millennium 706
29.10.4 Weird Dates in Legacy Data 708
29.10.5 The Aftermath 709
30 Graphs in SQL 710
30.1 Basic Graph Characteristics 711
30.1.1 All Nodes in the Graph 711
30.1.2 Path Endpoints 712
30.1.3 Reachable Nodes 712
30.1.4 Edges 713
30.1.5 Indegree and Outdegree 713
30.1.6 Source, Sink, Isolated, and Internal Nodes 714
30.2 Paths in a Graph 715
30.2.1 Length of Paths 716
30.2.2 Shortest Path 716
30.2.3 Paths by Iteration 717
30.2.4 Listing the Paths 720
30.3 Acyclic Graphs as Nested Sets 724
30.4 Paths with CTE 726
30.4.1 Nonacyclic Graphs 732
30.5 Adjacency Matrix Model 734
30.6 Points inside Polygons 735
31 OLAP in SQL 738
31.1 Star Schema 739
31.2 OLAP Functionality 740
31.2.1 RANK and DENSE_RANK 740
31.2.2 Row Numbering 740
31.2.3 GROUPING Operators 741
31.2.4 The Window Clause 743
31.2.5 OLAP Examples of SQL 745
31.2.6 Enterprise-Wide Dimensional Layer 746
31.3 A Bit of History 747
32 Transactions and Concurrency Control 748
32.1 Sessions 748
32.2 Transactions and ACID 749
32.2.1 Atomicity 749
32.2.2 Consistency 750
32.2.3 Isolation 750
32.2.4 Durability 751
32.3 Concurrency Control 751
32.3.1 The Five Phenomena 751
32.3.2 The Isolation Levels 753
32.3.3 CURSOR STABILITY Isolation Level 755
32.4 Pessimistic Concurrency Control 755
32.5 SNAPSHOT Isolation: Optimistic Concurrency 756
32.6 Logical Concurrency Control 758
32.7 Deadlock and Livelocks 759
33 Optimizing SQL 760
33.1 Access Methods 761
33.1.1 Sequential Access 761
33.1.2 Indexed Access 761
33.1.3 Hashed Indexes 762
33.1.4 Bit Vector Indexes 762
33.2 Expressions and Unnested Queries 762
33.2.1 Use Simple Expressions 763
33.2.2 String Expressions 767
33.3 Give Extra Join Information in Queries 767
33.4 Index Tables Carefully 769
33.5 Watch the IN Predicate 771
33.6 Avoid UNIONs 773
33.7 Prefer Joins over Nested Queries 774
33.8 Avoid Expressions on Indexed Columns 775
33.9 Avoid Sorting 775
33.10 Avoid CROSS JOINs 779
33.11 Learn to Use Indexes Carefully 780
33.12 Order Indexes Carefully 781
33.13 Know Your Optimizer 783
33.14 Recompile Static SQL after Schema Changes 785
33.15 Temporary Tables Are Sometimes Handy 786
33.16 Update Statistics 789
back matter 790
Appendix: References 790
General References 790
Logic 790
Mathematical Techniques 790
Random Numbers 791
Scales and Measurements 792
Missing Values 792
Regular Expressions 793
Graph Theory 794
Introductory SQL Books 794
Optimizing Queries 795
Temporal Data and the Year 2000 Problem 795
SQL Programming Techniques 797
Classics 797
Forum 798
Updatable Views 798
Theory, Normalization, and Advanced Database Topics 799
Books on SQL-92 and SQL-99 800
Standards and Related Groups 800
Web Sites Related to SQL 801
Statistics 801
Temporal Databases 802
New Citations 803
index 806
About the Author 839

CHAPTER 1
Database Design

THIS CHAPTER DISCUSSES THE DDL (Data Definition Language), which is used to create a database schema. It is related to the next chapter on the theory of database normalization. Most bad queries start with a bad schema. To get data out of the bad schema, you have to write convoluted code, and you are never sure if it did what it was meant to do.

One of the major advantages of databases, relational and otherwise, was that the data could be shared among programs so that an enterprise could use one trusted source for information. Once the data was separated from the programs, we could build tools to maintain, back up, and validate the data in one place, without worrying about hundreds or even thousands of application programs possibly working against each other.

SQL has spawned a whole branch of data modeling tools devoted to designing its schemas and tables. Most of these tools use a graphic or text description of the rules and the constraints on the data to produce a schema declaration statement that can be used directly in a particular SQL product. It is often assumed that a CASE tool will automatically prevent you from creating a bad design. This is simply not true.

Bad schema design leads to weird queries that are trying to work around the flaws. These flaws can include picking the wrong data types, denormalization, and missing or incorrect constraints. As Elbert Hubbard (American author, 1856–1915) put it: “Genius may have its limitations, but stupidity is not thus handicapped.”

1.1 Schema and Table Creation

The major problem in learning SQL is that programmers are used to thinking in terms of files rather than tables.

Programming languages are usually based on some underlying model; if you understand the model, the language makes much more sense. For example, FORTRAN is based on algebra. This does not mean that FORTRAN is exactly like algebra. But if you know algebra, FORTRAN does not look all that strange to you. You can write an expression in an assignment statement or make a good guess as to the names of library functions you have never seen before.

Programmers are used to working with files in almost every other programming language. The design of files was derived from paper forms; they are very physical and very dependent on the host programming language. A COBOL file could not easily be read by a FORTRAN program, and vice versa. In fact, it was hard to share files even among programs written in the same programming language!

The most primitive form of a file is a sequence of records, ordered within the file and referenced by physical position. You open a file, then read a first record, followed by a series of next records until you come to the last record to raise the end-of-file condition. You navigate among these records and perform actions one record at a time. The actions you take on one file have no effect on other files that are not in the same program. Only programs can change files.

The model for SQL is data kept in sets, not in physical files. The “unit of work” in SQL is the whole schema, not individual tables.

Sets are those mathematical abstractions you studied in school. Sets are not ordered, and the members of a set are all of the same type. When you perform an operation on a set, the action happens “all at once” to the entire membership of the set. That is, if I ask for the subset of odd numbers from the set of positive integers, I get all of them back as a single set. I do not build the set of odd numbers by sequentially inspecting one element at a time. I define odd numbers with a rule “If the remainder is 1 when you divide the number by 2, it is odd” that could test any integer and classify it. Parallel processing is one of many, many advantages of having a set-oriented model.

SQL is not a perfect set language any more than FORTRAN is a perfect algebraic language, as we will see. But if you are in doubt about something in SQL, ask yourself how you would specify it in terms of sets, and you will probably get the right answer.

1.1.1 CREATE SCHEMA Statement

A CREATE SCHEMA statement, defined in the SQL Standard, brings an entire schema into existence all at once. In practice, each product has very different utility programs to allocate physical storage and define a schema. Much of the proprietary syntax is concerned with physical storage allocations.

A schema must have a name and a default character set, usually ASCII or a simple Latin alphabet as defined in the ISO Standards. There is an optional AUTHORIZATION clause that holds a <schema authorization identifier> for security. After that the schema is a list of schema elements:

A schema is the skeleton of an SQL database; it defines the structures of the schema objects and the rules under which they operate. The data is the meat on that skeleton.

The only data structure in SQL is the table. Tables can be persistent (base tables), used for working storage (temporary tables), or virtual (VIEWs, common table expressions, and derived tables). The differences among these types are in implementation, not performance. One advantage of having only one data structure is that the results of all operations are also tables, you never have to convert structures, write special operators, or deal with any irregularity in the language.

The <grant statement> has to do with limiting user access to certain schema elements. The <assertion definition> is not widely implemented yet, but it works as a constraint that applies to the schema as awhole. Finally, the <character set definition>, <collation definition>, and <translation definition> deal with the display of data. We are not really concerned with any of these schema objects; they are usually set in place by the DBA (database administrator) for the users, and we mere programmers do not get to change them.

Conceptually, a table is a set of zero or more rows, and a row is a set of one or more columns. Each column has a specific data type and constraints that make up an implementation of an abstract domain. The way a table is physically implemented does not matter, because you only access it with SQL. The database engine handles all the details for you and you never worry about the internals, as you would with a physical file.

In fact, almost no two SQL products use the same internal structures. SQL Server uses physically contiguous storage accessed by two kinds of indexes; Teradata uses hashing; Nucleus (SAND Technology) uses compressed bit vector; Informix and CA-Ingres use more than a dozen different kinds of indexes.

There are two common conceptual errors made by programmers who are accustomed to file systems or PCs. The first is thinking that a table is a file; the second is thinking that a table is a spreadsheet. Tables do not behave like either, and you will get surprises if you do not understand the basic concepts.

It is easy to imagine that a table is a file, a row is a record, and a column is a field. This concept is familiar, and when data moves from SQL to the host language, it must be converted into host language data types and data structures to be displayed and used.

The big differences between working with a file system and working with SQL are in the way SQL fits into a host program. If you are using a file system, your programs must open and close files individually. In SQL, the whole schema is connected to or disconnected from the program as a single unit. The host program might not be authorized to see or manipulate all of the tables and other schema objects, but that is established as part of the connection.

The program defines fields within a file, whereas SQL defines its columns in the schema. FORTRAN uses the FORMAT and READ statements to get data from a file. Likewise, a COBOL program uses a Data Division to define the fields and a READ to fetch it. And so it goes for every 3GL’s programming; the concept is the same, though the syntax and options vary.

A file system lets you reference the same data by a different name in each program. If a file’s layout changes, you must rewrite all the programs that use that file. When a file is empty, it looks exactly like all other empty files. When you try to read an empty file, the EOF (end of file) flag pops up and the program takes some action. Column names and data types in a table are defined within the database schema. Within reasonable limits, the tables can be changed without the knowledge of the host program.

The host program only worries about transferring the values to its own variables from the database. Remember the empty set from your high school math class? It is still a valid set. When a table is empty, it still has columns, but has zero rows. There is no EOF flag to signal an exception, because there is no final record.

Another major difference is that tables and columns can have constraints attached to them. A constraint is a rule that defines what must be true about the database after each transaction. In this sense, a database is more like a collection of objects than a traditional passive file system.

A table is not a spreadsheet, even though they look very similar when you view them on a screen or in a printout. In a spreadsheet you can access a row, a column, a cell, or a collection of cells by navigating with a cursor. A table has no concept of navigation. Cells in a spreadsheet can store instructions, not just data. There is no real difference between a row and column in a spreadsheet; you could flip them around completely and...

Erscheint lt. Verlag	26.7.2010
Sprache	englisch
Themenwelt	Sachbuch/Ratgeber
	Mathematik / Informatik ► Informatik ► Datenbanken
	Mathematik / Informatik ► Informatik ► Programmiersprachen / -werkzeuge
	Informatik ► Software Entwicklung ► User Interfaces (HCI)
ISBN-10	0-08-046004-6 / 0080460046
ISBN-13	978-0-08-046004-8 / 9780080460048

Informationen gemäß Produktsicherheitsverordnung (GPSR)
Haben Sie eine Frage zum Produkt?

EPUB (Adobe DRM)

Kopierschutz: Adobe-DRM
Adobe-DRM ist ein Kopierschutz, der das eBook vor Mißbrauch schützen soll. Dabei wird das eBook bereits beim Download auf Ihre persönliche Adobe-ID autorisiert. Lesen können Sie das eBook dann nur auf den Geräten, welche ebenfalls auf Ihre Adobe-ID registriert sind.
Details zum Adobe-DRM

Dateiformat: EPUB (Electronic Publication)
EPUB ist ein offener Standard für eBooks und eignet sich besonders zur Darstellung von Belletristik und Sachbüchern. Der Fließtext wird dynamisch an die Display- und Schriftgröße angepasst. Auch für mobile Lesegeräte ist EPUB daher gut geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen eine Adobe-ID und die Software Adobe Digital Editions (kostenlos). Von der Benutzung der OverDrive Media Console raten wir Ihnen ab. Erfahrungsgemäß treten hier gehäuft Probleme mit dem Adobe DRM auf.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen eine Adobe-ID sowie eine kostenlose App.
Geräteliste und zusätzliche Hinweise

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.

Print-Ausgabe

Buch | Softcover

51,10 €