Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL - Joe Celko

Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL (eBook)

Joe Celko (Autor)

eBook Download: EPUB

2008 | 1. Auflage
384 Seiten
Elsevier Science (Verlag)
978-0-08-055752-6 (ISBN)

Perfectly intelligent programmers often struggle when forced to work with SQL. Why? Joe Celko believes the problem lies with their procedural programming mindset, which keeps them from taking full advantage of the power of declarative languages. The result is overly complex and inefficient code, not to mention lost productivity.

This book will change the way you think about the problems you solve with SQL programs.. Focusing on three key table-based techniques, Celko reveals their power through detailed examples and clear explanations. As you master these techniques, you'll find you are able to conceptualize problems as rooted in sets and solvable through declarative programming. Before long, you'll be coding more quickly, writing more efficient code, and applying the full power of SQL

. Filled with the insights of one of the world's leading SQL authorities - noted for his knowledge and his ability to teach what he knows.

. Focuses on auxiliary tables (for computing functions and other values by joins), temporal tables (for temporal queries, historical data, and audit information), and virtual tables (for improved performance).

. Presents clear guidance for selecting and correctly applying the right table technique.

Perfectly intelligent programmers often struggle when forced to work with SQL. Why? Joe Celko believes the problem lies with their procedural programming mindset, which keeps them from taking full advantage of the power of declarative languages. The result is overly complex and inefficient code, not to mention lost productivity.This book will change the way you think about the problems you solve with SQL programs.. Focusing on three key table-based techniques, Celko reveals their power through detailed examples and clear explanations. As you master these techniques, you'll find you are able to conceptualize problems as rooted in sets and solvable through declarative programming. Before long, you'll be coding more quickly, writing more efficient code, and applying the full power of SQL- Filled with the insights of one of the world's leading SQL authorities - noted for his knowledge and his ability to teach what he knows- Focuses on auxiliary tables (for computing functions and other values by joins), temporal tables (for temporal queries, historical data, and audit information), and virtual tables (for improved performance)- Presents clear guidance for selecting and correctly applying the right table technique

Front Cover 1
Joe Celko's Thinking in Sets 2
Copyright Page 5
CONTENTS 8
Preface 18
Chapter 1. SQL is Declarative, Not Procedural 22
1.1 Different Programming Models 23
1.2 Different Data Models 25
1.3 Tables as Entities 40
1.4 Tables as Relationships 41
1.5 Statements Are Not Procedures 41
1.6 Molecular, Atomic, and Subatomic Data Elements 42
Chapter 2. Hardware, Data Volume, and Maintaining Databases 50
2.1 Parallelism 51
2.2 Cheap Main Storage 52
2.3 Solid-State Disk 53
2.4 Cheaper Secondary and Tertiary Storage 53
2.5 The Data Changed 54
2.6 The Mindset Has Not Changed 54
Chapter 3. Data Access and Records 58
3.1 Sequential Access 59
3.2 Indexes 60
3.3 Hashing 62
3.4 Bit Vector Indexes 65
3.5 Parallel Access 65
3.6 Row and Column Storage 65
3.7 JOIN Algorithms 67
Chapter 4. Lookup Tables 72
4.1 Data Element Names 73
4.2 Multiparameter Lookup Tables 76
4.3 Constants Table 77
4.4 OTLT or MUCK Table Problems 80
4.5 Definition of a Proper Table 83
Chapter 5. Auxiliary Tables 86
5.1 Sequence Table 86
5.2 Permutations 93
5.3 Functions 96
5.4 Encryption via Tables 99
5.5 Random Numbers 100
5.6 Interpolation 104
Chapter 6. Views 108
6.1 Mullins VIEW Usage Rules 109
6.2 Updatable and Read-Only VIEWs 112
6.3 Types of VIEWs 114
6.4 Modeling Classes with Tables 121
6.5 How VIEWs Are Handled in the Database System 124
6.6 In-Line Text Expansion 126
6.7 WITH CHECK OPTION Clause 127
6.8 Dropping VIEWs 133
6.9 Outdated Uses for VIEWs 134
Chapter 7. Virtual Tables 138
7.1 Derived Tables 139
7.2 Common Table Expressions 145
7.3 Temporary Tables 149
7.4 The Information Schema 150
Chapter 8. Complicated Functions via Tables 158
8.1 Functions without a Simple Formula 158
8.2 Check Digits via Tables 160
8.3 Classes of Algorithms 162
8.4 Declarations, Not Functions, Not Procedures 169
8.5 Data Mining for Auxiliary Tables 173
Chapter 9. Temporal Tables 176
9.1 The Nature of Time 176
9.2 The ISO Half-Open Interval Model 180
9.3 State Transition Tables 195
9.4 Consolidating Intervals 199
9.5 Calendar Tables 203
9.6 History Tables 209
Chapter 10. Scrubbing Data with Non-1NF Tables 212
10.1 Repeated Groups 213
10.2 Designing Scrubbing Tables 219
10.3 Scrubbing Constraints 222
10.4 Calendar Scrubs 223
10.5 String Scrubbing 225
10.6 Sharing SQL Data 226
10.7 Extract, Transform, and Load Products 229
Chapter 11. Thinking in SQL 236
11.1 Warm-up Exercises 237
11.2 Heuristics 241
11.3 Do Not Use BIT or BOOLEAN Flags in SQL 249
Chapter 12. Group Characteristics 256
12.1 Grouping Is Not Equality 258
12.2 Using Groups without Looking Inside 259
12.3 Grouping over Time 263
12.4 Other Tricks with HAVING Clauses 266
12.5 Groupings, Rollups, and Cubes 268
12.6 The WINDOW Clause 271
Chapter 13. Turning Specifications into Code 276
13.1 Signs of Bad SQL 276
13.2 Methods of Attack 281
13.3 Translating Vague Specifications 286
Chapter 14. Using Procedure and Function Calls 294
14.1 Clearing out Spaces in a String 294
14.2 The PRD( ) Aggregate Function 301
14.3 Long Parameter Lists in Procedures and Functions 303
Chapter 15. Numbering Rows 308
15.1 Procedural Solutions 308
15.2 OLAP Functions 312
15.3 Sections 314
Chapter 16. Keeping Computed Data 318
16.1 Procedural Solution 318
16.2 Relational Solution 320
16.3 Other Kinds of Computed Data 320
Chapter 17. Triggers for Constraints 322
17.1 Triggers for Computations 322
17.2 Complex Constraints via CHECK() and CASE Constraints 323
17.3 Complex Constraints via VIEWs 326
17.4 Operations on VIEWs as Constraints 329
Chapter 18. Procedural and Data Driven Solutions 344
18.1 Removing Letters in a String 344
18.2 Two Approaches to Sudoku 347
18.3 Data Constraint Approach 352
18.4 Bin Packing Problems 356
18.5 Inventory Costs over Time 360
Index 370

SQL Is Declarative,
Not Procedural

IN THE PREFACE I told a short story about FORTRAN programmers who could only solve problems using loops and a LISP programmer who could only solve problems recursively This is not uncommon because we love the tools we know. Let me tell a joke instead of a story: A mathematician, a physicist, and a database programmer were all given a rubber ball and told to find the volume.

The mathematician carefully measured the diameter and either evaluated the volume of sphere formula or used a triple integral if the ball was not perfectly round.

The physicist filled a beaker with water, put the ball in the water, and measured the total displacement. He does not care about the details of the shape of the ball.

The database programmer looked up the model and serial numbers in his rubber ball manufacturer’s on-line database. He does not care about the actual ball. But he has information about the tolerances to which it was made, the expected shape and size, and a bunch of other things that apply to the entire rubber ball production process.

The moral of the story is: The mathematician knows how to compute. The physicist knows how to measure. The database guy knows how to look up data. Each person grabs his tools to solve the problem.

Now change the problem to an inventory of thousands of rubber balls. The mathematician and the physicist are stuck with a lot of manual labor. The database guy does a few downloads and he can produce rubber ball industry standards (assuming that there are such things) and detailed documentation in court with his answers.

1.1 Different Programming Models

Perfecting oneself is as much unlearning as it is learning.
—Edsgar Dijkstra

There are many models of programming. Procedural programming languages use a sequence of procedural steps guided by flow of control statements (WHILE-DO, IF-THEN-ELSE, and BEGIN-END) that change the input data to output data. This was the traditional view of programming, and it is often called the von Neumann Model after John von Neumann, the mathematician who was responsible for it. The same source code runs through the same compiler and generates the same executable module every time. The same program will work exactly the same way every time it is invoked. The keywords in this model are predictable and deterministic. It is also subject to some mathematical analysis because it is deterministic.

There are some variations on the theme. Some languages use different flow control statements. FORTRAN and COBOL allocated all the storage for the data at the start of the program. Later, the Algol family of languages did dynamic storage allocation based on the scope of the data within a block-structured language.

Edsgar Dijkstra (see his archives at www.cs.utexas.edu/users/EWD/) came up with a language that was nondeterministic. Statements, called guarded commands, have a control that either blocks or allows the statement to be executed, but there is no particular order of execution among the open statements. This model was not implemented in a commercial product, but it demonstrated that something we had thought was necessary for programming (determinism) could be dropped.

Functional programming languages are based on solving problems as a series of nested function calls. The concept of higher-order functions to change one function to another is important in these languages. The derivative and integral transforms are mathematical examples of such higher-order functions. One of the goals of such languages is to avoid a side effect in programs so they can be optimized algebraically In particular, once you have an expression that is equal to another (in some sense of equality), they can substitute for each other without affecting the result of the computation.

APL is the most successful functional programming language and had a fad period as a teaching language when Ken Iverson wrote his book A Programming Language in 1962. IBM produced special keyboards that included the obscure mathematical symbols used in APL for their desktop machines. Most of the functional languages never made it out of academia, but some survive in commercial applications today. Erlang is used for concurrent applications; R is a statistical language; Mathematica is a popular symbolic mathematics product; and Kx Systems uses the K language for large-volume financial analysis. More recently the ML and Haskell programming languages have become popular among Linux and UNIX programmers.

Here we dropped another concept that had been regarded as fundamental: There is no flow of control in these languages.

Constraint or constraint logic programming languages are a series of constraints on a problem domain. As you add more constraints, the system figures out which answers are possible and which are not. The most popular such language is PROLOG, which also had an academic fad many years ago when Borland Software (www.borland.com) made a cheap student version available. The website ON-LINE GUIDE TO CONSTRAINT PROGRAMMING by Roman Barták is a good place to start if you are interested in this topic (http://kti.ms.mff.cuni.cz/~bartak/constraints/index.html).

Here we dropped the concept of an algorithm altogether and just provided a problem specification.

Object-oriented (OO) programming is based on the ideas of objects that have both data and behavior in the same module of code. The programming model is a collection of independent cooperating objects instead of a single program invoking functions. An object is capable of receiving messages, processing data, and sending messages to other objects.

The idea is that each object can be maintained and written independently of any particular application and dropped into place where it is needed. Imagine a community of people who do particular jobs. They receive orders from their customers, process them, and return a result.

Many years ago, the INCITS H2 Database Standards Committee (née ANSI X3H2 Database Standards Committee) had a meeting in Rapid City, South Dakota. We had Mount Rushmore and Bjarne Stroustrup as special attractions. Mr. Stroustrup did his slide show with overhead transparencies (yes, this was before PowerPoint was ubiquitous!) about Bell Labs inventing C++ and OO programming, and we got to ask questions.

One of the questions was how we should put OO features into the working model of the next version of the SQL standard, which was known as SQL3 internally. His answer was that Bell Labs, with all their talent, had tried four different approaches to this problem and they came to the conclusion that it should not be done. OO was great for programming but deadly for data.

I have watched people try to force OO models into SQL, and it falls apart in about a year. Every typo becomes a new attribute or class, queries that would have been so easy in a relational model are now multitable monster outer joins, redundancy grows at an exponential rates, constraints are virtually impossible to write so you can kiss data integrity goodbye, and so forth.

With all these programming models, why should we not have different data models?

1.2 Different Data Models

Consider the humble punch card. Punch cards had been used in France to control textile looms since the early 1700s; the method was perfected by Joseph Marie Jacquard in 1801 with his Jacquard loom.

Flash forward to the year 1890, when a man named Herman Hollerith invented a punch card and tabulating machines for that year’s United States Census. His census project was so successful that Mr. Hollerith left the government and started the Tabulating Machine Company in 1896. After a series of mergers and name changes, this company became IBM. You might have heard of it.

Up to the 1970s, the “IBM card” and related machinery was everywhere. The most common card was the IBM 5081, and that part number became the common term for it—even across vendors! The punch card was data processing back then.

The physical characteristics of the card determined how we stored and processed data for decades afterwards. The card was the size of an 1887 United States dollar bill (3.25 inches by 7.375 inches). The reason for that size was simple; when Hollerith worked on the Census, he could get drawers to store the decks of cards from the Department of the Treasury across the street.

The cards had a grid of 80 columns of 12 rows, which could accommodate holes. This was for physical reasons again. But once the 80-column convention was established, it stuck. The early video terminals that replaced the key punch machines used screens with 80 columns of text and 24 or 25 rows—that is, two punch cards high and possibly a line for error messages.

Magnetic tapes started replacing punch cards in the 1970s, but they also mimicked the 80-column convention, although there was no longer any need. Many of the early ANSI tape standards for header records are based on this convention. Legacy systems simply replaced card readers with magnetic tape units for obvious reasons, but new applications continued to be built to this standard, too.

The physical nature of the cards meant that data was written and read from left to right in sequential order. Likewise, the deck of cards was written and read from front to back in sequential order.

A magnetic tape file is also written and read in the same way, but with the added bonus that when you drop a tape on the floor, it does not get scrambled like a deck of cards. The downside of a tape over a deck of cards is that it cannot be rearranged manually on purpose...

Erscheint lt. Verlag	22.1.2008
Sprache	englisch
Themenwelt	Sachbuch/Ratgeber
	Mathematik / Informatik ► Informatik ► Datenbanken
	Mathematik / Informatik ► Informatik ► Programmiersprachen / -werkzeuge
ISBN-10	0-08-055752-X / 008055752X
ISBN-13	978-0-08-055752-6 / 9780080557526

Informationen gemäß Produktsicherheitsverordnung (GPSR)
Haben Sie eine Frage zum Produkt?

EPUB (Adobe DRM)

Kopierschutz: Adobe-DRM
Adobe-DRM ist ein Kopierschutz, der das eBook vor Mißbrauch schützen soll. Dabei wird das eBook bereits beim Download auf Ihre persönliche Adobe-ID autorisiert. Lesen können Sie das eBook dann nur auf den Geräten, welche ebenfalls auf Ihre Adobe-ID registriert sind.
Details zum Adobe-DRM

Dateiformat: EPUB (Electronic Publication)
EPUB ist ein offener Standard für eBooks und eignet sich besonders zur Darstellung von Belletristik und Sachbüchern. Der Fließtext wird dynamisch an die Display- und Schriftgröße angepasst. Auch für mobile Lesegeräte ist EPUB daher gut geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen eine Adobe-ID und die Software Adobe Digital Editions (kostenlos). Von der Benutzung der OverDrive Media Console raten wir Ihnen ab. Erfahrungsgemäß treten hier gehäuft Probleme mit dem Adobe DRM auf.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen eine Adobe-ID sowie eine kostenlose App.
Geräteliste und zusätzliche Hinweise

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.

Andere Ausgabe

Buch | Softcover (2008)

31,15 €