Data Management and Query Processing in Semantic Web Databases - Sven Groppe

Data Management and Query Processing in Semantic Web Databases (eBook)

Sven Groppe (Author)

eBook Download: PDF

2011
IX, 270 pages
Springer Berlin (Publisher)
978-3-642-19357-6 (ISBN)

The Semantic Web, which is intended to establish a machine-understandable Web, is currently changing from being an emerging trend to a technology used in complex real-world applications. A number of standards and techniques have been developed by the World Wide Web Consortium (W3C), e.g., the Resource Description Framework (RDF), which provides a general method for the conceptual description of Web resources, and SPARQL, an RDF query language. Recent examples of large RDF datasets with billions of facts include the UniProt comprehensive catalog of protein sequence, function, and annotation data, the RDF data extracted from Wikipedia, and Princeton University's WordNet. Clearly, query performance has become a key issue for Semantic Web applications.

In his book, Groppe details various aspects of high-performance Semantic Web data management and query processing. His presentation fills the gap between Semantic Web and database books, which either fail to take into account the performance issues of large-scale data management or fail to exploit the special properties of Semantic Web data models and queries. After a general introduction to the relevant Semantic Web standards, he presents specialized indexing and sorting algorithms, adapted approaches for logical and physical query optimization, optimization possibilities when using the parallel database technologies of today's multicore processors, and visual and embedded query languages.

Groppe primarily targets researchers, students, and developers of large-scale Semantic Web applications. On the companion book webpage, readers will find additional material, such as an online demonstration of a query engine and exercises with solutions that test their comprehension of the topics presented.

Sven Groppe is a research assistant in the Institute of Information Systems at the University of Lübeck, Germany. He holds a Ph.D. from the University of Paderborn on query reformulation for the XML standards XPath, XSLT and XQuery. He was a member of the DAWG W3C Working Group that developed the W3C standard query language SPARQL. His research focus is on large-scale Semantic Web data management and its application in e-business scenarios.

Contents
Chapter 1: Introduction
1.1 Main Target Group of the Book
1.2 Prerequisites Needed to Understand the Book
1.3 Content
1.4 Logical Organization of the Book
1.5 Structure of the Chapters and Book Webpage
Chapter 2: Semantic Web
2.1 Introduction
2.2 Overview
2.3 RDF Data
2.3.1 N3 Notation
2.3.2 RDF/XML
2.4 Ontology Languages
2.5 Open World Assumption
2.6 No Unique Name Assumption
2.7 SPARQL Query Language
2.7.1 Language Constructs of SPARQL
2.7.1.1 Types of SPARQL Queries
2.7.1.2 Default Graph and Named Graphs
2.7.1.3 Other Modifiers
2.7.1.4 Variables and Blank Nodes
2.7.1.5 Triple Patterns
2.7.1.6 Filter
2.7.1.7 Built-In Functions
2.7.1.8 Optional
2.7.1.9 Union
2.7.2 SPARQL Protocol for RDF
2.7.3 SPARQL Query Results XML Format
2.7.4 RDF Stores
2.8 Rules
2.9 Related Work
2.9.1 RIF Processing
2.9.2 Optimizations for Recursive Rules
2.10 Summary and Conclusions
Chapter 3: External Sorting and B+-Trees
3.1 Motivation
3.2 B+-Trees
3.2.1 Properties of B+-Trees
3.2.2 Self-Balancing Property of B+-Trees
3.2.3 Searching
3.2.4 Prefix Search in Combination with Sideways Information Passing
3.2.5 Inserting
3.2.6 Deleting
3.2.7 B+-Tree Construction from a Large Dataset
3.3 Heap
3.4 (External) Merge Sort
3.5 Replacement Selection
3.6 External Chunks Merge Sort
3.7 Distribution Sort
3.8 RDF Distribution Sort
3.9 Experimental Analysis
3.9.1 SP2B Dataset
3.9.2 Yago Dataset
3.10 Summary and Conclusions
Chapter 4: Query Processing Overview
4.1 The LUPOSDATE System
4.2 Phases of Query Processing
4.3 CoreSPARQL
4.3.1 Defining CoreSPARQL
4.3.2 Transforming SPARQL Queries into CoreSPARQL Queries
4.3.3 CoreSPARQL Grammar
4.4 Related Work
4.5 Summary and Conclusions
Chapter 5: Logical Optimization
5.1 Logical Algebra
5.1.1 Semantics of the Logical Algebra Operators
5.2 Logical Optimization Rules
5.2.1 Pushing FILTER Operators
5.2.2 Splitting and Commutativity of FILTER Operators
5.2.3 Constant and Variable Propagation
5.2.4 Heuristic Query Optimization Using Equivalency Rules
5.2.5 Cost-Based Optimization
5.2.5.1 Heuristic Approaches to Join Order Optimization
5.2.5.2 Enumeration of Plans
Branch and Bound Plan Enumeration
Hill Climbing
Dynamic Programming
Selinger-Style Optimization
Example of a Query Optimizer Used in Practice
5.2.6 Histograms
5.3 Further Related Work
5.4 Summary and Conclusions
Chapter 6: Physical Optimization
6.1 Motivation
6.2 Related Work
6.3 Indexing
6.3.1 Building In-Memory Indices
6.3.2 Building Disk-Based Indices
6.3.2.1 Dictionary Indices
6.3.2.2 Evaluation Indices
6.3.2.3 Histogram Indices
6.4 Pipelining Versus Materialization
6.4.1 Pipeline-Breaker
6.4.2 Sideways Information Passing
6.5 Join Algorithms
6.5.1 Nested-Loop Join
6.5.1.1 Iterator Version
6.5.1.2 Block-Based Nested-Loop Join
6.5.2 Merge Join
6.5.2.1 Merge Join and Sideways Information Passing
6.5.3 Index Join
6.5.4 Hash Join
6.5.4.1 Hash Join and Sideways Information Passing
6.6 Dynamically Restricting Triple Patterns
6.7 Sorting Numbering Scheme
6.7.1 Joins Without Presorting Numbers
6.7.2 Joins with Presorting Numbers
6.7.3 Optimization of Fast Sorting
6.7.4 Sorting for Complex Joins
6.7.5 Additional Benefits from SIP Strategies
6.8 Optional
6.8.1 MergeOptional
6.9 Duplicate Elimination
6.9.1 Duplicate Elimination Using Hashing
6.9.2 Duplicate Elimination Using Sorting
6.9.3 Duplicate Elimination Using Presorting Numbers
6.10 Cost Model
6.11 Performance Evaluation
6.11.1 Performance Evaluation for In-Memory Databases
6.11.1.1 Index Construction Time
6.11.1.2 Query Evaluation
6.11.2 Performance Evaluation for Large-Scale Datasets
6.11.2.1 UniProt
6.11.2.2 Billion Triples Challenge
6.11.2.3 Performance Gains
6.12 Summary and Conclusions
Chapter 7: Streams
7.1 Introduction
7.2 eBay
7.3 Monitoring eBay Auctions
7.3.1 Monitoring System
7.3.2 Demonstration
7.3.3 Streaming SPARQL Engine
7.4 Special Operators for Stream Processing
7.4.1 Types of Stream Operators
7.4.2 Types of Window Operators
7.5 Related Work
7.5.1 Data Streams in General
7.5.2 Semantic Web Data Streams
7.6 Summary and Conclusions
Chapter 8: Parallel Databases
8.1 Motivation
8.2 Types of Parallelisms
8.3 Amdahl's Law
8.4 Parallel Monitors and Bounded Buffers
8.5 Parallel Join Using a Distribution Thread
8.6 Parallel Merge Join Using Partitioned Input
8.7 Parallel Computation of Operands
8.8 Performance Evaluation
8.9 Performance Gains and Loss
8.10 Summary and Conclusions
Chapter 9: Inference
9.1 Introduction
9.2 RDF Schema Inference Rules
9.3 Materialization of Inference and Consequences for Query Optimization
9.4 Logical Optimization for Inference
9.5 Performance Analysis
9.6 Related Work
9.7 Summary and Conclusions
Chapter 10: Visual Query Languages
10.1 Motivation
10.2 Related Work
10.3 RDF Visual Editor
10.4 SPARQL Visual Editor
10.5 Browser-Like Query Creation
10.6 Generating Condensed Data View
10.7 Refining Queries
10.8 Query Formulation Demo
10.9 Computation of Suggested Triple Patterns for Query Refinement
10.10 Summary and Conclusions
Chapter 11: Embedded Languages
11.1 Motivation
11.2 Related Work
11.3 Embedding Semantic Web Languages Into JAVA
11.3.1 The Type System
11.3.2 Subtype Test
11.3.3 Satisfiability Test of Embedded SPARQL and SPARUL Queries
11.3.4 Determination of the Query Result Types
11.4 Summary and Conclusions
Chapter 12: Comparison of the XML and Semantic Web Worlds
12.1 Introduction
12.2 Concepts and Visions
12.3 Data Models
12.4 Schema and Ontology Languages
12.5 Query Languages
12.6 Embedding SPARQL into XQuery/XSLT
12.6.1 Embedded SPARQL
12.6.2 Translation Process
12.6.2.1 Integration of RDF Data into XML
12.6.2.2 Physical Operators Formulated in XQuery/XSLT
XQuery
XSLT
12.6.3 Experimental Analysis
12.7 Embedding XPath Into SPARQL
12.7.1 Translation of XPath Subqueries Into SPARQL Queries
12.7.1.1 Translation of Data
12.7.1.2 Translation of Queries
12.7.1.3 Translation of Result
12.7.2 Performance Analysis
12.8 Related Work
12.9 Summary and Conclusions
Chapter 13: Summary, Conclusions, and Future Work
13.1 Possibilities for Future Work
References
Index

Publication date (per publisher)	29 April 2011
Additional info	IX, 270 p.
Place of publication	Berlin
Language	English
Subject area	Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics
Keywords	Database Management • query optimization • query processing • RDF • semantic web • SPARQL • stream data processing
ISBN-10	3-642-19357-9 / 3642193579
ISBN-13	978-3-642-19357-6 / 9783642193576

PDF (watermark)
Size: 12.1 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is of only limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g., Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g., the free Adobe Digital Editions app.

Additional feature: Read online
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.

Print edition

Book | Hardcover

€53.49