Linked Data - Sherif Sakr, Marcin Wylot, Raghava Mutharaju, Danh Le Phuoc, Irini Fundulaki

Linked Data (eBook)

Storing, Querying, and Reasoning

Sherif Sakr, Marcin Wylot, Raghava Mutharaju, Danh Le Phuoc, Irini Fundulaki (Authors)

eBook Download: PDF

2018 | 1st edition
XX, 236 pages
Springer-Verlag
978-3-319-73515-3 (ISBN)

This book describes efficient and effective techniques for harnessing the power of Linked Data by tackling the various aspects of managing its growing volume: storing, querying, reasoning, provenance management and benchmarking.

To this end, Chapter 1 introduces the main concepts of the Semantic Web and Linked Data and provides a roadmap for the book. Next, Chapter 2 briefly presents the basic concepts underpinning the Linked Data technologies that are discussed in the book. Chapter 3 then offers an overview of various techniques and systems for centrally querying RDF datasets, and Chapter 4 outlines various techniques and systems for efficiently querying large RDF datasets in distributed environments. Subsequently, Chapter 5 explores how streaming requirements are addressed in current, state-of-the-art RDF stream data processing. Chapter 6 covers performance and scaling issues of distributed RDF reasoning systems, while Chapter 7 details benchmarks for RDF query engines and instance matching systems. Chapter 8 addresses provenance management for Linked Data and presents the different provenance models that have been developed. Lastly, Chapter 9 offers a brief summary, highlighting and providing insights into some of the open challenges and research directions.

Providing an updated overview of methods, technologies and systems related to Linked Data, this book is mainly intended for students and researchers who are interested in the Linked Data domain. It enables students to gain an understanding of the foundations and underpinning technologies and standards for Linked Data, while researchers benefit from the in-depth coverage of the emerging and ongoing advances in Linked Data storing, querying, reasoning, and provenance management systems. Further, it serves as a starting point to tackle the next research challenges in the domain of Linked Data management.

Sherif Sakr is a professor of computer and information science in the Health Informatics department at King Saud bin Abdulaziz University for Health Sciences, and is also affiliated with the University of New South Wales and DATA61/CSIRO in Australia. Sherif's research interests revolve around the areas of efficient and scalable Big Data Management, Processing and Analytics. In 2013, he was awarded the Stanford Innovation and Entrepreneurship Certificate.

Marcin Wylot is a postdoctoral researcher at TU Berlin, Germany, in the Open Distributed Systems group. His main research interests are in database systems for Semantic Web data, provenance in Linked Data, Internet of Things, and Big Data processing.

Raghava Mutharaju is a research scientist in the AI & Machine Learning Systems division of GE Global Research in Niskayuna, NY, USA. His research interests are in ontology modeling and reasoning, scalable SPARQL query processing, Big Data, Semantic Web and its applications.

Danh Le Phuoc is a Marie Skłodowska-Curie Fellow at TU Berlin. He is working on Pervasive Analytics, which includes Linked Data/Semantic Web, Pervasive Computing, Future Internet and Big Data for Internet of Everything.

Irini Fundulaki is a Principal Researcher at the Institute of Computer Science of the Foundation for Research and Technology-Hellas. Her research interests are related to Web Data Management and more specifically the development of benchmarks for RDF engines, instance matching and link discovery systems, and the management of provenance for Linked Data.

Foreword
Preface
Organization of the Book
Target Audience
Acknowledgments
Contents
About the Authors
1 Introduction
1.1 Semantic Web
1.2 Linked Data
1.3 Book Roadmap
2 Fundamentals
2.1 Linked Data
2.2 RDF
2.3 SPARQL
2.4 OWL
2.5 Reasoning
2.6 OWL 2 Profiles
2.7 Modern Big Data Storage and Processing Systems
2.7.1 NoSQL Databases
2.7.2 MapReduce/Hadoop
2.7.3 Spark
3 Centralized RDF Query Processing
3.1 RDF Statement Table
3.2 Index Permutations for RDF Triples
3.3 Property Tables
3.4 Vertical Partitioning
3.5 Graph-Based Storage
3.6 Binary Encoding for RDF Databases
4 Distributed RDF Query Processing
4.1 NoSQL-Based RDF Systems
4.2 Hadoop-Based RDF Systems
4.3 Spark-Based RDF Systems
4.4 Main Memory-Based Distributed Systems
4.5 Other Distributed RDF Systems
4.6 Federated RDF Query Processing
5 Processing of RDF Stream Data
5.1 RDF Streaming Data in A Nutshell
5.2 Data Representation of RDF Streams
5.3 RDF Streaming Query Model
5.3.1 Stream-to-Stream Operator
5.3.2 Stream-to-Relation Operator
5.3.3 Relation-to-Relation Operator
5.4 RDF Streaming Query Languages and Syntax
5.5 System Design and Implementation
5.5.1 Design
5.5.2 Implementation Aspects
5.5.2.1 Time Management
5.5.2.2 Scheduling and Handling Memory
5.5.3 Systems
5.5.3.1 Streaming SPARQL
5.5.3.2 C-SPARQL
5.5.3.3 EP-SPARQL
5.5.3.4 SPARQLstream
5.5.3.5 CQELS
6 Distributed Reasoning of RDF Data
6.1 The Process of RDF Reasoning
6.2 Peer-to-Peer RDF Reasoning Systems
6.3 NoSQL-Based RDF Reasoning Systems
6.4 Hadoop-Based RDF Reasoning Systems
6.5 Spark-Based RDF Reasoning Systems
6.6 Shared Memory RDF Reasoning Systems
6.7 Influence on Other Semantic Web Languages
7 Benchmarking RDF Query Engines and Instance Matching Systems
7.1 Benchmark Definition and Principles
7.1.1 Overview
7.1.2 Benchmark Development Methodology
7.1.3 Choke Points
7.2 Benchmarks for RDF Query Engines
7.2.1 Real Benchmarks
7.2.1.1 UniProt
7.2.1.2 YAGO (Yet Another Great Ontology)
7.2.1.3 Barton Library
7.2.2 Synthetic RDF Benchmarks
7.2.2.1 Lehigh University Benchmark (LUBM)
7.2.2.2 SP2Bench
7.2.2.3 Berlin SPARQL Benchmark (BSBM)
7.2.2.4 Semantic Publishing Benchmark (SPB)
7.2.3 Benchmark Generators
7.2.3.1 DBPedia SPARQL Benchmark (DBSB)
7.2.3.2 Waterloo SPARQL Diversity Test Suite
7.2.3.3 FEASIBLE
7.2.4 Dataset Structuredness
7.3 Benchmarks for Instance Matching Systems
7.3.1 Datasets
7.3.2 Variations
7.3.3 Reference Alignment
7.3.4 Key Performance Indicators
7.3.5 Real Benchmarks
7.3.5.1 A-R-S 2009
7.3.5.2 Data Interlinking (DI) 2010
7.3.5.3 Data Interlinking (DI) 2011
7.3.5.4 Overall Evaluation of Real Benchmarks
7.3.6 Synthetic Benchmarks for Instance Matching Systems
7.3.6.1 IIMB 2009
7.3.6.2 IIMB 2010
7.3.6.3 Person-Restaurants (PR) 2010
7.3.6.4 IIMB 2011
7.3.6.5 Sandbox 2012
7.3.6.6 IIMB 2012
7.3.6.7 RDFT 2013
7.3.6.8 ID-REC 2014
7.3.6.9 SPIMBench 2015
7.3.6.10 ONTOlogy Matching Benchmark with Many Instances (ONTOBI)
7.3.7 Overall Evaluation of Synthetic Benchmarks
7.4 Instance Matching Benchmark Generators for Linked Data
7.4.1 SWING
7.4.2 SPIMBENCH
7.4.3 LANCE
8 Provenance Management for Linked Data
8.1 An Overview of Provenance Models
8.2 Provenance Representations
8.3 Provenance Models
8.3.1 Relational Provenance
8.3.2 RDF Provenance
8.3.3 Update Provenance
8.4 Provenance in Data Management Systems
9 Conclusions and Outlook
9.1 Conclusions
9.2 Outlook
References

Publication date (per publisher)	1 March 2018
Additional info	XX, 223 p. 70 illus., 7 illus. in color.
Place of publication	Cham
Language	English
Subject area	Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics
Keywords	database design and models • data provenance • information integration • OWL • Query Languages • RDF • semantic web description languages • Semi-structured Data • SPARQL
ISBN-10	3-319-73515-2 / 3319735152
ISBN-13	978-3-319-73515-3 / 9783319735153

PDF (Watermark)
Size: 7.6 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.

Print edition

Book | Hardcover

171,19 €