Big-Data Analytics and Cloud Computing (eBook)
XVI, 169 pages
Springer International Publishing (Publisher)
978-3-319-25313-8 (ISBN)
This book reviews the theoretical concepts, leading-edge techniques and practical tools involved in the latest multi-disciplinary approaches addressing the challenges of big data. Illuminating perspectives from both academia and industry are presented by an international selection of experts in big data science. Topics and features: describes the innovative advances in theoretical aspects of big data, predictive analytics and cloud-based architectures; examines the applications and implementations that utilize big data in cloud architectures; surveys the state of the art in architectural approaches to the provision of cloud-based big data analytics functions; identifies potential research directions and technologies to facilitate the realization of emerging business models through big data approaches; provides relevant theoretical frameworks, empirical research findings, and numerous case studies; discusses real-world applications of algorithms and techniques to address the challenges of big datasets.
The editors are all members of the Computing and Mathematics Department at the University of Derby, UK, where Dr. Marcello Trovati serves as a Senior Lecturer in Mathematics, Dr. Richard Hill as a Professor and Head of the Computing and Mathematics Department, Dr. Ashiq Anjum as a Professor of Distributed Computing, Dr. Shao Ying Zhu as a Senior Lecturer in Computing, and Dr. Lu Liu as a Professor of Distributed Computing. The other publications of the editors include the Springer titles Guide to Security Assurance for Cloud Computing, Guide to Cloud Computing and Cloud Computing for Enterprise Architectures.
Foreword
Preface
Overview and Goals
Organisation and Features
Target Audiences
Suggested Uses
Acknowledgements
Contents
Contributors
Part I Theory
1 Data Quality Monitoring of Cloud Databases Based on Data Quality SLAs
1.1 Introduction and Summary
1.2 Background
1.2.1 Data Quality in the Context of Big Data
1.2.2 Cloud Computing
1.2.3 Data Quality Monitoring in the Cloud
1.2.4 The Challenge of Specifying a DQSLA
1.2.5 The Infrastructure Estimation Problem
1.3 Proposed Solutions
1.3.1 Data Quality SLA Formalization
1.3.2 Examples of Data Quality SLAs
1.3.3 Data Quality-Aware Service Architecture
1.4 Future Research Directions
1.5 Conclusions
References
2 Role and Importance of Semantic Search in Big Data Governance
2.1 Introduction
2.2 Big Data: Promises and Challenges
2.3 Participatory Design for Big Data
2.4 Self-Service Discovery
2.5 Conclusion
References
3 Multimedia Big Data: Content Analysis and Retrieval
3.1 Introduction
3.2 The MapReduce Framework and Multimedia Big Data
3.2.1 Indexing
3.2.2 Caveats on Indexing
3.2.3 Multiple Multimedia Processing
3.2.4 Additional Work Required?
3.3 Deep Learning and Multimedia Data
3.4 Conclusions
References
4 An Overview of Some Theoretical Topological Aspects of Big Data
4.1 Introduction
4.2 Representation of Data
4.3 Homology Theory
4.3.1 Simplicial Complexes
4.3.2 Voronoi Diagrams and Delaunay Triangulations
4.3.3 Vietoris and Čech Complexes
4.3.4 Graph-Induced Complexes
4.3.5 Chains
4.4 Network Theory for Big Data
4.4.1 Scale-Free, Small-World and Random Networks
4.5 Conclusions
References
Part II Applications
5 Integrating Twitter Traffic Information with Kalman Filter Models for Public Transportation Vehicle Arrival Time Prediction
5.1 Introduction
5.2 Communication Platform on Twitter
5.3 Communication for Data Collection on Twitter
5.4 Event Detection and Analysis: Tweets Relating to Road Incidents
5.4.1 Twitter Data: Incident Data Set
5.5 Methodology
5.5.1 Time Series and Temporal Analysis of Textual Twitter
5.6 Proposed Refined Kalman Filter (KF) Model-Based System
5.7 Conclusion
References
6 Data Science and Big Data Analytics at Career Builder
6.1 Carotene: A Job Title Classification System
6.1.1 Occupation Taxonomies
6.1.2 The Architecture of Carotene
6.1.2.1 Taxonomy Discovery Using Clustering
6.1.2.2 Coarse-Level Classification: SOC Major Classifier
6.1.2.3 Fine-Level Classification: Proximity-Based Classifier
6.1.3 Experimental Results and Discussion
6.2 CARBi: A Data Science Ecosystem
6.2.1 Accessing CB Data and Services Using WebScalding
6.2.2 ScriptDB: Managing Hadoop Jobs
References
7 Extraction of Bayesian Networks from Large Unstructured Datasets
7.1 Introduction
7.2 Text Mining
7.2.1 Text Mining Techniques
7.2.2 General Architecture and Various Components of Text Mining
7.2.3 Lexical Analysis
7.2.4 Part-of-Speech Tagging
7.2.5 Parsing
7.2.6 Named Entity Recognition
7.2.7 Named Entity Recognition
7.2.8 Concept Extraction
7.2.9 Sentiment Analysis
7.3 The Automatic Extraction of Bayesian Networks from Text
7.3.1 Dependence Relation Extraction from Text
7.3.2 Variable Identification
7.3.3 BN Structure Definition
7.3.4 Probability Information Extraction
7.3.5 Probability Information Extraction
7.3.6 General Architecture
7.4 Conclusions
References
8 Two Case Studies Based on Large Unstructured Sets
8.1 Introduction
8.2 Case Study 1: Computational Objectivity in the PHQ-9 Depression Assessment
8.2.1 Reliability and Validity Issues of the PHQ-9
8.2.2 Analytic Hierarchy Process: Defining a Weighting System
8.2.2.1 PHQ-9 Analysis via the Analytic Hierarchy Process
8.2.2.2 Advantages of AHP
8.2.3 A Text Mining Approach
8.3 Case Study 2: Evaluation of Probabilistic Information Extraction from Large Unstructured Datasets
8.3.1 Description of the Method
8.3.1.1 Description of Text and Data Patterns
8.3.2 Network Extraction Method
8.3.3 Description of Datasets
8.3.4 Evaluation
8.4 Conclusion
References
9 Information Extraction from Unstructured Data Sets: An Application to Cardiac Arrhythmia Detection
9.1 Introduction
9.2 Background
9.3 Automated Extraction of Fuzzy Partition Rules from Text
9.3.1 Text Mining Extraction Results
9.4 Data Preparation
9.4.1 Feature Selection
9.5 Fuzzy Partition Design
9.5.1 Criteria for the Evaluation of Fuzzy Partitions
9.6 Rule Base Generation
9.6.1 Knowledge Base Accuracy
9.7 Evaluation
9.8 Conclusion
References
10 A Platform for Analytics on Social Networks Derived from Organisational Calendar Data
10.1 Introduction
10.2 Literature Review/Related Work
10.2.1 Social Capital and the Exchange of Knowledge/Resources
10.2.2 Social Capital and the Exchange of Knowledge/Resources
10.2.3 Repurposing Redundant Organisational Data
10.2.4 Graph Databases
10.3 Proposed Platform
10.3.1 Capture: Capturing the Calendar Data
10.3.2 Process: Processing the Captured Data into Social Data
10.3.3 Build: Building the Social Network
10.3.4 Visualise: Visualising the Social Network Structure
10.3.5 Analyse: Performing Analysis Against the Social Network
10.3.6 Experimental Setup
10.3.7 Solution Setup
10.3.8 Hardware Setup
10.4 Results
10.4.1 Outlier Detection
10.4.2 Detection of Outlying Groups
10.4.3 Identification of Key Communicators for Specific Groups and Highly Connected Individuals
10.4.4 Frequency of Interaction
10.4.5 Experiment Data Statistics
10.5 Conclusions
References
Index
Publication date (per publisher) | 12 January 2016 |
---|---|
Additional info | XVI, 169 p. 67 illus. in color. |
Place of publication | Cham |
Language | English |
Subject area | Mathematics / Computer Science ► Computer Science; Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
Keywords | Analytics • Big Data • Cloud Computing • Distributed Systems • Simulation and modeling |
ISBN-10 | 3-319-25313-1 / 3319253131 |
ISBN-13 | 978-3-319-25313-8 / 9783319253138 |
Size: 4.5 MB
DRM: Digital watermark
This eBook contains a digital watermark and is thus personalised for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfil eBook orders from other countries.