Practical Data Science

A Guide to Building the Technology Stack for Turning Data Lakes into Business Assets

Andreas Francois Vermeulen (Author)

Book | Softcover

805 pages

2018
Apress (Publisher)
978-1-4842-3053-4 (ISBN)

Provides the essential concepts and terminology to gain fluency in data science and data engineering
Walks through the steps of building a technology stack on a layered framework to retrieve actionable business knowledge
Teaches how to synthesize the polyglot data types in a data lake with repeatable results

Learn how to build a data science technology stack and perform good data science with repeatable methods. You will learn how to turn data lakes into business assets.

The data science technology stack demonstrated in Practical Data Science is built from components in general use in the industry. Data scientist Andreas Vermeulen demonstrates in detail how to build and provision a technology stack to yield repeatable results.

He shows you how to apply practical methods to extract actionable business knowledge from data lakes containing a polyglot of data types and dimensions.

Become fluent in the essential concepts and terminology of data science and data engineering
Build and use a technology stack that meets industry criteria
Master the methods for retrieving actionable business knowledge
Coordinate the handling of polyglot data types in a data lake for repeatable results

This book is for data scientists and data engineers who are required to convert data from a data lake into actionable knowledge for their business, and students who aspire to be data scientists and data engineers.

Andreas Francois Vermeulen is Consulting Manager - Business Intelligence, Big Data, Data Science, Machine Learning, and Computational Analytics at Sopra-Steria, and a doctoral researcher at the University of St Andrews on future concepts in massive distributed computing, mechatronics, big data, business intelligence, and deep learning. He owns and incubates the "Rapid Information Factory" data processing framework. He is active in developing next-generation processing frameworks and mechatronics engineering, with over 35 years of international experience in data processing, software development, and system architecture. Andre is a data scientist, doctoral trainer, corporate consultant, principal systems architect, and speaker/author/columnist on data science, distributed computing, big data, business intelligence, deep learning, and constraint programming. Andre received his bachelor's degree at the North West University at Potchefstroom, his Master of Business Administration at the University of Manchester, his Master of Business Intelligence and Data Science at the University of Dundee, and his Doctor of Philosophy at the University of St Andrews.

PART I Building Blocks
Chapter 1 Introduction to Data Science
  o Data Science
  o Data Lakes
  o Extracting Actionable Business Knowledge from Data Lakes
Chapter 2 The Building Blocks of the Data Science Technology Stack
  o Spark
  o Mesos
  o Akka
  o Cassandra
  o Kafka
  o Elasticsearch
  o R
  o Data Vault
  o Data Warehouse Bus Matrix
  o MQTT

PART II Layers
Chapter 3 A Layered Framework of Practical Methods for Performing Good Data Science
  o The Top Layers of a Layered Framework
  o Engineering a Layered Framework for High-Level Data Science
Chapter 4 The Business Layer
Chapter 7 The Audit, Balance, and Control Layer
Chapter 8 The Functional Layer and Its Super Steps

PART III The Super Steps of the Functional Layer
Chapter 9 The Retrieve Super Step
  o Ingress Layers
  o Data Lakes and the Internet of Things
  o Deploying the Retrieve Super Step in a Technology Stack
Chapter 10 The Assess Super Step
  o Data Quality
  o Data Scrubbing
  o Deploying the Assess Super Step in a Technology Stack
Chapter 11 The Process Super Step
  o Data Vault and Time-Person-Object-Location-Event Hubs
  o Machine Learning
  o Deploying the Process Super Step in a Technology Stack
Chapter 12 The Transform Super Step
  o Data Warehouse Bus Matrix
  o Statistics
  o Graph Database
  o Deploying the Transform Super Step in a Technology Stack
Chapter 13 The Organize Super Step of the Functional Layer
  o Data Mart
  o Deploying the Organize Super Step in a Technology Stack
Chapter 14 The Report Super Step of the Functional Layer
  o Virtualization
  o Deploying the Report Super Step in a Technology Stack

PART IV Clusters and Grids
Chapter 15 Building Cluster-Grid Appliances for Polyglot Data Science
  o Grids
  o Torus Networks
  o Mesosphere Micro Services
  o Cloud Computing
  o Fog Computing
  o Bare Metal Solutions
Chapter 16 The Future of Data Science and Data Engineering

Publication date	17 March 2018
Additional info	9 Illustrations, color; 48 Illustrations, black and white
Place of publication	Berkeley
Language	English
Dimensions	178 x 254 mm
Weight	1540 g
Binding	Paperback
Subject area	Computer Science ► Databases ► Data Warehouse / Data Mining
	Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics
	Mathematics / Computer Science ► Mathematics ► Financial / Business Mathematics
Keywords	actionable business knowledge • data engineering • Data Lake • Data Science • data science technology stack • data scrubbing techniques • data vault and data mart • data warehouse bus matrix • Fog Computing • graph database • grids and clusters • IoT and embedded systems • machine learning • Machine-to-machine • MQTT • polyglot data science • Spark, Mesos, Akka, Cassandra, Kafka, Elasticsearch, R • super steps of the functional layer • torus network
ISBN-10	1-4842-3053-1 / 1484230531
ISBN-13	978-1-4842-3053-4 / 9781484230534
Condition	New