Practical Hive - Scott Shaw, Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard

Practical Hive (eBook)

A Guide to Hadoop's Data Warehouse System

Scott Shaw, Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard (Authors)

eBook Download: PDF

2016 | 1st ed.
XXI, 265 pages
Apress (Publisher)
978-1-4842-0271-5 (ISBN)

Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas Francois Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez, and other big data technologies, Practical Hive gives you a detailed treatment of the software.

In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data.

What You Will Learn

Install and configure Hive for new and existing datasets
Perform DDL operations
Execute efficient DML operations
Use tables, partitions, buckets, and user-defined functions
Discover performance tuning tips and Hive best practices

Who This Book Is For

Developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL.

Scott Shaw has over fifteen years of data management experience. He has worked as both an Oracle and SQL Server DBA. He has worked as a consultant on Microsoft business intelligence projects utilizing both Tabular and OLAP models and co-authored two T-SQL books by Apress. Scott also enjoys speaking across the country about distributed computing, Big Data concepts, business intelligence, Hive, and the value of Hadoop. Scott works as a Sr. Solutions Engineer for Hortonworks and lives in Saint Louis with his wife and two kids.

Andreas Francois Vermeulen is Consulting Manager of Business Intelligence, Big Data, Data Science, and Computational Analytics at Sopra-Steria and a doctoral researcher at the University of Dundee and St Andrews on future concepts in massive distributed computing, mechatronics, big data, business intelligence, and deep learning. He owns and incubates the "Rapid Information Factory" data processing framework. He is active in developing next-generation processing frameworks and mechatronics engineering, with over thirty-five years of international experience in data processing, software development, and system architecture. Andre is a data scientist, doctoral trainer, corporate consultant, principal systems architect, and speaker/author/columnist on data science, distributed computing, big data, business intelligence, and deep learning. Andre earned his bachelor's at the North West University at Potchefstroom, his Master of Business Administration at the University of Manchester, his Master of Business Intelligence and Data Science at the University of Dundee, and his Doctor of Philosophy at the University of Dundee and St Andrews.

Ankur Gupta is a Senior Solutions Engineer at Hortonworks. He has over fourteen years of experience in data management, working as a Data Architect and Oracle DBA. Before joining the world of big data, he worked as an Oracle Consultant for investment banks in the UK. He is a regular speaker on big data concepts, Hive, Hadoop, and Oracle at various events and is an author of Oracle Goldengate 11g Complete Cookbook. Ankur has a master's degree in Computer Science & International Business. He is a Hadoop Certified Administrator & Oracle Certified Professional and lives in London with his wife.

David Kjerrumgaard is a systems architect at Hortonworks. He has 20 years of experience in software development and is a Certified Developer for Apache Hadoop (CCDH). Kjerrumgaard is the author of Data Governance with Apache Falcon and Cloudera Developer Training for Apache Hadoop. He earned his BS and MS in Computer Science from Kent State University.

Chapter 1: Setting the Stage for Hive: Hadoop
Chapter 2: Introducing Hive
Chapter 3: Hive Architecture
Chapter 4: Hive Tables DDL
Chapter 5: Data Manipulation Language (DML)
Chapter 6: Loading Data into Hive
Chapter 7: Querying Semi-Structured Data
Chapter 8: Hive Analytics
Chapter 9: Performance Tuning: Hive
Chapter 10: Hive Security
Chapter 11: Future of Hive
Chapter 12: Appendix A. Building a Big Data Team
Chapter 13: Appendix B. Hive Functions

Publication date (per publisher)	27 August 2016
Additional info	XXI, 265 p. 85 illus., 73 illus. in color.
Place of publication	Berkeley
Language	English
Subject area	Computer Science ► Databases ► Data Warehouse / Data Mining
Subject area	Computer Science ► Networks ► Security / Firewall
Keywords	Atlas integration • Avro • data structures • DDL • Flume • Hadoop • HCatalog • HiveQL • Hive streaming • MapReduce • ORC • Ranger integration • RDBMS • Semi-structured Data • sentiment analysis • Sqoop • YARN
ISBN-10	1-4842-0271-6 / 1484202716
ISBN-13	978-1-4842-0271-5 / 9781484202715

PDF (Watermark)
Size: 9.4 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but it is only of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.

Print edition

Book | Softcover

€64.19