Practical Hadoop Ecosystem (eBook)
XX, 421 pages
Apress (publisher)
978-1-4842-2199-0 (ISBN)
- How to set up the environment in Linux for Hadoop projects using the Cloudera Hadoop Distribution CDH 5
- How to run a MapReduce job
- How to store data with Apache Hive, Apache HBase
- How to index data in HDFS with Apache Solr
- How to develop a Kafka messaging system
- How to develop a Mahout User Recommender System
- How to stream Logs to HDFS with Apache Flume
- How to transfer data from MySQL database to Hive, HDFS and HBase with Sqoop
- How to create a Hive table over Apache Solr
Learn how to use the Apache Hadoop projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout, and Apache Solr. From setting up the environment to running sample applications, each chapter in this book is a practical tutorial on using an Apache Hadoop ecosystem project.

While several books on Apache Hadoop are available, most are based on the main projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects and how they all work together as a cohesive big data development platform.

What You Will Learn:
- Set up the environment in Linux for Hadoop projects using the Cloudera Hadoop Distribution CDH 5
- Run a MapReduce job
- Store data with Apache Hive and Apache HBase
- Index data in HDFS with Apache Solr
- Develop a Kafka messaging system
- Stream logs to HDFS with Apache Flume
- Transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
- Create a Hive table over Apache Solr
- Develop a Mahout user recommender system

Who This Book Is For:
Apache Hadoop developers. Prerequisite knowledge of Linux and some knowledge of Hadoop is required.
Deepak Vohra is a coder, developer, programmer, book author, and technical reviewer.
Introduction

1. HDFS and MapReduce
- Hadoop Distributed File System
- MapReduce Frameworks
- Setting the Environment
- Hadoop Cluster Modes
- Running a MapReduce Job with the MR1 Framework
- Running MR1 in Standalone Mode
- Running MR1 in Pseudo-Distributed Mode
- Running MapReduce with the YARN Framework
- Running YARN in Pseudo-Distributed Mode
- Running Hadoop Streaming

Section II: Storing & Querying

2. Apache Hive
- Setting the Environment
- Configuring Hadoop
- Configuring Hive
- Starting HDFS
- Starting the Hive Server
- Starting the Hive CLI
- Creating a Database
- Using a Database
- Creating a Managed Table
- Loading Data into a Table
- Creating a Table Using LIKE
- Adding Data with INSERT INTO TABLE
- Adding Data with INSERT OVERWRITE
- Creating a Table Using AS SELECT
- Altering a Table
- Truncating a Table
- Dropping a Table
- Creating an External Table

3. Apache HBase
- Setting the Environment
- Configuring Hadoop
- Configuring HBase
- Configuring Hive
- Starting HBase
- Starting the HBase Shell
- Creating an HBase Table
- Adding Data to an HBase Table
- Listing All Tables
- Getting a Row of Data
- Scanning a Table
- Counting the Number of Rows in a Table
- Altering a Table
- Deleting a Row
- Deleting a Column
- Disabling and Enabling a Table
- Truncating a Table
- Dropping a Table
- Finding Whether a Table Exists
- Creating a Hive External Table

Section III: Bulk Transferring & Streaming

4. Apache Sqoop
- Installing the MySQL Database
- Creating MySQL Database Tables
- Setting the Environment
- Configuring Hadoop
- Starting HDFS
- Configuring Hive
- Configuring HBase
- Importing into HDFS
- Exporting from HDFS
- Importing into Hive
- Importing into HBase

5. Apache Flume
- Setting the Environment
- Configuring Hadoop
- Configuring HBase
- Starting HDFS
- Configuring Flume
- Running a Flume Agent
- Configuring Flume for an HBase Sink
- Streaming a MySQL Log to an HBase Sink

Section IV: Serializing

6. Apache Avro
- Setting the Environment
- Creating an Avro Schema
- Creating a Hive Managed Table
- Creating a Hive (version prior to 0.14) External Table Stored as Avro
- Creating a Hive (version 0.14 and later) External Table Stored as Avro
- Transferring MySQL Table Data as an Avro Data File with Sqoop

7. Apache Parquet
- Setting the Environment
- Creating an Oracle Database Table
- Exporting an Oracle Database Table to a CSV File
- Importing the CSV File into MongoDB
- Exporting a MongoDB Document as a CSV File
- Importing a CSV File into an Oracle Database

Section V: Messaging & Indexing

8. Apache Kafka
- Setting the Environment
- Starting the Kafka Server
- Creating a Topic
- Starting a Kafka Producer
- Starting a Kafka Consumer
- Producing and Consuming Messages
- Streaming Log Data to Apache Kafka with Apache Flume
- Setting the Environment
- Creating Kafka Topics
- Configuring Flume
- Running the Flume Agent
- Consuming Log Data as Kafka Messages

9. Apache Solr
- Setting the Environment
- Configuring the Solr Schema
- Starting the Solr Server
- Indexing a Document in Solr
- Deleting a Document from Solr
- Indexing a Document in Solr with a Java Client
- Searching a Document in Solr
- Creating a Hive Managed Table
- Creating a Hive External Table
- Loading Hive External Table Data
- Searching Hive Table Data Indexed in Solr

Section VI: Machine Learning

10. Apache Mahout
- Setting the Environment
- Starting HDFS
- Setting the Mahout Environment
- Running a Mahout Classification Sample
- Running a Mahout Clustering Sample
- Developing a User-Based Recommender System
- The Sample Data
- Setting the Environment
- Creating a Maven Project in Eclipse
- Creating a User-Based Recommender
- Creating a Recommender Evaluator
- Running the Recommender
- Choosing a Recommender Type
- Choosing a User Similarity Measure
- Choosing a Neighborhood Type
- Choosing a Neighborhood Size for NearestNUserNeighborhood
- Choosing a Threshold for ThresholdUserNeighborhood
- Running the Evaluator
- Choosing the Split Between Training Percentage and Test Percentage
| Publication date (per publisher) | 30.9.2016 |
|---|---|
| Additional info | XX, 421 p., 311 illus., 293 illus. in color |
| Place of publication | Berkeley |
| Language | English |
| Subject area | Mathematics / Computer Science ► Computer Science ► Databases |
| | Mathematics / Computer Science ► Computer Science ► Networks |
| Keywords | Apache • Apache Hadoop • Apache HBase • Big Data • Cloud • Database • Framework • Hadoop • HBase • Open Source |
| ISBN-10 | 1-4842-2199-0 / 1484221990 |
| ISBN-13 | 978-1-4842-2199-0 / 9781484221990 |
Size: 26.1 MB
DRM: digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, the copy can be traced back to its source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is only of limited use on small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read with (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a PDF viewer, e.g. the free Adobe Digital Editions app.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.