Apache Oozie - Mohammad Kamrul Islam, Aravind Srinivasan

Apache Oozie

The Workflow Scheduler for Hadoop
Book | Softcover
272 pages
2015
O'Reilly Media (publisher)
978-1-4493-6992-7 (ISBN)
35,90 incl. VAT
Get a solid grounding in Apache Oozie, the workflow scheduler system for managing Hadoop jobs. With this hands-on guide, two experienced Hadoop practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases.

Once you set up your Oozie server, you’ll dive into techniques for writing and coordinating workflows, and learn how to write complex data pipelines. Advanced topics show you how to handle shared libraries in Oozie, as well as how to implement and manage Oozie’s security capabilities.

Topics included:
  • Install and configure an Oozie server, and get an overview of basic concepts
  • Journey through the world of writing and configuring workflows
  • Learn how the Oozie coordinator schedules and executes workflows based on triggers
  • Understand how Oozie manages data dependencies
  • Use Oozie bundles to package several coordinator apps into a data pipeline
  • Learn about security features and shared library management
  • Implement custom extensions and write your own EL functions and actions
  • Debug workflows and manage Oozie’s operational details

Mohammad Kamrul Islam is currently a Staff Software Engineer on the data engineering team at Uber. Previously, he worked at LinkedIn for more than two years as a Staff Software Engineer on the Hadoop development team. Before that, he worked at Yahoo for nearly five years as an Oozie architect/technical lead. His fingerprints can be found all over Oozie, and he is a respected voice in the Oozie community. He has been intimately involved with the Apache Hadoop ecosystem since 2009. Mohammad has a Ph.D. in Computer Science with a specialization in parallel job scheduling from Ohio State University. He received his MSCS degree from Wright State University, Ohio, and his BSCS from Bangladesh University of Engineering and Technology (BUET). He is a Project Management Committee (PMC) member of both Apache Oozie and Apache Tez and frequently contributes to Apache YARN/MapReduce and Apache Hive. He was elected PMC chair and Vice President of Oozie at the Apache Software Foundation, serving from 2013 through 2015.

Aravind Srinivasan has been involved with Hadoop in general and Oozie in particular since 2008. He is currently a Lead Application Architect at Altiscale, a Hadoop-as-a-service company, where he helps customers with Hadoop application design and architecture. His association with Big Data and Hadoop started during his time at Yahoo, where he spent almost six years working on various data pipelines for advertising systems. He has extensive experience building complicated, low-latency data pipelines and porting legacy pipelines to Oozie. As a customer, he drove many of Oozie’s requirements in its early days of adoption inside Yahoo, and he later spent some time as a Product Manager on Yahoo’s Hadoop team, where he contributed further to Oozie’s roadmap. After Yahoo, he also spent a year at Think Big Analytics, a Hadoop consulting firm, where he consulted on some interesting and challenging Big Data integration projects at Facebook. He has a master’s degree in Computer Science from Arizona State and lives in Silicon Valley.

Chapter 1: Introduction to Oozie
Big Data Processing
Chapter 2: Oozie Concepts
Oozie Applications
Parameters, Variables, and Functions
Application Deployment Model
Oozie Architecture
Chapter 3: Setting Up Oozie
Oozie Deployment
Basic Installations
Advanced Oozie Installations
Chapter 4: Oozie Workflow Actions
Workflow
Actions
Action Types
Synchronous Versus Asynchronous Actions
Chapter 5: Workflow Applications
Outline of a Basic Workflow
Control Nodes
Job Configuration
Parameterization
The job.properties File
Configuration and Parameterization Examples
Lifecycle of a Workflow
Chapter 6: Oozie Coordinator
Coordinator Concept
Triggering Mechanism
Coordinator Application and Job
Coordinator Job Lifecycle
Coordinator Action Lifecycle
Parameterization of the Coordinator
Execution Controls
An Improved Coordinator
Chapter 7: Data Trigger Coordinator
Expressing Data Dependency
Example: Rollup
Parameterization of Dataset Instances
Parameter Passing to Workflow
A Complete Coordinator Application
Chapter 8: Oozie Bundles
Bundle Basics
Bundle Specification
Bundle State Transitions
Chapter 9: Advanced Topics
Managing Libraries in Oozie
Oozie Security
Supporting New API in MapReduce Action
Supporting Uber JAR
Cron Scheduling
Emulate Asynchronous Data Processing
HCatalog-Based Data Dependency
Chapter 10: Developer Topics
Developing Custom EL Functions
Supporting Custom Action Types
Overriding an Asynchronous Action Type
Creating a New Asynchronous Action
Chapter 11: Oozie Operations
Oozie CLI Tool
Oozie REST API
Oozie Java Client
The oozie-site.xml File
The Oozie Purge Service
Job Monitoring
Oozie Instrumentation and Metrics
Reprocessing
Server Tuning
Oozie High Availability
Debugging in Oozie
MiniOozie and LocalOozie
The Competition

Publication date (per publisher): June 23, 2015
Additional info: black & white illustrations
Place of publication: Sebastopol
Language: English
Dimensions: 178 x 233 mm
Weight: 435 g
Binding: paperback
Subject areas: Mathematics / Computer Science > Computer Science > Databases
Mathematics / Computer Science > Computer Science > Software Development
Keywords: Big Data • Hadoop • Hadoop Software Framework • Java (programming language)
ISBN-10: 1-4493-6992-8 / 1449369928
ISBN-13: 978-1-4493-6992-7 / 9781449369927
Condition: new