Data Engineering with AWS Cookbook

A recipe-based approach to helping you tackle data engineering problems with AWS services

Tram Pham, Gonzalo Herreros González, Vaquar Khan, Huda Nofal (Authors)

Book | Softcover

399 pages

2024
Packt Publishing Limited (Publisher)
978-1-80512-728-4 (ISBN)

Master AWS data engineering services and techniques for orchestrating pipelines, building layers, and managing migrations

Key Features

Get up to speed with the different AWS technologies for data engineering
Learn the different aspects and considerations of building data lakes, such as security, storage, and operations
Get hands-on with key AWS services such as Glue, EMR, Redshift, Athena, and QuickSight for practical learning
Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Performing data engineering with Amazon Web Services combines AWS's scalable infrastructure with robust data processing tools, enabling efficient data pipelines and analytics workflows. This comprehensive guide to AWS data engineering will teach you all you need to know about data lake management, pipeline orchestration, and serving layer construction.
Through clear explanations and hands-on exercises, you’ll master essential AWS services like Glue, EMR, Redshift, Athena, and QuickSight. Additionally, you’ll explore data governance, DevOps, CI/CD, and Infrastructure as Code. As you progress, you’ll also gain insights into Tableau Server and Cloud.
By the end of this book, you’ll be well-versed in AWS data engineering and have gained proficiency in key AWS services, mastered data processing techniques, and developed the skills necessary to tackle large-scale data challenges with confidence.

What you will learn

Define your centralized data lake solution, and secure and operate it at scale
Identify the most suitable AWS solution for your specific needs
Build data pipelines using multiple ETL technologies
Discover how to handle data orchestration and governance
Explore how to build a high-performing data serving layer
Delve into DevOps and data quality best practices
Migrate your data from on-premises to AWS

Who this book is for

If you're involved in designing, building, or overseeing data solutions on AWS, this book provides proven strategies for addressing challenges in large-scale data environments. Data engineers and big data professionals looking to enhance their understanding of AWS features for optimizing their workflows will find value, even if they're new to the platform. Basic familiarity with AWS security (users and roles) and the command shell is recommended.

Trâm Phạm is a Senior Data and Analytics Consultant with 12 years of experience in the data and analytics field. As a professional services consultant, she helps large-scale enterprises in Vietnam build and launch their data platforms. Her background in data analytics, business intelligence (BI), and project management, together with three years of experience as a big data engineer using Spark and cloud technologies, gives her a unique skill set for her role as an AWS Data and Analytics Consultant.
Gonzalo Herreros González is a seasoned Senior Big Data Architect at Amazon Web Services (AWS) - Glue, with over two decades of experience in architecting, integrating, and developing data solutions both on-premises and in the cloud. Specializing in Big Data and Apache Spark technologies, he holds a Bachelor's degree in Computer Science and a Master's degree in Data Analytics. Gonzalo has successfully delivered solutions for large multinationals and innovative startups, with notable projects including the development of AWS Glue Studio dynamic transforms, achieving PCI-DSS certification for MasterCard's Hadoop cluster, and creating the Corvil Intelligence Hub for stock market data analysis. Currently, he contributes his expertise to the AWS Glue service team, advancing data architecture and analytics within the IT industry.
Vaquar Khan is an accomplished Technology Architect and Cloud Architect with over 19 years of IT experience, specializing in large-scale distributed systems, cloud, and Big Data architecture for competitive clients in the BFSI domain. A polyglot developer skilled in Java, Python, and Scala, he excels in designing and implementing innovative solutions, leading by example. Vaquar has extensive hands-on experience in developing distributed systems, multi-tenant cloud-based solutions, and microservices. He is a certified AWS expert with solid experience in GCP, Azure, and Pivotal Cloud Foundry. His expertise includes full-stack development, CI/CD, Docker, Kubernetes, and development process automation. Vaquar has contributed to open-source projects such as Apache Spark and JSR 368, and has proven experience with technologies such as Hadoop, Hive, and Kafka, as well as cloud platforms like AWS EMR, EC2, S3, Cognito, Lambda, Docker, and Kubernetes. His innovative approach ensures business continuity, information security, and data engineering excellence. You can explore more about his work on his GitHub, Stack Overflow, and his blog.
Huda Nofal is a Data Engineer with a strong background in the internet industry. She is skilled in Data Warehousing, Python programming, and Extract, Transform, Load (ETL) processes. Huda also has a keen interest in Deep Learning and Machine Learning. She holds a Bachelor's degree in Computer and Information Systems from the University of Jordan.

Table of Contents

Managing Data Lake Storage
Sharing Your Data Across Environments and Accounts
Ingesting and Transforming Your Data with AWS Glue
A Deep Dive into AWS Orchestration Frameworks
Running Big Data Workloads with Amazon EMR
Governing Your Platform
Data Quality Management
DevOps – Defining IaC and Building CI/CD Pipelines
Monitoring Data Lake Cloud Infrastructure
Building the Serving Layer on AWS Redshift, Athena and QuickSight
On-premises Platform Migration to AWS
Security and governance with Google Cloud BigLake

Publication date	28.08.2024
Place of publication	Birmingham
Language	English
Dimensions	191 x 235 mm
Subject area	Mathematics / Computer Science ► Computer Science ► Databases
	Computer Science ► Software Development ► User Interfaces (HCI)
	Mathematics / Computer Science ► Computer Science ► Theory / Studies
ISBN-10	1-80512-728-4 / 1805127284
ISBN-13	978-1-80512-728-4 / 9781805127284
Condition	New