Data Analytics in the AWS Cloud (eBook)

Building a Data Platform for BI and Predictive Analytics on AWS
eBook Download: EPUB
2023 | 1st edition
416 pages
Wiley (publisher)
978-1-119-90925-5 (ISBN)


Data Analytics in the AWS Cloud -  Joe Minichino
38.99 incl. VAT

A comprehensive and accessible roadmap to performing data analytics in the AWS cloud

In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. In the book, you'll explore every relevant aspect of data analytics, from data engineering to analysis, business intelligence, DevOps, and MLOps, as you discover how to integrate machine learning predictions with analytics engines and visualization tools.

You'll also find:

  • Real-world use cases of AWS architectures that demystify the applications of data analytics
  • Accessible introductions to data acquisition, importation, storage, visualization, and reporting
  • Expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify maintenance

A can't-miss for data architects, analysts, engineers and technical professionals, Data Analytics in the AWS Cloud will also earn a place on the bookshelves of business leaders seeking a better understanding of data analytics on the AWS cloud platform.

GIONATA 'JOE' MINICHINO is Principal Software Engineer and Data Architect on the Data & Analytics Team at Teamwork. He specializes in cloud computing, machine/deep learning, and artificial intelligence and designs end-to-end Amazon Web Services pipelines that move large quantities of diverse data for analysis and visualization.



CHAPTER 2
The Path to Analytics: Setting Up a Data and Analytics Team


Creating analytics, especially in a large organization, can be a monumental effort, and a business needs to be prepared to invest time and resources, which will repay the company manifold by enabling data‐driven decisions. The people who will make this shift toward data‐driven decision making are your Data and Analytics team, sometimes referred to as the Data Analytics team or even simply as the Data team (although this last name tends to confuse people, as it may seem related to database administration). This book will refer to the Data and Analytics team as the DA team.

Although the focus of this book is architectural patterns and designs that will help you turn your organization into a data‐driven one, a high‐level overview of the skills and people you will need to make this happen is necessary.

The Data Vision


The first step in delivering analytics is to create a data vision, a statement for your business as a whole. This can be a simple quote that works as a compass for all the projects your DA team will work on.

A vision does not have to be immutable. However, you should only change it if it is somehow only applicable to certain conditions or periods of time and those conditions have been satisfied or that time has passed.

A vision is the North Star of your data journey. It should always be a factor when you're making decisions about what kind of work to carry out or how to prioritize a current backlog. An example of a data vision is “to create a unified analytics facility that enables business management to slice and dice data at will.”

Support


It's important to create the vision, and it's also vital for the vision to have the support of all the involved stakeholders. Management will be responsible for allocating resources to the DA team, so these managers need to be behind the vision and the team's ability to carry it out. You should have a vision statement ready and submit it to management, or have management create it in the first place.

I won't linger any further on this topic because this book is more of a technical nature than a business one, but be sure not to skip this vital step.

REDUCTIO AD ABSURDUM: HOW NOT TO GO ABOUT CREATING ANALYTICS


Before diving into the steps for creating analytics, allow me to give you some friendly advice on how you should not go about it. I will do so by recounting a fictional yet all too common story of failure by businesses and companies.

Data Undriven Inc. is a successful company with hundreds of employees, but it's in dire need of analytics to reverse some worrying revenue trends. The leadership team recognizes the need for a far more accurate kind of analytics than what they currently have available, since it appears the company is unable to pinpoint exactly what side of the business is hemorrhaging money. Gemma, a member of the leadership team, decides to start a project to create analytics for the company, which will find its ultimate manifestation in a dashboard illustrating all sorts of useful metrics. Gemma thinks Bob is a great Python/SQL data analyst and tasks Bob with the creation of reports. The ideas are good, but data for these reports resides in various data sources. This data is unsuitable for analysis because it is sparse and inaccurate, some integrity is broken, there are holes due to temporary system failures, and the DBA team has been hit with large and unsustainable queries run against their live transactional databases, which are meant to serve data to customers, not to be reported on.

Bob collects the data from all the sources and, after weeks of wrangling, cleaning, filtering, and general massaging of the data, delivers analytics to Gemma in the form of a spreadsheet with graphs in it.

Gemma is happy with the result, although she notices some incongruence with the expected figures. She asks Bob to automate this analysis into a dashboard that managers can consult and that will contain up‐to‐date information.

Bob is in a state of panic, looking up how to automate his analytics scripts, while also trying to understand why his numbers do not match Gemma's expectations—not to mention the fact that his Python program takes between 3 and 4 hours to run every time, so the development cycle is horrendously slow.

The following weeks are a harrowing story of misunderstandings, failed attempts at automation, frustration, and degraded database performance, with the ultimate result that Gemma has no analytics and Bob has quit his job to join a DA team elsewhere.

What is the moral of the story? Do not put any analyst to work before you have a data engineer in place. This cannot be stated strongly enough. Resist the temptation to want analytics now. Go about it the right way. Set up a DA team, even if it's small and you suffer from resource constraints in the beginning, and let analysts come into the picture when the data is ready for analytics and not before. Let's see what kind of skills and roles you should rely on to create a successful DA team and achieve analytics even at scale.

DA Team Roles


There are two groups of roles for a DA team: the early stages and the mature stage. The definitions for these are not strict and vary from business to business. Make sure core roles are covered before advancing to more niche and specialized ones.

Early Stage Roles


By “early stage roles” we refer to a set of roles that will constitute the nucleus of your nascent DA team and that will help the team grow. At the very beginning, it is to be expected that the people involved will have to exercise some flexibility and open‐mindedness in terms of the scope and authority of their roles, because the priority is to build the foundation for a data platform. So a team lead will most likely be hands‐on, actively contributing to engineering, and the same can be said of the data architect, whereas data engineers will have to perform a lot of work in the realms of data platform engineering to enable the construction and monitoring of pipelines.

Team Lead

Your DA team should have, at least at the beginning, strong leadership in the form of a team lead. This is a person who is clearly technically proficient in the realm of analytics and is able to create tasks and delegate them to the right people, oversee the technical work that's being carried out, and act as a liaison between management and the DA team.

Analytics is a vast domain that has more business implications than other strictly technical areas (like feature development, for example), and yet the technical aspects can be incredibly challenging, normally requiring engineers with years of experience to carry out the work. For this reason, it is good to have a person spearheading the work in terms of workflow and methodology to avoid early‐stage fragmentation, discrepancies, and general disruption of the work due to lack of cohesion within the team. The team can potentially evolve into something more of a flat‐hierarchy unit later on, when every member is working with similar methods and practices that can be—at that later point—questioned and changed.

Data Architect

A data architect is a fundamental figure for a DA team and one the team cannot do without. Even if you don't elect someone to be officially recognized as the architect in the team, it is advisable to elect the most experienced and architecturally minded engineer to the role of supervisor of all the architectures designed and implemented by the DA team. Ideally the architect is a full‐time role, not only designing pipeline architectures but also completing work on the technology adoption front, which is a hefty and delicate task at the same time.

Deciding whether you should adopt a serverless architecture over an Airflow‐ or Hadoop‐based one is something that requires careful attention. Elements such as in‐house skills and maintenance costs are also involved in the decision‐making process.

The business can—especially under resource constraints—decide to combine the architect and team lead roles. I suggest making the data architect/team lead a full‐time role before the analytics demand volume in the company becomes too large to be handled by a single team lead or data architect.

Data Engineer

Every DA team should have a data engineering (DE) subteam, which is the beating heart of data analytics. Data engineers are responsible for implementing systems that move, transform, and catalog data in order to render the data suitable for analytics.
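
To make "move, transform, and catalog" a little more concrete, here is a minimal sketch in plain Python. It is not taken from the book: the `Catalog` class, the function names, the `orders_clean` dataset name, and the sample rows are all hypothetical, and an in-memory list stands in for a real source system. It only illustrates the shape of the work, assuming a pipeline that reads raw rows, drops incomplete ones, normalizes the rest, and records what it produced.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory catalog mapping a dataset name to its metadata."""
    entries: dict = field(default_factory=dict)

    def register(self, name, rows):
        # Record the column names and row count of the dataset just produced.
        columns = sorted(rows[0].keys()) if rows else []
        self.entries[name] = {"columns": columns, "row_count": len(rows)}


def extract():
    """Stand-in for reading from a source system (made-up sample rows)."""
    return [
        {"order_id": "1", "amount": "19.90", "country": "ie"},
        {"order_id": "2", "amount": None, "country": "IE"},  # a hole in the data
        {"order_id": "3", "amount": "5.00", "country": " de "},
    ]


def transform(rows):
    """Drop incomplete rows, then normalize types and casing."""
    clean = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # skip rows with missing values
        clean.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "country": row["country"].strip().upper(),
        })
    return clean


def run_pipeline(catalog):
    """Move (extract), transform, and catalog one dataset."""
    rows = transform(extract())
    catalog.register("orders_clean", rows)
    return rows


if __name__ == "__main__":
    catalog = Catalog()
    for row in run_pipeline(catalog):
        print(row)
    print(catalog.entries)
```

In a real platform each step would likely be a separate, monitored job reading from a replica or export rather than from a live transactional database, which is the failure mode in the Data Undriven Inc. story.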

In the context of analytics powered by AWS, data engineers nowadays are necessarily multifaceted engineers with skills spanning various areas of technology. They are cloud computing engineers, DevOps engineers, and database/data lake/data warehouse experts, and they are knowledgeable in continuous integration/continuous deployment (CI/CD).

You will find that most DEs have particular strengths and interests, so it would be wise to create a team of DEs with some diversity of skills....

Publication date (per publisher): April 6, 2023
Language: English
Subject area: Mathematics / Computer Science, Computer Science, Networks
Keywords: amazon data analytics • AWS • aws bi • aws business intelligence • aws cloud • aws cloud data pipelines • Aws data analytics • aws predictive analysis • aws serverless data engineering • Cloud Computing • cloud data analytics • Computer Science • data analytics intro • Database & Data Warehousing Technologies • Data Mining • Data Mining & Knowledge Discovery • Data Mining Statistics • Data Analysis • Databases • Databases & Data Warehousing • serverless data engineering • Statistics
ISBN-10 1-119-90925-2 / 1119909252
ISBN-13 978-1-119-90925-5 / 9781119909255
EPUB (Adobe DRM)
Size: 23.2 MB

Copy protection: Adobe DRM
Adobe DRM is a copy protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read the eBook only on devices that are also registered to your Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and nonfiction. The body text adapts dynamically to the display and font size, so EPUB is also a good fit for mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the Adobe Digital Editions software (free). We advise against using the OverDrive Media Console, as experience shows it frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
