Mastering OpenTelemetry and Observability - Steve Flanders

Mastering OpenTelemetry and Observability (eBook)

Enhancing Application and Infrastructure Performance and Avoiding Outages

Steve Flanders (Author)

eBook Download: EPUB

2024
698 pages
Wiley (Publisher)
978-1-394-25313-5 (ISBN)

Discover the power of open source observability for your enterprise environment

In Mastering OpenTelemetry and Observability: Enhancing Application and Infrastructure Performance and Avoiding Outages, accomplished engineering leader and open source contributor Steve Flanders unlocks the secrets of enterprise application observability with a comprehensive guide to OpenTelemetry (OTel). Explore how OTel transforms observability, providing a robust toolkit for capturing and analyzing telemetry data across your environment.

You will learn how OTel delivers unmatched flexibility, extensibility, and vendor neutrality, freeing you from vendor lock-in and enabling data sovereignty and portability. You will also discover:

Comprehensive coverage of observability issues and technology: Dive deep into the world of observability and gain a comprehensive understanding of observability fundamentals with practical insights and real-world use cases.
Practical guidance: From instrumentation techniques to advanced tracing strategies, gain the skills needed to create highly observable systems. Learn how to deploy and configure OTel, even in challenging brownfield environments, with step-by-step instructions and hands-on exercises.
An opportunity for community contributions and communication: Join the OTel community, including end-users, vendors, and cloud providers, and shape the future of observability while connecting with experts and peers.

Whether you are a novice or a seasoned professional, Mastering OpenTelemetry and Observability is your roadmap to troubleshooting availability and performance problems by learning to detect anomalies, interpret data, and proactively optimize performance in your enterprise environment. Embark on your journey to observability mastery today!

STEVE FLANDERS is a Senior Director of Engineering at Splunk, a Cisco company. Steve is one of the founding members of the OpenTelemetry project.

Chapter 1
What Is Observability?

In modern software development and operations, observability has emerged as a fundamental concept essential for maintaining and improving the performance, reliability, and scalability of complex systems. But what exactly is observability? At its core, observability is the practice of gaining insights into the internal states and behaviors of systems through the collection, analysis, and visualization of telemetry data. Unlike traditional monitoring, which primarily focuses on predefined metrics and thresholds, observability offers a more comprehensive and dynamic approach, enabling teams to proactively detect, diagnose, and resolve issues.

This chapter will explore the principles and components of observability, highlighting its significance in today’s distributed and microservices-based architectures. Through a deep dive into the three pillars of observability—metrics, logs, and traces—you will understand the groundwork for how observability can transform the way resilient systems are built and managed.

IN THIS CHAPTER, YOU WILL LEARN TO:

Differentiate between monitoring and observability
Explain the importance of metadata
Identify the differences between telemetry signals
Distinguish between instrumentation and data collection
Analyze the requirements for choosing an observability platform

Definition

So, what is observability in the realm of modern software development and operations? While many definitions exist, they generally describe observability as the ability to quickly identify availability and performance problems, regardless of whether they have been experienced before, and to help with problem isolation, root cause analysis, and remediation. Because observability is about making it easier to understand complex systems and address unperceived issues, often referred to in the software industry as unknown unknowns,[1] the data collected must be correlated across different telemetry types, rich enough to answer questions during a live incident, and immediately accessible.

The Cloud Native Computing Foundation (CNCF), described more fully later in this chapter, provides a definition for the term observability:[2]

Observability is a system property that defines the degree to which the system can generate actionable insights. It allows users to understand a system’s state from these external outputs and take (corrective) action.

Computer systems are measured by observing low-level signals such as CPU time, memory, disk space, and higher-level and business signals, including API response times, errors, transactions per second, etc. These observable systems are observed (or monitored) through specialized tools, so-called observability tools. A list of these tools can be viewed in the Cloud Native Landscape’s observability section.[3]

Observable systems yield meaningful, actionable data to their operators, allowing them to achieve favorable outcomes (faster incident response, increased developer productivity) and less toil and downtime.

Consequently, the observability of a system will significantly impact its operating and development costs.

While the CNCF’s definition is good, it is missing a few critical aspects:

The goal of observability should be where a system’s state can be fully understood from its external output without the need to ship code. This means you should be able to ask novel questions about your observability data, especially questions you had not thought of beforehand.
Observability is not just about collecting data but about collecting meaningful data, such as data with context and correlated across different sources, and storing it on a platform that offers rich analytics and query capabilities across signals.
A system is truly observable when you can troubleshoot without prior knowledge of the system.
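The first two points can be made concrete with a small sketch. It is not taken from the book: the event fields (`service`, `region`, `app_version`, `duration_ms`, `error`) and the helper `error_rate_by` are hypothetical, and an in-memory list stands in for an observability platform. The idea is only that, when events carry enough context, a question nobody planned for can be answered with a new query rather than a new deployment.

```python
from collections import defaultdict

# Hypothetical context-rich events, as an instrumented service might emit them.
EVENTS = [
    {"service": "checkout", "region": "eu-west", "app_version": "2.3.1", "duration_ms": 120, "error": False},
    {"service": "checkout", "region": "eu-west", "app_version": "2.4.0", "duration_ms": 950, "error": True},
    {"service": "checkout", "region": "us-east", "app_version": "2.4.0", "duration_ms": 870, "error": True},
    {"service": "checkout", "region": "us-east", "app_version": "2.3.1", "duration_ms": 110, "error": False},
    {"service": "cart", "region": "us-east", "app_version": "1.9.0", "duration_ms": 45, "error": False},
]


def error_rate_by(events, attribute):
    """Group events by any attribute and return the error rate per group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for event in events:
        key = event.get(attribute, "<missing>")
        totals[key] += 1
        if event["error"]:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# A predefined, monitoring-style check: the overall error rate.
overall = sum(e["error"] for e in EVENTS) / len(EVENTS)
print(f"overall error rate: {overall:.0%}")

# Novel questions, asked after the fact, with no change to the emitting code.
print(error_rate_by(EVENTS, "region"))
print(error_rate_by(EVENTS, "app_version"))
```

In this made-up data set, grouping by `region` shows nothing decisive, while grouping by `app_version` isolates every error to version 2.4.0. Neither grouping had to be anticipated when the events were emitted.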

The OpenTelemetry project, which will be introduced in Chapter 2, “Introducing OpenTelemetry!,” provides a definition of observability that is worth highlighting:

Observability lets you understand a system from the outside, by letting us ask questions about that system without knowing its inner workings. Furthermore, it allows you to easily troubleshoot and handle novel problems—that is, “unknown unknowns.” It also helps you answer the question, “Why is this happening?”

To ask those questions about your system, your application must be properly instrumented. That is, the application code must emit signals such as traces, metrics, and logs. An application is properly instrumented when developers don’t need to add more instrumentation to troubleshoot an issue, because they have all of the information they need.[4]
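As a rough illustration of what emitting correlated signals can look like, the sketch below uses only the Python standard library. It is not the OpenTelemetry API, and the names `handle_request`, `SPANS`, `LOGS`, `REQUEST_COUNT`, and `signals_for_trace` are hypothetical. One request produces a trace span, a metric increment, and (on failure) a log record, and the shared `trace_id` is what allows them to be correlated afterward.

```python
import time
import uuid
from collections import Counter

SPANS = []                 # trace signal: one record per timed operation
LOGS = []                  # log signal: discrete events with context
REQUEST_COUNT = Counter()  # metric signal: aggregated counts by (route, status)


def handle_request(route, fail=False):
    """Handle one hypothetical request and emit signals for it."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    status = "error" if fail else "ok"
    if fail:
        LOGS.append({"trace_id": trace_id, "level": "ERROR",
                     "message": f"request to {route} failed"})
    duration_ms = (time.perf_counter() - start) * 1000
    SPANS.append({"trace_id": trace_id, "name": f"GET {route}",
                  "duration_ms": duration_ms, "status": status})
    REQUEST_COUNT[(route, status)] += 1
    return trace_id


def signals_for_trace(trace_id):
    """Correlate spans and logs that share the same trace ID."""
    return {
        "spans": [s for s in SPANS if s["trace_id"] == trace_id],
        "logs": [entry for entry in LOGS if entry["trace_id"] == trace_id],
    }


handle_request("/cart")
failed_id = handle_request("/checkout", fail=True)

print(REQUEST_COUNT[("/checkout", "error")])  # the metric shows that something failed
print(signals_for_trace(failed_id))           # the span and log identify the request
```

The metric alone only says that an error occurred on a route. The trace ID carried by the span and the log record is what connects that count to one specific request.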

In short, observability is about collecting critical telemetry data with relevant context and using that data to quickly determine your systems’ behavior and health. Observability goes beyond mere monitoring by enabling a proactive and comprehensive understanding of system behavior, facilitating quicker detection, diagnosis, and resolution of issues. This capability is crucial in today’s fast-paced, microservices-driven, distributed environments, where the complexity and dynamic nature of systems demand robust and flexible observability solutions. Through the lens of the CNCF and OpenTelemetry, you can see observability is not just defined as a set of tools and practices but as a fundamental shift toward more resilient, reliable, and efficient system management.

RILEY JOINS JUPITERIAN

Riley (she/her) is an experienced site reliability engineer (SRE) with deep observability and operations experience. She recently joined Jupiterian to address their observability problems and work with a new vendor. Riley joined Jupiterian from a large private equity (PE) advertising company, where she was the technical lead of the SRE team and was responsible for a large-scale, globally distributed, cloud native architecture. Before that, she was the founding member of a growth startup where she developed observability practices and culture while helping scale the business to over three million dollars in annual recurring revenue (ARR). Riley was excited about the challenge and opportunity of building observability practices from the ground up at a public enterprise company transitioning to the cloud.

Jupiterian is an e-commerce company that has been around for more than two decades. Over the last five years, the company has seen a massive influx of customers and has been on a journey to modernize its tech stack to keep up with demand and the competition. As part of these changes, it has been migrating from its on-premises monolithic application to a microservices-based architecture running on Kubernetes (K8s) and deployed in the cloud. Recently, outages have been plaguing the new architecture—a problem threatening the company and one that needed to be resolved before the annual peak traffic expected during the upcoming holiday season.

For the original architecture, the company had been using Zabbix, an open source monitoring solution, to monitor the environment. The IT team was beginning to learn about DevOps practices and had set up Prometheus for the new architecture. Given organizational constraints and priorities, they did not have the time to develop the skill set to manage it and the ever-increasing number of collected metrics. In short, a critical piece of the new architecture was without ownership. On top of this, engineering teams continued to add data, dashboards, and alerts without defined standards or processes. Not surprisingly, this resulted in the company having difficulty proactively identifying availability and performance issues. It also resulted in various observability issues, including Prometheus availability, blind spots, and alert storms. In terms of observability, the company frequently experienced infrastructure issues and could not tell if it was because of an architecture limitation or an improper use of the new infrastructure. As a result, engineers feared going on-call, and innovation velocity was significantly below average.

The Jupiterian engineering team had been pushing management to invest more in observability and SRE. Instead, head count remained flat, and the product roadmaps, driven primarily by the sales team, continued to take priority. With the service missing its service-level agreement (SLA) target for the last three months, leadership demanded a focus on resiliency. To address the problem, the Chief Technology Officer (CTO) signed a three-year deal with Watchwhale, an observability vendor, so the company could focus on its core intellectual property (IP) instead of managing third-party software. An architect in the office of the CTO vetted the vendor and its technology. Given other organizational priorities, the engineering team was largely uninvolved in the proof of concept (PoC). The Vice President (VP) of Engineering was tasked with ensuring that the service’s SLA was consistently met ahead of the holiday period and with driving the adoption and success of the Watchwhale product. He allocated one of his budget IDs (BIDs) for a senior SRE position, which led to Riley being hired.

Background

The term observability has been around since at least the mid-20th century and is mainly...

Publication date (per publisher)	22 October 2024
Series	Tech Today
Language	English
Subject area	Computer Science ► Office Software ► Outlook
Keywords	application observability • cloud infrastructure observability • cloud native observability • observability book • observability toolkit • observability tools • open source observability • open source telemetry • otel • OTelCol • OTLP • preventing downtime
ISBN-10	1-394-25313-3 / 1394253133
ISBN-13	978-1-394-25313-5 / 9781394253135
