Dependable Computing - Ravishankar K. Iyer, Zbigniew T. Kalbarczyk, Nithin M. Nakka

Dependable Computing (eBook)

Design and Assessment
eBook Download: EPUB
2024 | 1st edition
848 pages
Wiley (publisher)
978-1-119-74346-0 (ISBN)
115.99 € incl. VAT
Dependable Computing

Covering dependability from software and hardware perspectives

Dependable Computing: Design and Assessment looks at both the software and hardware aspects of dependability.

This book:

  • Provides an in-depth examination of dependability/fault tolerance topics
  • Describes a dependability taxonomy and briefly contrasts classical techniques with their modern counterparts or extensions
  • Walks up the system stack, from hardware logic through operating systems to software applications, showing how each layer is hardened for dependability
  • Describes the use of measurement-based analysis of computing systems
  • Illustrates technology through real-life applications
  • Discusses security attacks and unique dependability requirements for emerging applications, e.g., smart electric power grids and cloud computing
  • Illustrates, through critical societal applications such as autonomous vehicles, large-scale clouds, and engineering solutions for healthcare, the emerging challenges of making artificial intelligence (AI) and its applications dependable and trustworthy

This book is suitable for students of computer engineering and computer science. Professionals working to ensure dependable computing will also find helpful information to support their efforts. With the support of practical case studies and use cases from both academia and real-world deployments, the book traces developments in this ever-growing field, including the impact of artificial intelligence and machine learning. It offers a single compendium spanning the myriad areas in which dependability has been applied, providing theoretical concepts and applied knowledge with content that will excite a beginner and rigor that will satisfy an expert. Accompanying the book is an online repository of problem sets and solutions, as well as slides for instructors, spanning the chapters of the book.

Ravishankar K. Iyer is George and Ann Fisher Distinguished Professor of Engineering at the University of Illinois Urbana-Champaign, USA. He holds joint appointments in the Departments of Electrical & Computer Engineering and Computer Science as well as the Coordinated Science Laboratory (CSL), the National Center for Supercomputing Applications (NCSA), and the Carl R. Woese Institute for Genomic Biology. The winner of numerous awards and honors, he was the founding chief scientist of the Information Trust Institute at UIUC, a campus-wide research center addressing security, reliability, and safety issues in critical infrastructures.

Zbigniew T. Kalbarczyk is a Research Professor in the Department of Electrical & Computer Engineering and the Coordinated Science Laboratory of the University of Illinois Urbana-Champaign, USA. He is a member of the IEEE, the IEEE Computer Society, and IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance. Dr. Kalbarczyk's research interests are in the design and validation of reliable and secure computing systems. His current work explores emerging computing technologies, machine learning-based methods for early detection of security attacks, analysis of data on failures and security attacks in large computing systems, and more.

Nithin M. Nakka received his B.Tech (Hons.) degree from the Indian Institute of Technology, Kharagpur, India, and his M.S. and Ph.D. degrees from the University of Illinois Urbana-Champaign, USA. He is a Technical Leader at Cisco Systems and has worked on most layers of the networking stack, from network data-plane hardware and layer-2/layer-3 control planes to network controllers and network fabric monitoring. His areas of research interest include systems reliability, network telemetry, and hardware-implemented fault tolerance.



1 Dependability Concepts and Taxonomy


1.1 Introduction


Every single failure in any computing device is a potential cause for concern. Reliable computing and fault tolerance, or, to use a more current term, dependable computing, is a longstanding area of research and practical implementation. This broad area of study started in the mid-1950s with John von Neumann's work on constructing reliable systems from unreliable components. Over the years, significant advancements and deployments have been made in commercial telecommunications, defense, and business applications that address a wide range of potential failures. Today, an explosion in the complexity of systems, applications, and operating systems has resulted in ever-expanding failure sources. That, combined with the explosive growth of computing as an enterprise in all areas of human endeavor, has brought forth new challenges and opportunities in designing dependable systems. Further, early detection, rapid concurrent/online diagnosis, and efficient and complete recovery are key to the design of systems that continue to operate in the event of errors. They must be complemented by ongoing analysis and monitoring of failures, supported by strong statistical models. In dependability, an understanding of real failures is critical to the design, implementation, deployment, and validation of reliability techniques. Design and validation must go hand in hand in developing new systems. While dependability techniques protect systems against known faults, their greatest efficacy comes from their ability to safeguard against unanticipated failures due to accidental errors or malicious attacks.

This chapter sets the theme of the book by first placing classic work on dependability techniques in perspective and relating their importance for current computing systems. That assessment is followed by a description of the complexity of systems built using present‐day hardware designs, architectures, and software technologies that pose compelling challenges in providing continuous availability against a vast array of potential failures. Examples are provided of the developmental (or changing) trends in these areas that motivate the need for a newer perspective on dependability. The purpose of this chapter is to bring forth the recent challenges and opportunities in the reliability domain. (Possible solutions and techniques for fault tolerance and security will be explained as the book unfolds in the remaining chapters.) The discussion concludes with an introduction of dependability concepts, definitions, a taxonomy of failures, and a sample set of measurements from real systems in preparation for the next chapter's description of basic techniques.

The entire book follows the theme set by this chapter: it introduces the fundamentals of each technique alongside examples of its deployment in systems currently in use, with the goal of educating the reader on where these techniques apply and what modifications or adaptations they need in modern and upcoming systems.

1.2 Placing Classical Dependability Techniques in Perspective


The earliest diagnostic techniques were developed for testing and failure recovery in the ILLIAC machines at the University of Illinois [1, 2] in the 1950s. When ILLIAC I (1950) and ILLIAC II (1961) were built at Illinois, fault diagnosis consisted of a battery of programs that exercised different sections of the machine. Typically, the test programs compared answers computed in two different ways, or stressed what was suspected to be a vulnerable part. In the ILLIAC II, the arithmetic and control units were designed to operate asynchronously, using a double handshake for each control signal and its acknowledgment. That protocol simplified fault diagnosis, since it doubled as an automatic fault detection mechanism: most faults caused the control to wait for the next step in the asynchronous handshake protocol, and that stalled step was identified using indicator lights on the flip-flops.
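To make the flavor of those ILLIAC-style diagnostics concrete, here is a minimal sketch in C (plainly not the original test code, which predates the language): a result is computed by two independent paths, the native multiply and a shift-and-add routine, and any disagreement flags a suspected fault in the exercised unit. The routine names and operand ranges are illustrative assumptions.

#include <stdio.h>
#include <stdint.h>

/* "Primary" path: the machine's native multiply. */
static uint32_t mul_direct(uint16_t x, uint16_t y) {
    return (uint32_t)x * y;
}

/* Independent path: shift-and-add, exercising different logic. */
static uint32_t mul_shift_add(uint16_t x, uint16_t y) {
    uint32_t acc = 0, addend = x;
    while (y) {
        if (y & 1)
            acc += addend;   /* add the shifted operand for each set bit */
        addend <<= 1;
        y >>= 1;
    }
    return acc;
}

/* Diagnostic sweep: any mismatch implicates the unit under test. */
int main(void) {
    for (uint16_t x = 0; x < 256; x++)
        for (uint16_t y = 0; y < 256; y++)
            if (mul_direct(x, y) != mul_shift_add(x, y)) {
                printf("fault suspected at x=%u y=%u\n", x, y);
                return 1;
            }
    puts("multiplier diagnostic passed");
    return 0;
}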

Spaceborne computing systems were one of the earliest avenues for dependability design. Early work on dependability in space-mission systems was performed on the JPL-STAR (Jet Propulsion Laboratory Self-Testing and Repair) computer (1971) [3] and on Voyager [4], leading to work on the Boeing 777 [5]. Although the craft carrying the JPL-STAR computer never went into space, its development resulted in the design and implementation of a range of techniques that are considered standard today. The Voyager computer (launched in 1977) used block redundancy (a form of standby redundancy whereby redundancy is provided at the subsystem level, e.g., at the attitude control subsystem, rather than internally in each subsystem) for fault tolerance. Heartbeat-based hardware- and software-implemented techniques were used for error detection. For example, the hardware detected an error if a command for the primary (in the dual-redundant configuration) arrived before the current command had been completely processed, while the software detected an error when the output unit of the primary remained unavailable for more than 14 seconds. Further developments in dependability in aviation were used in the design of the Boeing 777 fly-by-wire system, which used triple modular redundancy for all hardware resources, including the computing system, airplane electrical power, hydraulic power, and communication paths.
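As a rough illustration of the two mechanisms just described, the following C sketch pairs a Voyager-style output timeout (the 14-second threshold is taken from the text; the function names and takeover logic are assumptions made for illustration) with the 2-of-3 majority voter at the heart of triple modular redundancy.

#include <stdio.h>
#include <time.h>

#define OUTPUT_TIMEOUT_S 14   /* threshold quoted in the text for Voyager */

/* Heartbeat check: declare the primary failed if its output unit
   has been silent for longer than the timeout. */
static int primary_timed_out(time_t last_output, time_t now) {
    return difftime(now, last_output) > OUTPUT_TIMEOUT_S;
}

/* 2-of-3 majority voter; the bitwise form votes on every bit of the
   three replica outputs, masking any single faulty replica. */
static int tmr_vote(int a, int b, int c) {
    return (a & b) | (b & c) | (a & c);
}

int main(void) {
    time_t now = time(NULL);
    if (primary_timed_out(now - 20, now))            /* 20 s of silence */
        puts("primary silent too long: switching to standby");
    printf("voted output: %d\n", tmr_vote(7, 7, 5)); /* prints 7 */
    return 0;
}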

The basic techniques established for hardware redundancy, software-based fault and failure management, exceptions and their handling in software, and the use of error codes in memory, transmission, and disk systems have been the mainstay of practical and commercial systems such as the AT&T No. 5 ESS [6], IBM S/360, and IBM S/370 [7]. These systems included a combination of hardware and software techniques and diagnostics that significantly advanced the theory and practice of dependable computing. The methods have since been augmented with computational algorithms and protocols to achieve consistency and reliable operation in distributed systems [8].

While parity, ECC (error-correcting codes), and RAID (redundant arrays of independent disks) have been widely used in commodity systems, massive redundancy in hardware and software has led to high overheads in performance, hardware components, and software development costs. For example, the IBM MVS operating system devotes 50% of its software code base to fault management [9], while the IBM G5 processor dedicates 35% of its silicon area to fault detection and tolerance hardware [10]. In addition to those overheads, the validation of such systems has become increasingly complex and difficult. Thus, the use of the techniques discussed above to build “one-size-fits-all” architectures has become reserved for high-end, high-cost systems such as those used in military, telecommunication, and financial applications. Until recently, those application domains depended on traditional techniques in which redundancy in the hardware, combined with hooks into the operating system, supported some level of software redundancy. Failures in commodity environments, on the other hand, did not have such a large cost impact and hence were addressed only marginally, if at all.
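To give a concrete sense of the coding techniques mentioned above, here is a minimal C sketch of a textbook Hamming(7,4) single-error-correcting code, the simplest relative of the SEC-DED codes used in real memory systems; it is an illustrative assumption, not the scheme used in any of the systems cited.

#include <stdio.h>
#include <stdint.h>

/* Encode 4 data bits into a 7-bit codeword; positions 1..7 hold
   p1, p2, d1, p3, d2, d3, d4. */
static uint8_t hamming74_encode(uint8_t data) {
    uint8_t d1 = (data >> 0) & 1, d2 = (data >> 1) & 1;
    uint8_t d3 = (data >> 2) & 1, d4 = (data >> 3) & 1;
    uint8_t p1 = d1 ^ d2 ^ d4;   /* covers positions 1, 3, 5, 7 */
    uint8_t p2 = d1 ^ d3 ^ d4;   /* covers positions 2, 3, 6, 7 */
    uint8_t p3 = d2 ^ d3 ^ d4;   /* covers positions 4, 5, 6, 7 */
    return (uint8_t)(p1 | p2 << 1 | d1 << 2 | p3 << 3 |
                     d2 << 4 | d3 << 5 | d4 << 6);
}

/* Decode a codeword, correcting any single-bit error; the parity
   checks form a syndrome that equals the faulty bit's position. */
static uint8_t hamming74_decode(uint8_t cw) {
    uint8_t b[8];
    for (int i = 1; i <= 7; i++) b[i] = (cw >> (i - 1)) & 1;
    int syndrome = (b[1] ^ b[3] ^ b[5] ^ b[7])
                 | (b[2] ^ b[3] ^ b[6] ^ b[7]) << 1
                 | (b[4] ^ b[5] ^ b[6] ^ b[7]) << 2;
    if (syndrome) b[syndrome] ^= 1;          /* flip the faulty bit */
    return (uint8_t)(b[3] | b[5] << 1 | b[6] << 2 | b[7] << 3);
}

int main(void) {
    uint8_t cw = hamming74_encode(0xB);      /* encode the nibble 1011 */
    cw ^= 1 << 4;                            /* inject a single-bit fault */
    printf("recovered: 0x%X\n", hamming74_decode(cw));  /* prints 0xB */
    return 0;
}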

With the explosion of computing devices, in particular the variety of mobile/handheld devices used in a wide range of applications, computing has become a social enterprise. Massive computing data centers are distributed geographically, logically, and physically, servicing networked entities from telecom and Internet service providers to banks (i.e., high-dependability domains). On the one hand, the likes of Amazon and Google have increasingly adopted and invested in high-performance computing systems. On the other hand, ubiquitous computing, present in everyday appliances such as washing machines and microwaves, vehicles such as automobiles and airplanes, and applications such as e-commerce and health monitoring, has dramatically changed the impact of computing system failures on the world's social and economic machinery. With computing now a common enterprise, such outages can no longer be ignored or brushed aside with a marginal or cursory solution. Dependability requirements for these systems are nearly as high as those for the legacy systems that used redundancy extensively throughout the system. However, the cost margin for high availability is typically small, precluding the use of traditional techniques in commodity systems. New, low-cost techniques tailored to the specific needs of the application are required for the emerging domains. On the other end of the spectrum from embedded, ubiquitous computing are new large-scale, high-performance computing systems (i.e., supercomputers), for which dependability (or the ability to compute through failures) is paramount for providing sustained performance at scale. Such systems pose another important challenge with respect to dependability. Beyond the domain-specific requirements of the varied systems discussed thus far, failures during recovery in any system significantly change its dependability dynamics [6, 11]. However, this aspect has not been adequately considered in either the design or the assessment of computing systems.

1.3 Taxonomy of Dependable Computing


In this section, we...

Publication date (per publisher): 18 April 2024
Language: English
ISBN-10: 1-119-74346-X
ISBN-13: 978-1-119-74346-0