Problem-Solving in High Performance Computing - Igor Ljubuncic

Problem-solving in High Performance Computing (eBook)

A Situational Awareness Approach with Linux
eBook Download: PDF | EPUB
2015 | 1st edition
320 pages
Elsevier Science (publisher)
978-0-12-801064-8 (ISBN)
71.95 incl. VAT
Problem-Solving in High Performance Computing: A Situational Awareness Approach with Linux focuses on understanding giant computing grids as cohesive systems. Unlike other titles on general problem-solving or system administration, this book offers a unified approach to complex, layered environments. It highlights the difference between troubleshooting a standalone system and solving complex problems in large, mission-critical environments. It also addresses the pitfalls of information overload and of micro and macro symptoms, and it includes methods for managing problems in large computing ecosystems. The author offers perspective gained from years of developing Intel-based systems that lead the industry in the number of hosts, software tools, and licenses used in chip design. The book offers unique, real-life examples that emphasize the magnitude and operational complexity of high-performance computing systems.

- Provides insider perspectives on challenges in high-performance environments with thousands of servers, millions of cores, distributed data centers, and petabytes of shared data
- Covers analysis, troubleshooting, and system optimization, from initial diagnostics to deep dives into kernel crash dumps
- Presents macro principles that appeal to a wide range of users and various real-life, complex problems
- Includes examples from 24/7 mission-critical environments with specific HPC operational constraints

Igor Ljubuncic is a Principal Engineer with Rackspace, a managed cloud company. Previously, Igor worked as an OS architect within Intel's IT Engineering Computing business group, exploring and developing solutions for a large, global high-performance Linux environment that supports Intel's chip design. Igor has twelve years of experience in the high-tech industry, first as a physicist and later in various engineering roles, with a strong focus on data-driven methodologies. To date, Igor has had fifteen patents accepted for filing with the USPTO, focusing on data center technologies, scheduling, and the Internet of Things. He has authored several open-source projects and technical books, and numerous articles accepted for publication in leading technical journals and magazines. He has also presented at prestigious international conferences. In his free time, Igor writes car reviews and fantasy books and manages his Linux-oriented blog, dedoimedo.com, which garners close to a million views from loyal readers every month.
Introduction: data center and high-end computing


Data center at a glance


If you are looking for a pitch, a one-liner for how to define data centers, then you might as well call them the modern power plants. They are the equivalent of the old, sooty coal factories that gave the young, entrepreneurial industrialist of the mid-1800s the advantage he needed over the village tradesmen. The plants and their laborers were the unsung heroes of their age, doing their hard labor in the background, unseen and unheard, and yet they were the backbone of the revolution that swept the world in the nineteenth century.
Fast-forward 150 years, and a similar revolution is happening. The world is transforming from an analog one to a digital one, with all the associated difficulties, buzz, and real technological challenges. In the middle of it, there is the data center, the powerhouse of the Internet, the heart of the search, the big in the big data.

Modern data center layout


Realistically, if we were to go into the specifics of data center design and all the underlying pieces, we would need half a dozen books to write it all down. Furthermore, since this is only an introduction, an appetizer, we will only briefly touch on this world. In essence, it comes down to three major components: network, compute, and storage. There are miles and miles of wires, thousands of hard disks, and angry CPUs running at full speed, serving billions of requests every second. But on their own, these three pillars do not make a data center. There is more.
If you want an analogy, think of an aircraft carrier. The first thing that comes to mind is Tom Cruise taking off in his F-14, with Kenny Loggins' "Danger Zone" playing in the background. It is almost too easy to ignore the fact that there are thousands of aircrew, mechanics, technicians, electricians, and other specialists supporting the operation. It is almost too easy to forget the floor upon floor of infrastructure and workshops, and in the very heart of it, an IT center, carefully orchestrating the entire operation.
Data centers are somewhat similar to the 100,000-ton marvels patrolling the oceans. They have their components, but these all need to communicate and work together. This is why, when you talk about data centers, concepts such as cooling and power density are just as critical as the type of processor and disk you might use. Remote management, facility security, disaster recovery, and backup rarely make the initial list of concerns, but the higher you scale, the more important they become.

Welcome to the Borg; resistance is futile


Over the last several years, we have seen a trend away from ad hoc setups assembled from whatever computing components are at hand, toward something approaching standards. Like any technology, the data center has reached a point at which it can no longer sustain itself on its own, and the world cannot tolerate a hundred different versions of it. As with the convergence of other technologies, such as network protocols, browser standards, and to some extent media standards, the data center as a whole is also becoming standardized. For instance, the Open Data Center Alliance (ODCA) (Open Data Center Alliance, n.d.) is a consortium established in 2010 to drive the adoption of interoperable solutions and services (in other words, standards) across the industry.
In this reality, hanging on to your custom workshop is like swimming against the current. Sooner or later, either you or the river will have to give up. Having a data center is no longer enough. And this is part of the reason for this book – solving problems and creating solutions in a large, unique high-performance setup that is the inevitable future of data centers.

Powers that be


Before we dig into any tactical problem, we need to discuss strategy. Working with a single computer at home is nothing like doing the same kind of work in a data center. And while the technology is pretty much identical, all the considerations you have used before – and your instincts – are completely wrong.
High-performance computing starts and ends with scale: the ability to grow at a steady rate in a sustainable manner without increasing your costs exponentially. This has always been a challenging task, and quite often, companies have to sacrifice growth once their business explodes beyond control. It is often the small, neglected things that force the slowdown: power, physical space, and other considerations that are rarely immediate or visible.

Enterprise versus Linux


Another challenge we are facing is the transition from the traditional world of the classic enterprise into the fast-paced, ever-changing cloud. Again, it is not about technology. It is about people who have been in the IT business for many years and are now experiencing this sudden change right before their eyes.

The classic office


Enabling office workers to use their software, communicate with colleagues and partners, send email, and chat has been a critical piece of the Internet since its earliest days. But the office is a stagnant, almost boring environment. The needs for change and growth are modest.

Linux computing environment


The next evolutionary step in the data center business was the creation of the Linux operating system. In one fell swoop, it delivered a whole range of possibilities that were not available beforehand. It was affordable compared to expensive mainframe setups. It offered reduced licensing costs, and the largely open-source nature of the product allowed people from the wider community to participate in and modify the software. Most importantly, it also offered scale, from minimal setups to immense supercomputers, accommodating both ends of the spectrum with almost nonchalant ease.
And while there was chaos in the world of Linux distributions, with a variety of flavors and types, many of which never really caught on, the kernel remained largely standard and allowed businesses to rely on it for their growth. Alongside the opportunity came a great shift in industry perception and in the speed of change, testing the industry's experts to their limits.

Linux cloud


Nowadays, we are seeing the third iteration in the evolution of the data center. It is shifting from being an enabler for products into a product itself. The pervasiveness of data, embodied in the concept called the Internet of Things, together with the fact that a large portion of the modern (and online) economy is driven through data search, has transformed the data center into an integral piece of business logic.
The word cloud is used to describe this transformation, but it is more than just having free compute resources available somewhere in the world and accessible through a Web portal. Infrastructure has become a service (IaaS), platforms have become a service (PaaS), and applications running on top of a very complex, modular cloud stack are virtually indistinguishable from the underlying building blocks.
At the heart of this new world, there is Linux, and with it, a whole new generation of challenges and problems of a scale and nature that system administrators never had to deal with in the past. Some of the issues may be similar, but the time factor has changed dramatically. If you could once afford to run your local system investigation at your own pace, you can no longer afford to do so with cloud systems. Concepts such as uptime, availability, and price dictate a different way of thinking and require different tools. To make things worse, the speed and technical capabilities of the hardware are being pushed to the limit, as science and big data mercilessly drive the high-performance compute market. Your old skills as a troubleshooter are being put to the test.

10,000 × 1 does not equal 10,000


The main reason why a situational-awareness approach to problem solving is so important is that linear growth brings about exponential complexity. Tools that work well on individual hosts are not built for mass deployments or do not have the capability for cross-system use. Methodologies that are perfectly suited for slow-paced, local setups are utterly outclassed in the high-performance race of the modern world.

Nonlinear scaling of issues


On one hand, larger environments become more complex simply because they have a much greater number of components in them. For instance, take a typical hard disk. An average device may have a mean time between failures (MTBF) of about 900 years. That sounds like a pretty safe bet, and you are more likely to decommission a disk after several years of use than to see it malfunction. But if you have a thousand disks, all part of a larger ecosystem, their failure rates add up. The MTBF of the whole fleet shrinks to about 1 year (900 years divided by 1,000 disks, or roughly 0.9 years), and suddenly problems you never had to deal with explicitly become items on the daily agenda.
On the other hand, large environments also require additional considerations when it comes to power, cooling, the physical layout and design of data center aisles and racks, network interconnectivity, and the number of edge devices. Suddenly, there are new dependencies that never existed on a smaller scale. Those that did exist are magnified or become significant when you look at the system as a whole. The considerations that guide your problem solving change accordingly.
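The disk arithmetic in the nonlinear-scaling example can be checked with a few lines of Python. This is a minimal sketch that assumes each disk fails independently at a constant rate (an exponential failure model). That is a simplification of real drive behavior, but under it the failure rates add, so the fleet MTBF is the per-disk MTBF divided by the number of disks. The 900-year figure comes from the text. The function names are illustrative only.

```python
import math


def fleet_mtbf_years(disk_mtbf_years: float, disks: int) -> float:
    """MTBF of a fleet of identical disks, assuming independent,
    constant-rate (exponential) failures: rates add, so MTBF divides."""
    return disk_mtbf_years / disks


def prob_any_failure(disk_mtbf_years: float, disks: int, years: float) -> float:
    """Probability that at least one disk in the fleet fails within `years`,
    under the same exponential-failure assumption."""
    rate = disks / disk_mtbf_years  # expected failures per year, whole fleet
    return 1.0 - math.exp(-rate * years)


if __name__ == "__main__":
    disk_mtbf = 900.0  # per-disk MTBF in years, as quoted in the text
    for n in (1, 1000):
        print(f"{n} disk(s): fleet MTBF {fleet_mtbf_years(disk_mtbf, n):.2f} years, "
              f"P(at least one failure in 1 year) = "
              f"{prob_any_failure(disk_mtbf, n, 1.0):.3f}")
```

With 1,000 disks, the fleet MTBF works out to 0.9 years. The chance of at least one failure within a year is about 67%, compared with roughly 0.1% for a single disk.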

The law of large numbers


It is almost too easy to overlook how much effect small, seemingly imperceptible changes in great quantity can have on the larger system. If you were to optimize the kernel on a single...

Publication date (per publisher): 1 September 2015
Language: English
Subject area: Mathematics / Computer Science > Computer Science > Networks
Mathematics / Computer Science > Computer Science > Theory / Study
Computer Science > Other Topics > Hardware
ISBN-10 0-12-801064-9 / 0128010649
ISBN-13 978-0-12-801064-8 / 9780128010648
PDF (Adobe DRM)
Size: 37.5 MB

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook from misuse. When you download the eBook, it is authorized to your personal Adobe ID. You can then read the eBook only on devices that are also registered to your Adobe ID.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is especially well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is of only limited use on small displays (smartphones, eReaders).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console; in our experience, problems with Adobe DRM occur frequently with it.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether you use Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.

EPUB (Adobe DRM)
Size: 21.7 MB

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is especially well suited to fiction and nonfiction. The body text reflows dynamically to fit the display and font size, which also makes EPUB well suited to mobile reading devices.