Data Analysis in the Cloud -  Fabrizio Marozzo,  Domenico Talia,  Paolo Trunfio

Data Analysis in the Cloud (eBook)

Models, Techniques and Applications
eBook Download: PDF | EPUB
2015 | 1. Auflage
150 Seiten
Elsevier Science (Verlag)
978-0-12-802914-5 (ISBN)
Systemvoraussetzungen
Systemvoraussetzungen
28,95 inkl. MwSt
  • Download sofort lieferbar
  • Zahlungsarten anzeigen
Data Analysis in the Cloud introduces and discusses models, methods, techniques, and systems to analyze the large number of digital data sources available on the Internet using the computing and storage facilities of the cloud. Coverage includes scalable data mining and knowledge discovery techniques together with cloud computing concepts, models, and systems. Specific sections focus on map-reduce and NoSQL models. The book also includes techniques for conducting high-performance distributed analysis of large data on clouds. Finally, the book examines research trends such as Big Data pervasive computing, data-intensive exascale computing, and massive social network analysis. - Introduces data analysis techniques and cloud computing concepts - Describes cloud-based models and systems for Big Data analytics - Provides examples of the state-of-the-art in cloud data analysis - Explains how to develop large-scale data mining applications on clouds - Outlines the main research trends in the area of scalable Big Data analysis

Domenico Talia is a professor of computer engineering at University of Calabria and partner of two startups: DtoK Lab and Exeura. His research interests include parallel and distributed data mining algorithms, cloud computing, social data analysis, distributed knowledge discovery, mobile computing, green computing systems, peer-to-peer systems, and parallel programming. He is the author of several books including Service-Oriented Distributed Knowledge Discovery (CRC 2012) and Grid Middleware and Services: Challenges and Solutions (Springer 2010), and more than 300 papers in archival journals such as CACM, IEEE TKDE, ACM Computing Surveys, FGCS, Parallel Computing, IEEE Internet Computing and international conference proceedings. He is a member of the editorial boards of many journals including IEEE Transactions on Cloud Computing, the Future Generation Computer Systems journal, Journal of Cloud Computing, and The International Journal on Web and Grid Services.
Data Analysis in the Cloud introduces and discusses models, methods, techniques, and systems to analyze the large number of digital data sources available on the Internet using the computing and storage facilities of the cloud. Coverage includes scalable data mining and knowledge discovery techniques together with cloud computing concepts, models, and systems. Specific sections focus on map-reduce and NoSQL models. The book also includes techniques for conducting high-performance distributed analysis of large data on clouds. Finally, the book examines research trends such as Big Data pervasive computing, data-intensive exascale computing, and massive social network analysis. - Introduces data analysis techniques and cloud computing concepts- Describes cloud-based models and systems for Big Data analytics- Provides examples of the state-of-the-art in cloud data analysis- Explains how to develop large-scale data mining applications on clouds- Outlines the main research trends in the area of scalable Big Data analysis

Chapter 2

Introduction to Cloud Computing


Abstract


This chapter introduces the basic concepts of cloud computing, which provides scalable storage and processing services that can be used for extracting knowledge from big data repositories. Section 2.1 defines cloud computing and discusses the main service and deployment models adopted by cloud providers. The section also describes some cloud platforms that can be used to implement applications and frameworks for distributed data analysis. Section 2.2 discusses more specifically how cloud computing technologies can be used to implement distributed data analysis systems. The section identifies the main requirements that should be satisfied by a distributed data analysis system, and then discusses how a cloud platform can be used to fulfill such requirements.

Keywords


cloud computing
cloud service models
cloud deployment models
Microsoft Azure
Amazon Web Services
OpenNebula
OpenStack
cloud models for distributed data analysis
This chapter introduces the basic concepts of cloud computing, which provides scalable storage and processing services that can be used for extracting knowledge from big data repositories. Section 2.1 defines cloud computing and discusses the main service and deployment models adopted by cloud providers. The section also describes some cloud platforms that can be used to implement applications and frameworks for distributed data analysis. Section 2.2 discusses more specifically how cloud computing technologies can be used to implement distributed data analysis systems. The section identifies the main requirements that should be satisfied by a distributed data analysis system, and then discusses how a cloud platform can be used to fulfill such requirements.

2.1. Cloud computing: definition, models, and architectures


As discussed in the previous chapter, an effective solution to extract useful knowledge from big data repositories in reasonable time is exploiting parallel and distributed data mining techniques. It is also necessary and helpful to work with data analysis environments allowing the effective and efficient access, management and mining of such repositories. For example, a scientist can use a data analysis environment to run complex data mining algorithms, validate models, and compare and share results with colleagues located worldwide.
In the past few years, clouds have emerged as effective computing platforms to face the challenge of extracting knowledge from big data repositories, as well as to provide effective and efficient data analysis environments to both researchers and companies. From a client perspective, the cloud is an abstraction for remote, infinitely scalable provisioning of computation and storage resources. From an implementation point of view, cloud systems are based on large sets of computing resources, located somewhere “in the cloud”, which are allocated to applications on demand (Barga et al., 2011).
Thus, cloud computing can be defined as a distributed computing paradigm in which all the resources, dynamically scalable and often virtualized, are provided as services over the Internet. Virtualization is a software-based technique that implements the separation of physical computing infrastructures and allows creating various “virtual” computing resources on the same hardware. It is a basic technology that powers cloud computing by making possible to concurrently run different operating environments and multiple applications on the same server. Differently from other distributed computing paradigms, cloud users are not required to have knowledge of, expertise in, or control over the technology infrastructure in the “cloud” that supports them. A number of features define cloud applications, services, data, and infrastructure:
Remotely hosted: Services and/or data are hosted on remote infrastructure.
Ubiquitous: Services or data are available from anywhere.
Pay-per-use: The result is a utility computing model similar to that of traditional utilities, like gas and electricity, where you pay for what you use.
We can also use the popular National Institute of Standards and Technology (NIST) definition of cloud computing to highlight its main features (Mell and Grance, 2011): “Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction”. From the NIST definition, we can identify five essential characteristics of cloud computing systems, which are on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.
Cloud systems can be classified on the basis of their service model (Software as a Service, Platform as a Service, Infrastructure as a Service) and their deployment model (public cloud, private cloud, hybrid cloud).

2.1.1. Service Models


Cloud computing vendors provide their services according to three main models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).
Software as a Service defines a delivery model in which software and data are provided through Internet to customers as ready-to-use services. Specifically, software and associated data are hosted by providers, and customers access them without need to use any additional hardware or software. Moreover, customers normally pay a monthly/yearly fee, with no additional purchase of infrastructure or software licenses. Examples of common SaaS applications are Webmail systems (e.g., Gmail), calendars (Yahoo Calendar), document management (Microsoft Office 365), image manipulation (Photoshop Express), customer relationship management (Salesforce), and others.
In Platform as a Service model, cloud vendors deliver a computing platform typically including databases, application servers, development environment for building, testing, and running custom applications. Developers can just focus on deploying of applications since cloud providers are in charge of maintenance and optimization of the environment and underlying infrastructure. Hence, customers are helped in application development as they use a set of “environment” services that are modular and can be easily integrated. Normally, the applications are developed as ready-to-use SaaS. Google App Engine, Microsoft Azure, Salesforce.com are some examples of PaaS cloud environments.
Finally, Infrastructure as a Service is an outsourcing model under which customers rent resources like CPUs, disks, or more complex resources like virtualized servers or operating systems to support their operations (e.g., Amazon EC2, RackSpace Cloud). Users of an IaaS have normally skills on system and network administration, as they must deal with configuration, operation, and maintenance tasks. Compared to the PaaS approach, the IaaS model has a higher system administration costs for the user; on the other hand, IaaS allows a full customization of the execution environment. Developers can scale up or down its services adding or removing virtual machines, easily instantiable from virtual machine images.
Table 2.1 describes how the three service models satisfy the requirements of developers and final users, in terms of flexibility, scalability, portability, security, maintenance, and costs.

Table 2.1

How SaaS, PaaS, and IaaS Satisfy the Requirements of Developers and Final Users

Requirements SaaS PaaS IaaS
Flexibility Users can customize the application interface and control its behavior, but cannot decide which software and hardware components are used to support its execution. Developers write, customize, test their application using libraries and supporting tools compatible with the platform. Users can choose what kind of virtual storage and compute resources are used for executing their application. Developers have to build the servers that will host their applications, and configure operating system and software modules on top of such servers.
Scalability The underlying computing and storage resources normally scale automatically to match application demand, so that users do not have to allocate resources manually. The result depends only on the level of elasticity provided by the cloud system. Like the SaaS model, the underlying computing and storage resources normally scale automatically. Developers can use new storage and compute resources, but their applications must be scalable and allow the dynamic inclusion of new resources.
Portability There can be problems to move applications to other providers, since some software and tools could not work on different systems. For example, application data may be in a format that cannot be read by another provider. Applications can be moved to another provider only if the new provider shares with the old one the required platform tools and services. If a provider allows to download a virtual...

Erscheint lt. Verlag 15.9.2015
Sprache englisch
Themenwelt Informatik Datenbanken Data Warehouse / Data Mining
Mathematik / Informatik Informatik Netzwerke
Mathematik / Informatik Informatik Web / Internet
Sozialwissenschaften Kommunikation / Medien Buchhandel / Bibliothekswesen
ISBN-10 0-12-802914-5 / 0128029145
ISBN-13 978-0-12-802914-5 / 9780128029145
Haben Sie eine Frage zum Produkt?
PDFPDF (Adobe DRM)
Größe: 9,2 MB

Kopierschutz: Adobe-DRM
Adobe-DRM ist ein Kopierschutz, der das eBook vor Mißbrauch schützen soll. Dabei wird das eBook bereits beim Download auf Ihre persönliche Adobe-ID autorisiert. Lesen können Sie das eBook dann nur auf den Geräten, welche ebenfalls auf Ihre Adobe-ID registriert sind.
Details zum Adobe-DRM

Dateiformat: PDF (Portable Document Format)
Mit einem festen Seiten­layout eignet sich die PDF besonders für Fach­bücher mit Spalten, Tabellen und Abbild­ungen. Eine PDF kann auf fast allen Geräten ange­zeigt werden, ist aber für kleine Displays (Smart­phone, eReader) nur einge­schränkt geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen eine Adobe-ID und die Software Adobe Digital Editions (kostenlos). Von der Benutzung der OverDrive Media Console raten wir Ihnen ab. Erfahrungsgemäß treten hier gehäuft Probleme mit dem Adobe DRM auf.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen eine Adobe-ID sowie eine kostenlose App.
Geräteliste und zusätzliche Hinweise

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.

EPUBEPUB (Adobe DRM)
Größe: 6,4 MB

Kopierschutz: Adobe-DRM
Adobe-DRM ist ein Kopierschutz, der das eBook vor Mißbrauch schützen soll. Dabei wird das eBook bereits beim Download auf Ihre persönliche Adobe-ID autorisiert. Lesen können Sie das eBook dann nur auf den Geräten, welche ebenfalls auf Ihre Adobe-ID registriert sind.
Details zum Adobe-DRM

Dateiformat: EPUB (Electronic Publication)
EPUB ist ein offener Standard für eBooks und eignet sich besonders zur Darstellung von Belle­tristik und Sach­büchern. Der Fließ­text wird dynamisch an die Display- und Schrift­größe ange­passt. Auch für mobile Lese­geräte ist EPUB daher gut geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen eine Adobe-ID und die Software Adobe Digital Editions (kostenlos). Von der Benutzung der OverDrive Media Console raten wir Ihnen ab. Erfahrungsgemäß treten hier gehäuft Probleme mit dem Adobe DRM auf.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen eine Adobe-ID sowie eine kostenlose App.
Geräteliste und zusätzliche Hinweise

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.

Mehr entdecken
aus dem Bereich
Datenschutz und Sicherheit in Daten- und KI-Projekten

von Katharine Jarmul

eBook Download (2024)
O'Reilly Verlag
24,99