Data Analysis in the Cloud - Fabrizio Marozzo, Domenico Talia, Paolo Trunfio

Data Analysis in the Cloud (eBook)

Models, Techniques and Applications

Fabrizio Marozzo, Domenico Talia, Paolo Trunfio (Authors)

eBook Download: PDF | EPUB

2015 | 1st edition
150 pages
Elsevier Science (Publisher)
978-0-12-802914-5 (ISBN)

Data Analysis in the Cloud introduces and discusses models, methods, techniques, and systems to analyze the large number of digital data sources available on the Internet using the computing and storage facilities of the cloud. Coverage includes scalable data mining and knowledge discovery techniques together with cloud computing concepts, models, and systems. Specific sections focus on map-reduce and NoSQL models. The book also includes techniques for conducting high-performance distributed analysis of large data on clouds. Finally, the book examines research trends such as Big Data pervasive computing, data-intensive exascale computing, and massive social network analysis.

• Introduces data analysis techniques and cloud computing concepts

• Describes cloud-based models and systems for Big Data analytics

• Provides examples of the state-of-the-art in cloud data analysis

• Explains how to develop large-scale data mining applications on clouds

• Outlines the main research trends in the area of scalable Big Data analysis

Domenico Talia is a professor of computer engineering at the University of Calabria and a partner in two startups: DtoK Lab and Exeura. His research interests include parallel and distributed data mining algorithms, cloud computing, social data analysis, distributed knowledge discovery, mobile computing, green computing systems, peer-to-peer systems, and parallel programming. He is the author of several books, including Service-Oriented Distributed Knowledge Discovery (CRC 2012) and Grid Middleware and Services: Challenges and Solutions (Springer 2010), and of more than 300 papers in archival journals such as CACM, IEEE TKDE, ACM Computing Surveys, FGCS, Parallel Computing, and IEEE Internet Computing, and in international conference proceedings. He is a member of the editorial boards of many journals, including IEEE Transactions on Cloud Computing, the Future Generation Computer Systems journal, Journal of Cloud Computing, and The International Journal on Web and Grid Services.

Chapter 2

Introduction to Cloud Computing

Abstract

This chapter introduces the basic concepts of cloud computing, which provides scalable storage and processing services that can be used for extracting knowledge from big data repositories. Section 2.1 defines cloud computing and discusses the main service and deployment models adopted by cloud providers. The section also describes some cloud platforms that can be used to implement applications and frameworks for distributed data analysis. Section 2.2 discusses more specifically how cloud computing technologies can be used to implement distributed data analysis systems. The section identifies the main requirements that should be satisfied by a distributed data analysis system, and then discusses how a cloud platform can be used to fulfill such requirements.

Keywords

cloud computing

cloud service models

cloud deployment models

Microsoft Azure

Amazon Web Services

OpenNebula

OpenStack

cloud models for distributed data analysis

2.1. Cloud computing: definition, models, and architectures

As discussed in the previous chapter, an effective way to extract useful knowledge from big data repositories in reasonable time is to exploit parallel and distributed data mining techniques. It is also necessary to work with data analysis environments that allow effective and efficient access to, and management and mining of, such repositories. For example, a scientist can use a data analysis environment to run complex data mining algorithms, validate models, and compare and share results with colleagues located worldwide.

In the past few years, clouds have emerged as effective computing platforms to face the challenge of extracting knowledge from big data repositories, as well as to provide effective and efficient data analysis environments to both researchers and companies. From a client perspective, the cloud is an abstraction for remote, infinitely scalable provisioning of computation and storage resources. From an implementation point of view, cloud systems are based on large sets of computing resources, located somewhere “in the cloud”, which are allocated to applications on demand (Barga et al., 2011).

Thus, cloud computing can be defined as a distributed computing paradigm in which all the resources, dynamically scalable and often virtualized, are provided as services over the Internet. Virtualization is a software-based technique that decouples computing resources from the physical infrastructure and allows several “virtual” computing resources to be created on the same hardware. It is a basic technology that powers cloud computing by making it possible to run different operating environments and multiple applications concurrently on the same server. Unlike in other distributed computing paradigms, cloud users are not required to have knowledge of, expertise in, or control over the technology infrastructure in the “cloud” that supports them. A number of features define cloud applications, services, data, and infrastructure:

• Remotely hosted: Services and/or data are hosted on remote infrastructure.

• Ubiquitous: Services or data are available from anywhere.

• Pay-per-use: The result is a utility computing model similar to that of traditional utilities, like gas and electricity, where you pay for what you use.
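The pay-per-use feature can be illustrated with a small calculation. The following is a minimal sketch, not a provider's actual billing scheme: the function names, the hourly demand figures, and the price per instance-hour are all hypothetical. It compares paying only for the instances used in each hour with keeping enough instances for the peak hour running all the time.

```python
def pay_per_use_cost(hourly_demand, price_per_instance_hour):
    """Cost when instances are rented hour by hour to match demand."""
    return sum(hourly_demand) * price_per_instance_hour


def fixed_capacity_cost(hourly_demand, price_per_instance_hour):
    """Cost when peak-hour capacity is kept running for every hour."""
    return max(hourly_demand) * len(hourly_demand) * price_per_instance_hour


# Hypothetical demand: instances needed in each of six consecutive hours.
demand = [2, 2, 10, 3, 2, 1]
price = 0.5  # hypothetical price per instance-hour, arbitrary currency

print(pay_per_use_cost(demand, price))     # 10.0
print(fixed_capacity_cost(demand, price))  # 30.0
```

With this invented workload, a short peak makes fixed provisioning three times as expensive as paying per use; a flatter demand curve would narrow the gap.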

We can also use the popular National Institute of Standards and Technology (NIST) definition of cloud computing to highlight its main features (Mell and Grance, 2011): “Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction”. From the NIST definition, we can identify five essential characteristics of cloud computing systems, which are on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.

Cloud systems can be classified on the basis of their service model (Software as a Service, Platform as a Service, Infrastructure as a Service) and their deployment model (public cloud, private cloud, hybrid cloud).
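This two-way classification can be sketched as a small data structure. The sketch below is illustrative only: the type names `ServiceModel`, `DeploymentModel`, and `CloudOffering` are assumptions introduced here, and the offerings listed are well-known public services used as examples.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceModel(Enum):
    SAAS = "Software as a Service"
    PAAS = "Platform as a Service"
    IAAS = "Infrastructure as a Service"


class DeploymentModel(Enum):
    PUBLIC = "public cloud"
    PRIVATE = "private cloud"
    HYBRID = "hybrid cloud"


@dataclass(frozen=True)
class CloudOffering:
    """A cloud system described by the two classification axes."""
    name: str
    service_model: ServiceModel
    deployment_model: DeploymentModel


# Illustrative classification of three publicly available services.
offerings = [
    CloudOffering("Gmail", ServiceModel.SAAS, DeploymentModel.PUBLIC),
    CloudOffering("Google App Engine", ServiceModel.PAAS, DeploymentModel.PUBLIC),
    CloudOffering("Amazon EC2", ServiceModel.IAAS, DeploymentModel.PUBLIC),
]

# Group offering names by service model.
by_service = {model: [] for model in ServiceModel}
for offering in offerings:
    by_service[offering.service_model].append(offering.name)

for model, names in by_service.items():
    print(f"{model.value}: {names}")
```

The two axes are independent: the same service model can in principle be offered under any of the three deployment models.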

2.1.1. Service Models

Cloud computing vendors provide their services according to three main models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).

Software as a Service defines a delivery model in which software and data are provided to customers through the Internet as ready-to-use services. Specifically, software and associated data are hosted by providers, and customers access them without needing any additional hardware or software. Moreover, customers normally pay a monthly or yearly fee, with no additional purchase of infrastructure or software licenses. Examples of common SaaS applications are Webmail systems (e.g., Gmail), calendars (Yahoo Calendar), document management (Microsoft Office 365), image manipulation (Photoshop Express), customer relationship management (Salesforce), and others.

In the Platform as a Service model, cloud vendors deliver a computing platform that typically includes databases, application servers, and a development environment for building, testing, and running custom applications. Developers can focus on deploying applications, since cloud providers are in charge of maintaining and optimizing the environment and the underlying infrastructure. Hence, customers are helped in application development, as they use a set of “environment” services that are modular and can be easily integrated. Normally, the applications are developed as ready-to-use SaaS. Google App Engine, Microsoft Azure, and Salesforce.com are some examples of PaaS cloud environments.

Finally, Infrastructure as a Service is an outsourcing model under which customers rent resources such as CPUs and disks, or more complex resources such as virtualized servers or operating systems, to support their operations (e.g., Amazon EC2, RackSpace Cloud). Users of an IaaS normally have skills in system and network administration, as they must deal with configuration, operation, and maintenance tasks. Compared to the PaaS approach, the IaaS model has higher system administration costs for the user; on the other hand, IaaS allows full customization of the execution environment. Developers can scale their services up or down by adding or removing virtual machines, which are easily instantiated from virtual machine images.
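Scaling an IaaS-hosted service by adding or removing virtual machines can be sketched as follows. This is a simplified, in-memory model and does not call any real provider API: the classes `VirtualMachine` and `IaaSPool`, the image name, and the per-VM capacity are hypothetical, and a real deployment would start and stop instances through the provider's own interface.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    vm_id: int
    image: str  # name of the VM image the instance was created from


@dataclass
class IaaSPool:
    image: str            # hypothetical VM image name
    capacity_per_vm: int  # assumed number of requests/s one VM can serve
    vms: list = field(default_factory=list)
    _next_id: int = 0

    def scale_to(self, load):
        """Add or remove VMs so the pool can serve `load` requests/s."""
        target = max(1, math.ceil(load / self.capacity_per_vm))
        while len(self.vms) < target:   # scale up: instantiate from the image
            self.vms.append(VirtualMachine(self._next_id, self.image))
            self._next_id += 1
        while len(self.vms) > target:   # scale down: release the newest VM
            self.vms.pop()
        return len(self.vms)


pool = IaaSPool(image="analysis-worker-image", capacity_per_vm=100)
sizes = [pool.scale_to(load) for load in [50, 250, 1000, 120]]
print(sizes)  # [1, 3, 10, 2]
```

In this model the developer decides when and how far to scale, which reflects the administration burden that IaaS places on its users.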

Table 2.1 describes how the three service models satisfy the requirements of developers and final users, in terms of flexibility, scalability, portability, security, maintenance, and costs.

Table 2.1

How SaaS, PaaS, and IaaS Satisfy the Requirements of Developers and Final Users

Flexibility

• SaaS: Users can customize the application interface and control its behavior, but cannot decide which software and hardware components are used to support its execution.

• PaaS: Developers write, customize, test their application using libraries and supporting tools compatible with the platform. Users can choose what kind of virtual storage and compute resources are used for executing their application.

• IaaS: Developers have to build the servers that will host their applications, and configure operating system and software modules on top of such servers.

Scalability

• SaaS: The underlying computing and storage resources normally scale automatically to match application demand, so that users do not have to allocate resources manually. The result depends only on the level of elasticity provided by the cloud system.

• PaaS: Like the SaaS model, the underlying computing and storage resources normally scale automatically.

• IaaS: Developers can use new storage and compute resources, but their applications must be scalable and allow the dynamic inclusion of new resources.

Portability

• SaaS: There can be problems to move applications to other providers, since some software and tools could not work on different systems. For example, application data may be in a format that cannot be read by another provider.

• PaaS: Applications can be moved to another provider only if the new provider shares with the old one the required platform tools and services.

• IaaS: If a provider allows to download a virtual...

Publication date (per publisher)	15 September 2015
Language	English
Subject areas	Computer Science ► Databases ► Data Warehouse / Data Mining
	Mathematics / Computer Science ► Computer Science ► Networks
	Mathematics / Computer Science ► Computer Science ► Web / Internet
	Social Sciences ► Communication / Media ► Book Trade / Library Science
ISBN-10	0-12-802914-5 / 0128029145
ISBN-13	978-0-12-802914-5 / 9780128029145


