Spark in Action

Petar Zečević, Marko Bonaći (Authors)

Book | Softcover

472 pages

2016
Manning Publications (Publisher)
978-1-61729-260-6 (ISBN)

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0.

Apache Spark is a big data processing framework perfect for analyzing near-real-time streams and discovering historical patterns in batched data sets. Spark also offers machine learning and graph processing capabilities.
Working with big data can be complex and challenging, in part because of the multiple analysis frameworks and tools required.

But Spark goes much further than other frameworks. By including machine learning and graph processing capabilities, it makes many specialized data processing platforms obsolete. Spark's unified framework and programming model significantly lower the initial infrastructure investment, and Spark's core abstractions are intuitive for most Scala, Java, and Python developers.

Spark in Action teaches readers to use Spark for stream and batch data processing. It starts with an introduction to the Spark architecture and ecosystem, followed by a taste of Spark's command-line interface.

Readers then discover the most fundamental concepts and abstractions of Spark, particularly Resilient Distributed Datasets (RDDs) and the basic data transformations that RDDs provide.

The first part of the book covers writing Spark applications using the core APIs.

Readers also learn how to work with structured data using Spark SQL, process near-real-time data with Spark Streaming, apply machine learning algorithms with Spark MLlib, and apply graph algorithms to graph-shaped data using Spark GraphX. The book also includes an introduction to Spark clustering.
* Clear introduction to Spark
* Teaches how to ingest near-real-time data
* Shows how to gain value from big data
* Includes real-life case studies

Readers should be familiar with Java, Scala, or Python. No knowledge of Spark or streaming operations is assumed, but some acquaintance with machine learning is helpful.

Petar Zečević is the CTO at SV Group. During the last 14 years he has worked on various projects as a Java developer, team leader, consultant, and software specialist. He is the founder and, with Marko, organizer of the popular Spark@Zg meetup group.

Marko Bonaći has worked with Java for 13 years. He works at Sematext as a Spark developer and consultant. Before that, he was team lead for SV Group's IBM Enterprise Content Management team.

"Dig in and get your hands dirty with one of the hottest data processing engines today. A great guide." (Jonathan Sharley, Pandora Media)

"Must-have! Speed up your learning of Spark as a distributed computing framework." (Robert Ormandi, Yahoo!)

"An ambitiously comprehensive overview of Spark and its diverse ecosystem." (Jonathan Miller, Optensity)

"An easy-to-follow, step-by-step guide." (Gaurav Bhardwaj, 3Pillar Global)

Publication date	18.09.2016
Place of publication	New York
Language	English
Weight	771 g
Binding	Paperback
Subject areas	Computer Science ► Databases ► Data Warehouse / Data Mining
	Computer Science ► Theory / Studies ► Algorithms
	Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics
	Mathematics / Computer Science ► Computer Science ► Web / Internet
Keywords	Apache • Big Data • Data Stream Analysis • Spark
ISBN-10	1-61729-260-5 / 1617292605
ISBN-13	978-1-61729-260-6 / 9781617292606
Condition	New