Apache Spark 2: Data Processing and Real-Time Analytics (eBook)
616 Seiten
Packt Publishing (Verlag)
978-1-78995-991-8 (ISBN)
Build efficient data flow and machine learning programs with this flexible, multi-functional open-source cluster-computing framework
Key Features
- Master the art of real-time big data processing and machine learning
- Explore a wide range of use-cases to analyze large data
- Discover ways to optimize your work by using many features of Spark 2.x and Scala
Book Description
Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your own data flow and machine learning programs on this platform.
You will work with the different modules in Apache Spark, such as interactive querying with Spark SQL, using DataFrames and datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools.
By the end of this elaborately designed Learning Path, you will have all the knowledge you need to master Apache Spark, and build your own big data processing and analytics pipeline quickly and without any hassle.
This Learning Path includes content from the following Packt products:
- Mastering Apache Spark 2.x by Romeo Kienzler
- Scala and Spark for Big Data Analytics by Md. Rezaul Karim, Sridhar Alla
- Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen MeiCookbook
What you will learn
- Get to grips with all the features of Apache Spark 2.x
- Perform highly optimized real-time big data processing
- Use ML and DL techniques with Spark MLlib and third-party tools
- Analyze structured and unstructured data using SparkSQL and GraphX
- Understand tuning, debugging, and monitoring of big data applications
- Build scalable and fault-tolerant streaming applications
- Develop scalable recommendation engines
Who this book is for
If you are an intermediate-level Spark developer looking to master the advanced capabilities and use-cases of Apache Spark 2.x, this Learning Path is ideal for you. Big data professionals who want to learn how to integrate and use the features of Apache Spark and build a strong big data pipeline will also find this Learning Path useful. To grasp the concepts explained in this Learning Path, you must know the fundamentals of Apache Spark and Scala.
Romeo Kienzler works as the chief data scientist in the IBM Watson IoT worldwide team, helping clients to apply advanced machine learning at scale on their IoT sensor data. He holds a Master's degree in computer science from the Swiss Federal Institute of Technology, Zurich, with a specialization in information systems, bioinformatics, and applied statistics. Md. Rezaul Karim is a Research Scientist at Fraunhofer FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Aachen, Germany. He has more than 8 years' experience in the area of research and development with a solid understanding of algorithms and data structures in C, C++, Java, Scala, R, and Python. Sridhar Alla is a big data expert helping companies solve complex problems in distributed computing, large scale data science and analytics practice. He holds a bachelor's in computer science from JNTU, India. He loves writing code in Python, Scala, and Java. He also has extensive hands-on knowledge of several Hadoop-based technologies, TensorFlow, NoSQL, IoT, and deep learning. Siamak Amirghodsi (Sammy) is interested in building advanced technical teams, executive management, Spark, Hadoop, big data analytics, AI, deep learning nets, TensorFlow, cognitive models, swarm algorithms, real-time streaming systems, quantum computing, financial risk management, trading signal discovery, econometrics, long-term financial cycles, IoT, blockchain, probabilistic graphical models, cryptography, and NLP. Meenakshi Rajendran is experienced in the end-to-end delivery of data analytics and data science products for leading financial institutions. Meenakshi holds a master's degree in business administration and is a certified PMP with over 13 years of experience in global software delivery environments. Her areas of research and interest are Apache Spark, cloud, regulatory data governance, machine learning, Cassandra, and managing global data teams at scale. Broderick Hall is a hands-on big data analytics expert and holds a master's degree in computer science with 20 years of experience in designing and developing complex enterprise-wide software applications with real-time and regulatory requirements at a global scale. He is a deep learning early adopter and is currently working on a large-scale cloud-based data platform with deep learning net augmentation. Shuen Mei is a big data analytic platforms expert with 15+ years of experience in designing, building, and executing large-scale, enterprise-distributed financial systems with mission-critical low-latency requirements. He is certified in the Apache Spark, Cloudera Big Data platform, including Developer, Admin, and HBase. He is also a certified AWS solutions architect with emphasis on peta-byte range real-time data platform systems.Build efficient data flow and machine learning programs with this flexible, multi-functional open-source cluster-computing frameworkKey FeaturesMaster the art of real-time big data processing and machine learning Explore a wide range of use-cases to analyze large data Discover ways to optimize your work by using many features of Spark 2.x and ScalaBook DescriptionApache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your own data flow and machine learning programs on this platform.You will work with the different modules in Apache Spark, such as interactive querying with Spark SQL, using DataFrames and datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools.By the end of this elaborately designed Learning Path, you will have all the knowledge you need to master Apache Spark, and build your own big data processing and analytics pipeline quickly and without any hassle.This Learning Path includes content from the following Packt products:Mastering Apache Spark 2.x by Romeo KienzlerScala and Spark for Big Data Analytics by Md. Rezaul Karim, Sridhar AllaApache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen MeiCookbookWhat you will learnGet to grips with all the features of Apache Spark 2.xPerform highly optimized real-time big data processing Use ML and DL techniques with Spark MLlib and third-party toolsAnalyze structured and unstructured data using SparkSQL and GraphXUnderstand tuning, debugging, and monitoring of big data applications Build scalable and fault-tolerant streaming applications Develop scalable recommendation enginesWho this book is forIf you are an intermediate-level Spark developer looking to master the advanced capabilities and use-cases of Apache Spark 2.x, this Learning Path is ideal for you. Big data professionals who want to learn how to integrate and use the features of Apache Spark and build a strong big data pipeline will also find this Learning Path useful. To grasp the concepts explained in this Learning Path, you must know the fundamentals of Apache Spark and Scala.
Erscheint lt. Verlag | 21.12.2018 |
---|---|
Sprache | englisch |
Themenwelt | Sachbuch/Ratgeber ► Freizeit / Hobby ► Sammeln / Sammlerkataloge |
Mathematik / Informatik ► Informatik ► Theorie / Studium | |
Schlagworte | Apache Spark • Big Data • Big Data Analytics • Data processing • machine learning • Scala • Spark MLlib • Stream Analytics |
ISBN-10 | 1-78995-991-8 / 1789959918 |
ISBN-13 | 978-1-78995-991-8 / 9781789959918 |
Haben Sie eine Frage zum Produkt? |
Größe: 24,3 MB
Kopierschutz: Adobe-DRM
Adobe-DRM ist ein Kopierschutz, der das eBook vor Mißbrauch schützen soll. Dabei wird das eBook bereits beim Download auf Ihre persönliche Adobe-ID autorisiert. Lesen können Sie das eBook dann nur auf den Geräten, welche ebenfalls auf Ihre Adobe-ID registriert sind.
Details zum Adobe-DRM
Dateiformat: EPUB (Electronic Publication)
EPUB ist ein offener Standard für eBooks und eignet sich besonders zur Darstellung von Belletristik und Sachbüchern. Der Fließtext wird dynamisch an die Display- und Schriftgröße angepasst. Auch für mobile Lesegeräte ist EPUB daher gut geeignet.
Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen eine
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen eine
Geräteliste und zusätzliche Hinweise
Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.
aus dem Bereich