Learn PySpark
Apress (Publisher)
978-1-4842-4960-4 (ISBN)
You'll start by reviewing PySpark fundamentals, such as Spark's core architecture, and see how to use PySpark for big data processing tasks such as data ingestion, cleaning, and transformation. This is followed by building workflows for analyzing streaming data using PySpark and a comparison of various streaming platforms.
You'll then see how to schedule different Spark jobs using Airflow with PySpark and examine tuning machine learning and deep learning models for real-time predictions. The book concludes with a discussion of graph frames and network analysis using graph algorithms in PySpark. All the code presented in the book will be available as Python scripts on GitHub.
What You'll Learn
Develop pipelines for streaming data processing using PySpark
Build machine learning and deep learning models using PySpark's latest offerings
Apply graph analytics using PySpark
Create Sequence Embeddings from Text data
Who This Book Is For
Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis of streaming data.
Pramod Singh is currently a Manager (Data Science) at Publicis Sapient and works as the data science lead for a project with Mercedes-Benz. He has spent the last nine years working on multiple data projects at SapientRazorfish, Infosys, and Tally, and has applied techniques ranging from traditional to advanced machine learning and deep learning in multiple projects using R, Python, Spark, and TensorFlow. Pramod has also been a regular speaker at major conferences in India and abroad and is currently authoring a couple of books on deep learning and AI techniques. He regularly conducts data science meetups at SapientRazorfish and presents webinars on machine learning and artificial intelligence. He lives in Bangalore with his wife and 2-year-old son. In his spare time, he enjoys coding, reading, and watching football.
Chapter 1: Introduction to PySpark.- Chapter 2: Data Processing.- Chapter 3: Spark Structured Streaming.- Chapter 4: Airflow.- Chapter 5: Machine Learning Library (MLlib).- Chapter 6: Supervised Machine Learning.- Chapter 7: Unsupervised Machine Learning.- Chapter 8: Deep Learning Using PySpark.
Publication date | 24 September 2019 |
---|---|
Additional info | 32 Illustrations, color; 155 Illustrations, black and white; XVIII, 210 p. 187 illus., 32 illus. in color. |
Place of publication | Berkeley |
Language | English |
Dimensions | 155 x 235 mm |
Subject area | Mathematics / Computer Science ► Computer Science ► Databases; Mathematics / Computer Science ► Computer Science ► Programming Languages / Tools; Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics |
Keywords | Airflow • Big Data • Data processing • Deep learning • Graph frames • machine learning • PySpark • Python • Spark • Supervised Machine Learning • unsupervised machine learning |
ISBN-10 | 1-4842-4960-7 / 1484249607 |
ISBN-13 | 978-1-4842-4960-4 / 9781484249604 |
Condition | New |