PySpark SQL Recipes - Raju Kumar Mishra, Sundar Rajan Raman

PySpark SQL Recipes

With HiveQL, Dataframe and Graphframes
Book | Softcover
323 pages
2019 | 1st ed.
Apress (publisher)
978-1-4842-4334-3 (ISBN)
37.44 incl. VAT
Carry out data analysis with PySpark SQL, graphframes, and graph data processing using a problem-solution approach. This book provides solutions to problems related to dataframes, data manipulation, summarization, and exploratory analysis. You will improve your skills in graph data analysis using graphframes and see how to optimize your PySpark SQL code.
PySpark SQL Recipes starts with recipes on creating dataframes from different types of data sources, data aggregation and summarization, and exploratory data analysis using PySpark SQL. You’ll also discover how to solve problems in graph analysis using graphframes.
On completing this book, you’ll have ready-made code for all your PySpark SQL tasks, including creating dataframes using data from different file formats as well as from SQL or NoSQL databases.
What You Will Learn

Understand PySpark SQL and its advanced features
Use SQL and HiveQL with PySpark SQL
Work with structured streaming
Optimize PySpark SQL
Master graphframes and graph processing

Who This Book Is For

Data scientists, Python programmers, and SQL programmers.

Raju Kumar Mishra has strong interests in data science and in systems capable of handling large amounts of data and running complex mathematical models through computational programming. He was inspired to pursue an M. Tech in computational sciences from the Indian Institute of Science in Bangalore, India. Raju primarily works in the areas of data science and its different applications. Working as a corporate trainer, he has developed unique insights that help him teach and explain complex ideas with ease. Raju is also a data science consultant solving complex industrial problems. He works with programming tools such as R, Python, scikit-learn, Statsmodels, Hadoop, Hive, Pig, Spark, and many others. His venture Walsoul Private Ltd provides training in data science, programming, and big data.

Sundar Rajan Raman is an artificial intelligence practitioner currently working at Bank of America. He holds a Bachelor of Technology degree from the National Institute of Technology, India. A seasoned Java and J2EE programmer, he has worked on critical applications for companies such as AT&T, Singtel, and Deutsche Bank. He is also a seasoned big data architect. His current focus is on the artificial intelligence space, including machine learning and deep learning.

Chapter 1: Introduction to PySparkSQL
Chapter 2: Some time with Installation
Chapter 3: IO in PySparkSQL
Chapter 4: Operations on PySparkSQL DataFrames
Chapter 5: Data Merging and Data Aggregation using PySparkSQL
Chapter 6: SQL, NoSQL and PySparkSQL
Chapter 7: Structured Streaming
Chapter 8: Optimizing PySparkSQL
Chapter 9: GraphFrames

Publication date:
Additional info: 57 illustrations, black and white; XXIV, 323 p. 57 illus.
Place of publication: Berkeley
Language: English
Dimensions: 155 x 235 mm
Subject areas: Computer Science > Databases > SQL Server
Mathematics / Computer Science > Computer Science > Networks
Mathematics / Computer Science > Computer Science > Programming Languages / Tools
Computer Science > Theory / Studies > Compiler Construction
ISBN-10: 1-4842-4334-X / 148424334X
ISBN-13: 978-1-4842-4334-3 / 9781484243343
Condition: New