Python Data Cleaning and Preparation Best Practices - Maria Zervou

Python Data Cleaning and Preparation Best Practices

A practical guide in Python for organizing and handling data from various sources and formats

Maria Zervou (Author)

Book | Softcover
306 pages
2024
Packt Publishing Limited (Publisher)
978-1-83763-474-3 (ISBN)
47.35 incl. VAT
Take your data preparation skills to the next level by converting any type of data asset into a structured, properly formatted, and readily usable dataset

Key Features

Maximize the value of your data with effective data-cleaning methods
Transform your data skills with strategies for handling structured and unstructured data
Learn to elevate the quality of your data products by testing and validating your data pipelines

Book Description

Data professionals face several challenges in effectively leveraging data in today's data-driven world. One of the main challenges is the low quality of data products, caused by data that is inaccurate, incomplete, or inconsistent. Another significant challenge is the lack of skills among data professionals to analyze unstructured data, missing valuable insights that are difficult or impossible to obtain from structured data alone.

To tackle these challenges, you will go on a journey to the upstream data pipeline, which includes the ingestion of data from various sources, validation and profiling of the data for high-quality end tables, and writing the data to different sinks. Subsequently, you will acquire knowledge on handling structured data by performing essential tasks like cleaning and encoding datasets and handling missing values and outliers. The journey concludes by demystifying the manipulation of unstructured data with simple techniques that unlock their potential. You will be introduced to a variety of natural language processing techniques, from tokenization to vector models, as well as techniques for structuring images, videos, and audio.

By the end of the book, you will have achieved mastery of the techniques of data cleaning and preparation for both structured and unstructured data.

What you will learn

Ingest data from different sources and write them to required sinks
Profile and validate data pipelines for better quality control
Master grouping, merging, and joining structured data
Handle missing values and outliers in structured datasets
Implement techniques to manipulate and transform time series data
Apply structure to text, image, voice and other unstructured data
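
As a rough illustration of the kind of task named in the list above (handling missing values and outliers), here is a minimal sketch using only the Python standard library. It is not taken from the book: the function names, the sample `readings` list, the choice of median imputation, and the 1.5 x IQR rule are all assumptions made for this example.

```python
from statistics import median, quantiles


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample data: None marks a missing reading.
readings = [10.0, 12.0, None, 11.0, 13.0, 95.0, None, 12.0]

cleaned = fill_missing_with_median(readings)
outliers = iqr_outliers(cleaned)

print(cleaned)
print(outliers)
```

Imputing before flagging outliers is only one possible order. Because the median is robust to extreme values, the single large reading does not distort the fill value here, which would not hold if the mean were used instead.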

Who this book is for

Whether you're a Data Analyst, Data Engineer, Data Scientist, or any data professional who relishes the task of data preparation and cleaning, this book is for you. It’s an ideal resource for upskilling in data cleaning concepts and expanding your knowledge across all types of data, from tabular to audio and video.
Working knowledge of Python programming is needed to get the most out of the book.

Maria is a Greek expat experiencing London life! Her journey into the data world started with a master's degree in Data Science and a lot of curiosity about how data can unlock new ways of solving problems. Maria has worked as a Data Scientist and an ML engineer for the past 4 years, and today she works as a senior specialist solutions architect at Databricks, where she helps businesses leverage data and AI to solve their data challenges at scale. Her passion for new technologies and cloud infrastructure comes as no surprise, nor does her interest in anything related to automation and simplicity. Maria is also a public speaker, advocating for building machine learning products at scale and for making the data space more inclusive and diverse.

Table of Contents

Data Ingestion Sources
Importance of Data Quality
Data Profiling and Validation
Cleaning of Messy Data and Data Manipulation
Data Transformation, Aggregation and Grouping
Data Destination/Sinks
Detecting and Handling Missing Values and Outliers
Feature Scaling and Normalization
Handling Categorical Features
Consume Time Series Data
From Raw Text to Full Tokenization
From Clean Tokens to Text Structuring and Vector Models
From Image Preprocessing to Video Handling
Consume Audio Data and Extract Audio from Video

Publication date
Place of publication Birmingham
Language English
Dimensions 191 x 235 mm
Subject area Mathematics / Computer Science, Computer Science, Programming Languages / Tools
Computer Science, Software Development, User Interfaces (HCI)
Mathematics / Computer Science, Computer Science, Theory / Studies
ISBN-10 1-83763-474-2 / 1837634742
ISBN-13 978-1-83763-474-3 / 9781837634743
Condition New