Data Wrangling with SQL - Raghav Kandarpa, Shivangi Saxena

Blick ins Buch

Data Wrangling with SQL (eBook)

A hands-on guide to manipulating, wrangling, and engineering data using SQL

Raghav Kandarpa, Shivangi Saxena (Autoren)

eBook Download: EPUB

2023
350 Seiten
Packt Publishing (Verlag)
978-1-83763-430-9 (ISBN)

Lese- und Medienproben

Ebook-Leseprobe (EPUB)

The amount of data generated continues to grow rapidly, making it increasingly important for businesses to be able to wrangle this data and understand it quickly and efficiently. Although data wrangling can be challenging, with the right tools and techniques you can efficiently handle enormous amounts of unstructured data.
The book starts by introducing you to the basics of SQL, focusing on the core principles and techniques of data wrangling. You'll then explore advanced SQL concepts like aggregate functions, window functions, CTEs, and subqueries that are very popular in the business world. The next set of chapters will walk you through different functions within SQL query that cause delays in data transformation and help you figure out the difference between a good query and bad one. You'll also learn how data wrangling and data science go hand in hand. The book is filled with datasets and practical examples to help you understand the concepts thoroughly, along with best practices to guide you at every stage of data wrangling.
By the end of this book, you'll be equipped with essential techniques and best practices for data wrangling, and will predominantly learn how to use clean and standardized data models to make informed decisions, helping businesses avoid costly mistakes.

Become a data wrangling expert and make well-informed decisions by effectively utilizing and analyzing raw unstructured data in a systematic mannerPurchase of the print or Kindle book includes a free PDF eBookKey FeaturesImplement query optimization during data wrangling using the SQL language with practical use casesMaster data cleaning, handle the date function and null value, and write subqueries and window functionsPractice self-assessment questions for SQL-based interviews and real-world case study roundsBook DescriptionThe amount of data generated continues to grow rapidly, making it increasingly important for businesses to be able to wrangle this data and understand it quickly and efficiently. Although data wrangling can be challenging, with the right tools and techniques you can efficiently handle enormous amounts of unstructured data. The book starts by introducing you to the basics of SQL, focusing on the core principles and techniques of data wrangling. You'll then explore advanced SQL concepts like aggregate functions, window functions, CTEs, and subqueries that are very popular in the business world. The next set of chapters will walk you through different functions within SQL query that cause delays in data transformation and help you figure out the difference between a good query and bad one. You'll also learn how data wrangling and data science go hand in hand. The book is filled with datasets and practical examples to help you understand the concepts thoroughly, along with best practices to guide you at every stage of data wrangling. By the end of this book, you'll be equipped with essential techniques and best practices for data wrangling, and will predominantly learn how to use clean and standardized data models to make informed decisions, helping businesses avoid costly mistakes.What you will learnBuild time series models using data wranglingDiscover data wrangling best practices as well as tips and tricksFind out how to use subqueries, window functions, CTEs, and aggregate functionsHandle missing data, data types, date formats, and redundant dataBuild clean and efficient data models using data wrangling techniquesRemove outliers and calculate standard deviation to gauge the skewness of dataWho this book is forThis book is for data analysts looking for effective hands-on methods to manage and analyze large volumes of data using SQL. The book will also benefit data scientists, product managers, and basically any role wherein you are expected to gather data insights and develop business strategies using SQL as a language. If you are new to or have basic knowledge of SQL and databases and an understanding of data cleaning practices, this book will give you further insights into how you can apply SQL concepts to build clean, standardized data models for accurate analysis.]]>

Preface

Part 1: Data Wrangling Introduction

1 Database Introduction

Getting started

Establishing the foundation

Efficient data organization

Data integrity and consistency

Technical requirements

Decoding database structures – relational and non-relational

What is a database?

Types of databases

Tables and relationships

The SQL CREATE DATABASE statement

The SQL CREATE TABLE statement

SQL DROP TABLE versus TRUNCATE TABLE

SQL ALTER TABLE

SQL constraints

SQL keys

Database relationships

Comparing database normalization and denormalization

Normalization

Types of normalization

Denormalization

When to apply denormalization

Disadvantages of denormalization

Summary

Practical exercises

Practical exercise 1

Practical exercise 2

Practical exercise 3

Practical exercise 4

2 Data Profiling and Preparation before Data Wrangling

What is data wrangling?

Data wrangling steps

The importance of data wrangling

Benefits of data wrangling

Data wrangling use cases

Business use cases

Data capture

How does data get captured?

Data-capturing techniques

Web scraping

Structured versus unstructured data

Paid-for versus free data-wrangling tools

Data profiling

Data profiling types

Data profiling techniques

Practical exercise

Step 1 – Discovery

Step 2 – Structuring

Step 3 – Cleaning

Step 4 – Enriching

Step 5 – Validating

Step 6 – Publishing

Summary

Part 2: Data Wrangling Techniques Using SQL

3 Data Wrangling on String Data Types

SQL data types

Numeric data types

Date and time data types

String data type

SQL string functions

RIGHT()

LEFT()

LEN()

TRIM()

RTRIM()

LTRIM()

RPAD()

LPAD()

REPLACE()

REVERSE()

SUBSTRING()

CAST()

CONCATENATE()

CONCATENATE_WS()

UPPER function

LOWER function

INITCAP function

INSTR function

Summary

Practical exercises

Practical exercise 1

Practical exercise 2

Practical exercise 3

Practical exercise 4

4 Data Wrangling on the DATE Data Type

SQL DATE data type functions

EXTRACT

DATEDIFF()

TIMEDIFF()

DATE_ADD()

DATE_SUB()

DATE_FORMAT()

STR_TO_DATE()

Extracting the current date and time

Summary

5 Handling NULL Values

The impact of missing data and NULL values on data analysis

Understanding the importance of data validation and cleaning before analyzing data

Identifying NULL/missing values

NULL values versus zero values

Using the IS NULL and IS NOT NULL operators to filter and select data with NULL values

IS NULL() and IS NOT NULL() – scenario

Using the COALESCE and IFNULL functions to replace NULL values with a default value

IFNULL()

COALESCE()

IS NULL versus = NULL

Summary

6 Pivoting Data Using SQL

SQL Transpose – rows to columns

Use case scenario

SQL Cross Tab – columns to rows

Use case scenario

Unpivoting data in SQL

Analytical workflow – from SQL to business intelligence – transforming data into actionable insights

Summary

Part 3: SQL Subqueries, Aggregate And Window Functions

7 Subqueries and CTEs

Introduction to subqueries

Simple subqueries

Correlated subqueries

Using subqueries in SELECT statements

Using subqueries in FROM statements

Using subqueries in WHERE statements

Nested subqueries

Correlated subqueries

Using subqueries in INSERT, UPDATE, and DELETE statements

Managing and maintaining subqueries

Common table expressions

Performance considerations for subqueries and CTEs

Subquery versus CTEs

Summary

8 Aggregate Functions

Overview of aggregate functions in SQL

Using GROUP BY

COUNT()

SUM()

AVG()

MIN() and MAX()

COUNT(DISTINCT)

Case scenario – using all aggregate functions

Summary

9 SQL Window Functions

The importance of SQL window functions

SQL aggregate functions

SQL window functions versus aggregate functions

Window functions versus aggregate functions – an example to illustrate the differences

Window functions

SUM()

COUNT()

AVG()

ROW_NUMBER()

RANK() and DENSE_RANK()

Lead() and Lag()

NTILE()

Summary

Part 4: Optimizing Query Performance

10 Optimizing Query Performance

Introduction to query optimization

Query execution plan

Query optimization techniques

Example

Caching

Normalization

Query monitoring and troubleshooting

Query profiling

Query logging

Database monitoring

Tips and tricks for writing efficient queries

Summary

In the next chapter, we will learn about descriptive statistics using SQL, which will provide us with insights into the distribution, central tendency, and variability of data, which can, in turn, help us identify outliers and anomalies. Common SQL functions and statements used for descriptive statistics include COUNT, AVG, MIN, MAX, and GROUP BY. By using SQL to analyze data, researchers and analysts can efficiently extract and summarize information from large datasets.

Part 5:Data Science And Wrangling

11 Descriptive Statistics with SQL

Calculating descriptive statistics with SQL

Mean

Median

Mode

Standard deviation

Variance

Variability

Summary

In the next chapter, we will learn how SQL can be used for time series analysis.

12 Time Series with SQL

Running totals

Case scenario

Lead and lag for time series analysis

Case scenario

Key KPIs

Percentage change

Case scenario

Key KPIs

Moving averages

Case scenario

Key KPIs

Rank for time series analysis

Case scenario

Key KPIs

CTE for time series analysis

Importance of using CTEs while performing time series analysis

Forecasting with linear regression

Case scenario

Key KPIs

Summary

In the next chapter, we will learn different methods to find outliers in the data easily. Outlier detection is an important aspect of data analysis as it helps determine if the data is correct, looks at the skewness of the data, and removes any unexpected values.

13 Outlier Detection

Measures of central tendency and dispersion

Case scenario

Key KPIs

Methods for detecting...

Erscheint lt. Verlag	31.7.2023
Sprache	englisch
Themenwelt	Mathematik / Informatik ► Informatik ► Datenbanken
	Mathematik / Informatik ► Informatik ► Programmiersprachen / -werkzeuge
	Mathematik / Informatik ► Informatik ► Theorie / Studium
	Mathematik / Informatik ► Informatik ► Web / Internet
ISBN-10	1-83763-430-0 / 1837634300
ISBN-13	978-1-83763-430-9 / 9781837634309

Informationen gemäß Produktsicherheitsverordnung (GPSR)
Haben Sie eine Frage zum Produkt?

EPUB (Ohne DRM)

Digital Rights Management: ohne DRM
Dieses eBook enthält kein DRM oder Kopierschutz. Eine Weitergabe an Dritte ist jedoch rechtlich nicht zulässig, weil Sie beim Kauf nur die Rechte an der persönlichen Nutzung erwerben.

Dateiformat: EPUB (Electronic Publication)
EPUB ist ein offener Standard für eBooks und eignet sich besonders zur Darstellung von Belletristik und Sachbüchern. Der Fließtext wird dynamisch an die Display- und Schriftgröße angepasst. Auch für mobile Lesegeräte ist EPUB daher gut geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen dafür die kostenlose Software Adobe Digital Editions.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen dafür eine kostenlose App.
Geräteliste und zusätzliche Hinweise

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.

Print-Ausgabe

Buch | Softcover

34,90 €