Beginning Azure Synapse Analytics - Bhadresh Shiyal

Beginning Azure Synapse Analytics (eBook)

Transition from Data Warehouse to Data Lakehouse

Bhadresh Shiyal (Author)

eBook Download: PDF
2021 | 1st ed.
XXII, 249 pages
Apress (Publisher)
978-1-4842-7061-5 (ISBN)
66.99 incl. VAT
  • Download available immediately
Get started with Azure Synapse Analytics, Microsoft's modern data analytics platform. This book covers core components such as Synapse SQL, Synapse Spark, Synapse Pipelines, and many more, along with their architecture and implementation.

The book begins with an introduction to core data and analytics concepts, followed by an explanation of the traditional/legacy data warehouse, the modern data warehouse, and the newest approach, the data lakehouse. You will then be introduced to Azure Synapse Analytics and its background, main features, and key service capabilities. The core architecture is discussed, along with Synapse SQL. You will learn its main features, how to create a dedicated Synapse SQL pool, and how to analyze your big data using a serverless Synapse SQL pool. You will also learn Synapse Spark and Synapse Pipelines, with examples. You will then explore Synapse Workspace and Synapse Studio, followed by Synapse Link and its features. Finally, you will work through use cases in Azure Synapse and understand the reference architecture for Synapse Analytics.

After reading this book, you will be able to work with Azure Synapse Analytics and understand its architecture, main components, features, and capabilities.


What You Will Learn
  • Understand core data and analytics concepts and data lakehouse concepts
  • Be familiar with overall Azure Synapse architecture and its main components
  • Be familiar with Synapse SQL and Synapse Spark architecture components
  • Work with integrated Apache Spark (aka Synapse Spark) and Synapse SQL engines
  • Understand Synapse Workspace, Synapse Studio, and Synapse Pipelines
  • Study reference architecture and use cases


Who This Book Is For

Azure data analysts, data engineers, data scientists, and solutions architects


 


Bhadresh Shiyal is an Azure data architect and Azure data engineer. For the past seven years, he has been working as a solutions architect with a large multinational IT corporation. Before that, he spent almost a decade in various IT positions at private and public sector banks in India, working on Microsoft technologies. He has 18 years of IT experience, including two years on an international assignment from London. He has extensive experience in application design, development, and deployment.

He has worked on various technologies, including Visual Basic, SQL Server, SharePoint Technologies, .NET MVC, O365, Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage Gen1/Gen2, Azure SQL Data Warehouse, Power BI, Spark SQL, Scala, Delta Lake, Azure Machine Learning, Azure Information Protection, Azure .NET SDK, Azure DevOps, and more.

He holds multiple Azure certifications that include Microsoft Certified Azure Solutions Architect Expert, Microsoft Certified Azure Data Engineer Associate, Microsoft Certified Azure Data Scientist Associate, and Microsoft Certified Azure Data Analyst Associate. 

Bhadresh has worked as a solutions architect on a large-scale Azure Data Lake implementation project, on a data transformation project, and on large-scale customized content management systems. Before authoring this book, he also served as technical reviewer for the book Data Science using Azure.




Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Core Data and Analytics Concepts
Core Data Concepts
What Is Data?
Structured Data
Semi-structured Data
Unstructured Data
Data Processing Methods
Batch Data Processing
Streaming or Real-Time Data Processing
Relational Data and Its Characteristics
Non-Relational Data and Its Characteristics
Core Data Analytics Concepts
What Is Data Analytics?
Data Ingestion
Data Exploration
Data Processing
ETL
ELT
ELT / ETL Tools
Data Visualization
Data Analytics Categories
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Cognitive Analytics
Summary
Chapter 2: Modern Data Warehouses and Data Lakehouses
What Is a Data Warehouse?
Core Data Warehouse Concepts
Data Model
Model Types
Schema Types
Metadata
Why Do We Need a Data Warehouse?
Efficient Decision-Making
Separation of Concerns
Single Version of the Truth
Data Restructuring
Self-Service BI
Historical Data
Security
Data Quality
Data Mining
More Revenues
What Is a Modern Data Warehouse?
Difference Between Traditional & Modern Data Warehouses
Cloud vs. On-Premises
Separation of Compute and Storage Resources
Cost
Scalability
ETL vs. ELT
Disaster Recovery
Overall Architecture
Data Lakehouse
What Is a Data Lake?
What Is Delta Lake?
What Is Apache Spark?
What Is a Data Lakehouse?
Characteristics of a Data Lakehouse
Various Data Types
AI
Decoupled Compute and Storage Resources
Open Source Storage Format
Data Analytics and BI Tools
ACID Properties
Differences Between a Data Warehouse and a Data Lakehouse
Architecture
Access to Raw Data
Open Source vs. Proprietary
Workloads
Query Engines
Data Processing
Real-Time Data
Examples of Data Lakehouses
Azure Synapse Analytics
Databricks
Benefits of Data Lakehouse
Support for All Types of Data
Time to Market
More Cost Effective
AI
Reduction in ETL/ELT Jobs
Usage of Open Source Tools and Technologies
Efficient and Easy Data Governance
Drawbacks of Data Lakehouse
Monolithic Architecture
Technical Infancy
Migration Cost
Lack of Many Products/Options
Scarcity of Skilled Technical Resources
Summary
Chapter 3: Introduction to Azure Synapse Analytics
What Is Azure Synapse Analytics?
Azure Synapse Analytics vs. Azure SQL Data Warehouse
Why Should You Learn Azure Synapse Analytics?
Main Features of Azure Synapse Analytics
Unified Data Analytics Experience
Powerful Data Insights
Unlimited Scale
Security, Privacy, and Compliance
HTAP
Key Service Capabilities of Azure Synapse Analytics
Data Lake Exploration
Multiple Language Support
Deeply Integrated Apache Spark
Serverless Synapse SQL Pool
Hybrid Data Integration
Power BI Integration
AI Integration
Enterprise Data Warehousing
Seamless Streaming Analytics
Workload Management
Advanced Security
Summary
Chapter 4: Architecture and Its Main Components
High-Level Architecture
Main Components of Architecture
Synapse SQL
Compute Layer
Dedicated Synapse SQL Pool
Serverless Synapse SQL Pool
Storage Layer
Synapse Spark or Apache Spark
Synapse Pipelines
Synapse Studio
Synapse Link
Summary
Chapter 5: Synapse SQL
Synapse SQL Architecture Components
Massively Parallel Processing Engine
Distributed Query Processing Engine
Control Node
Compute Nodes
Data Movement Service
Distribution
Hash Distribution
Round-Robin Distribution
Replication-based Distribution
Azure Storage
Dedicated or Provisioned Synapse SQL Pool
Serverless or On-Demand Synapse SQL Pool
Synapse SQL Feature Comparison
Database Object Types
Query Language
Security
Tools
Storage Options
Data Formats
Resource Consumption Model for Synapse SQL
Synapse SQL Best Practices
Best Practices for Serverless Synapse SQL Pool
Best Practices for Dedicated Synapse SQL Pool
How-To’s
Create a Dedicated Synapse SQL Pool
Create a Serverless or On-Demand Synapse SQL Pool
Load Data Using COPY Statement in Dedicated Synapse SQL Pool
Ingest Data into Azure Data Lake Storage Gen2
Summary
Chapter 6: Synapse Spark
What Is Apache Spark?
What Is Synapse Spark in Azure Synapse Analytics?
Synapse Spark Features & Capabilities
Speed
Faster Start Time
Ease of Creation
Ease of Use
Security
Automatic Scalability
Separation of Concerns
Multiple Language Support
Integration with IDEs
Pre-loaded Libraries
REST APIs
Delta Lake and Its Importance in Synapse Spark
Synapse Spark Job Optimization
Data Format
Memory Management
Data Serialization
Data Caching
Data Abstraction
Join and Shuffle Optimization
Bucketing
Hyperspace Indexing
Synapse Spark Machine Learning
Data Preparation and Exploration
Build Machine Learning Models
Train Machine Learning Models
Model Deployment and Scoring
How-To’s
How to Create a Synapse Spark Pool
How to Create and Submit Apache Spark Job Definition in Synapse Studio Using Python
How to Monitor Synapse Spark Pools Using Synapse Studio
Summary
Chapter 7: Synapse Pipelines
Overview of Azure Data Factory
Overview of Synapse Pipelines
Activities
Pipelines
Linked Services
Dataset
Integration Runtimes (IR)
Azure Integration Runtime (Azure IR)
Self-Hosted Integration Runtimes (SHIR)
Azure SSIS Integration Runtimes (Azure SSIS IR)
Control Flow
Parameters
Data Flow
Data Movement Activities
Category: Azure
Category: Database
Category: NoSQL
Category: File
Category: Generic
Category: Services and Applications
Data Transformation Activities
Control Flow Activities
Copy Pipeline Example
Transformation Pipeline Example
Pipeline Triggers
Summary
Chapter 8: Synapse Workspace and Studio
What Is a Synapse Analytics Workspace?
Synapse Analytics Workspace Components and Features
Azure Data Lake Storage Gen2 Account and File System
Serverless Synapse SQL Pool
Shared Metadata Management
Code Artifacts
What Is Synapse Studio?
Main Features of Synapse Studio
Home Hub
Data Hub
Develop Hub
Integrate Hub
Monitor Hub
Integration
Activities
Manage Hub
Analytics Pools
External Connections
Integration
Security
Synapse Studio Capabilities
Data Preparation
Data Management
Data Exploration
Data Warehousing
Data Visualization
Machine Learning
Power BI in Synapse Studio
How-To’s
How to Create or Provision a New Azure Synapse Analytics Workspace Using Azure Portal
How to Launch Azure Synapse Studio
How to Link Power BI with Azure Synapse Studio
Summary
Chapter 9: Synapse Link
OLTP vs. OLAP
What Is HTAP?
Benefits of HTAP
No-ETL Analytics
Instant Insights
Reduced Data Duplication
Simplified Technical Architecture
What Is Azure Synapse Link?
Azure Cosmos DB
Azure Cosmos DB Analytical Store
Columnar Storage
Decoupling of Operational Store
Automatic Data Synchronization
SQL API and MongoDB API
Analytical TTL
Automatic Schema Updates
Cost-Effective Archiving
Scalability
When to Use Azure Synapse Link for Cosmos DB
Azure Synapse Link Limitations
Azure Synapse Link Use Cases
Industrial IOT
Predictive Maintenance Pipeline
Operational Reporting
Real-Time Applications
Real-Time Personalization for E-Commerce Users
How-To’s
How to Enable Azure Synapse Link for Azure Cosmos DB
How to Create an Azure Cosmos DB Container with Analytical Store Using Azure Portal
How to Connect to Azure Synapse Link for Azure Cosmos DB Using Azure Portal
Summary
Chapter 10: Azure Synapse Analytics Use Cases and Reference Architecture
Where Should You Use Azure Synapse Analytics?
Large Volume of Data
Disparate Sources of Data
Data Transformation
Batch or Streaming Data
Where Should You Not Use Azure Synapse Analytics?
Use Cases for Azure Synapse Analytics
Financial Services
Manufacturing
Retail
Healthcare
Reference Architectures for Azure Synapse Analytics
Modern Data Warehouse Architecture
Real-Time Analytics on Big Data Architecture
Summary
Index

Publication date (per publisher): June 16, 2021
Additional info: XXII, 249 p. 66 illus.
Language: English
Subject areas: Computer Science › Databases › Data Warehouse / Data Mining
Mathematics / Computer Science › Computer Science › Software Development
Keywords: Azure • Azure Data Analytics • Azure Databricks • Azure Data Engineering • Azure SQL Datawarehouse • Data Lakehouse • Data Visualization • Microsoft • Modern Data Warehouse • SQL • Synapse SQL
ISBN-10: 1-4842-7061-4 / 1484270614
ISBN-13: 978-1-4842-7061-5 / 9781484270615
PDF (watermarked)
Size: 6.3 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to its source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is only of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g., Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g., the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.

