Python Data Analytics (eBook)
XIX, 569 pages
Apress (publisher)
978-1-4842-3913-1 (ISBN)
Fabio Nelli is an IT Scientific Application Specialist at IRBM Science Park, a private research center in Pomezia, Roma, Italy. He has been a computer consultant for many years at IBM, EDS, and Merck Sharp & Dohme, along with several banks and insurance companies. He has a degree in organic chemistry and many years of experience in information technology and automation systems applied to the life sciences (Tech Specialist at Beckman Coulter Italy and Spain). He is currently developing Java applications that interface Oracle databases with scientific instrumentation generating data, and web server applications that provide analysis of the results to researchers in real time.
Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn. This revision is fully updated with new content on social media data analysis, image analysis with OpenCV, and deep learning libraries. Each chapter includes multiple examples demonstrating how to work with each library. At its heart lies the coverage of pandas, for high-performance, easy-to-use data structures and tools for data manipulation.
Author Fabio Nelli expertly demonstrates using Python for data processing, management, and information retrieval. Later chapters apply what you've learned to handwriting recognition and extending graphical capabilities with the JavaScript D3 library. Whether you are dealing with sales data, investment data, medical data, web page usage, or other data sets, Python Data Analytics, Second Edition is an invaluable reference with its examples of storing, accessing, and analyzing data.
What You'll Learn
- Understand the core concepts of data analysis and the Python ecosystem
- Go in depth with pandas for reading, writing, and processing data
- Use tools and techniques for data visualization and image analysis
- Examine popular deep learning libraries Keras, Theano, TensorFlow, and PyTorch
Who This Book Is For
Experienced Python developers who need to learn about Pythonic tools for data analysis
Table of Contents
About the Author
About the Technical Reviewer
Chapter 1: An Introduction to Data Analysis
Data Analysis
Knowledge Domains of the Data Analyst
Computer Science
Mathematics and Statistics
Machine Learning and Artificial Intelligence
Professional Fields of Application
Understanding the Nature of the Data
When the Data Become Information
When the Information Becomes Knowledge
Types of Data
The Data Analysis Process
Problem Definition
Data Extraction
Data Preparation
Data Exploration/Visualization
Predictive Modeling
Model Validation
Deployment
Quantitative and Qualitative Data Analysis
Open Data
Python and Data Analysis
Conclusions
Chapter 2: Introduction to the Python World
Python—The Programming Language
Python—The Interpreter
Cython
Jython
PyPy
Python 2 and Python 3
Installing Python
Python Distributions
Anaconda
Enthought Canopy
Python(x,y)
Using Python
Python Shell
Run an Entire Program
Implement the Code Using an IDE
Interact with Python
Writing Python Code
Make Calculations
Import New Libraries and Functions
Data Structure
Functional Programming
Indentation
IPython
IPython Shell
The Jupyter Project
Jupyter QtConsole
Jupyter Notebook
PyPI—The Python Package Index
The IDEs for Python
Spyder
Eclipse (pyDev)
Sublime
Liclipse
NinjaIDE
Komodo IDE
SciPy
NumPy
Pandas
matplotlib
Conclusions
Chapter 3: The NumPy Library
NumPy: A Little History
The NumPy Installation
Ndarray: The Heart of the Library
Create an Array
Types of Data
The dtype Option
Intrinsic Creation of an Array
Basic Operations
Arithmetic Operators
The Matrix Product
Increment and Decrement Operators
Universal Functions (ufunc)
Aggregate Functions
Indexing, Slicing, and Iterating
Indexing
Slicing
Iterating an Array
Conditions and Boolean Arrays
Shape Manipulation
Array Manipulation
Joining Arrays
Splitting Arrays
General Concepts
Copies or Views of Objects
Vectorization
Broadcasting
Structured Arrays
Reading and Writing Array Data on Files
Loading and Saving Data in Binary Files
Reading Files with Tabular Data
Conclusions
Chapter 4: The pandas Library—An Introduction
pandas: The Python Data Analysis Library
Installation of pandas
Installation from Anaconda
Installation from PyPI
Installation on Linux
Installation from Source
A Module Repository for Windows
Testing Your pandas Installation
Getting Started with pandas
Introduction to pandas Data Structures
The Series
Declaring a Series
Selecting the Internal Elements
Assigning Values to the Elements
Defining a Series from NumPy Arrays and Other Series
Filtering Values
Operations and Mathematical Functions
Evaluating Values
NaN Values
Series as Dictionaries
Operations Between Series
The DataFrame
Defining a Dataframe
Selecting Elements
Assigning Values
Membership of a Value
Deleting a Column
Filtering
DataFrame from Nested dict
Transposition of a Dataframe
The Index Objects
Methods on Index
Index with Duplicate Labels
Other Functionalities on Indexes
Reindexing
Dropping
Arithmetic and Data Alignment
Operations Between Data Structures
Flexible Arithmetic Methods
Operations Between DataFrame and Series
Function Application and Mapping
Functions by Element
Functions by Row or Column
Statistics Functions
Sorting and Ranking
Correlation and Covariance
“Not a Number” Data
Assigning a NaN Value
Filtering Out NaN Values
Filling in NaN Occurrences
Hierarchical Indexing and Leveling
Reordering and Sorting Levels
Summary Statistic by Level
Conclusions
Chapter 5: pandas: Reading and Writing Data
I/O API Tools
CSV and Textual Files
Reading Data in CSV or Text Files
Using RegExp to Parse TXT Files
Reading TXT Files Into Parts
Writing Data in CSV
Reading and Writing HTML Files
Writing Data in HTML
Reading Data from an HTML File
Reading Data from XML
Reading and Writing Data on Microsoft Excel Files
JSON Data
The Format HDF5
Pickle—Python Object Serialization
Serialize a Python Object with cPickle
Pickling with pandas
Interacting with Databases
Loading and Writing Data with SQLite3
Loading and Writing Data with PostgreSQL
Reading and Writing Data with a NoSQL Database: MongoDB
Conclusions
Chapter 6: pandas in Depth: Data Manipulation
Data Preparation
Merging
Merging on an Index
Concatenating
Combining
Pivoting
Pivoting with Hierarchical Indexing
Pivoting from “Long” to “Wide” Format
Removing
Data Transformation
Removing Duplicates
Mapping
Replacing Values via Mapping
Adding Values via Mapping
Rename the Indexes of the Axes
Discretization and Binning
Detecting and Filtering Outliers
Permutation
Random Sampling
String Manipulation
Built-in Methods for String Manipulation
Regular Expressions
Data Aggregation
GroupBy
A Practical Example
Hierarchical Grouping
Group Iteration
Chain of Transformations
Functions on Groups
Advanced Data Aggregation
Conclusions
Chapter 7: Data Visualization with matplotlib
The matplotlib Library
Installation
The IPython and IPython QtConsole
The matplotlib Architecture
Backend Layer
Artist Layer
Scripting Layer (pyplot)
pylab and pyplot
pyplot
A Simple Interactive Chart
The Plotting Window
Set the Properties of the Plot
matplotlib and NumPy
Using the kwargs
Working with Multiple Figures and Axes
Adding Elements to the Chart
Adding Text
Adding a Grid
Adding a Legend
Saving Your Charts
Saving the Code
Converting Your Session to an HTML File
Saving Your Chart Directly as an Image
Handling Date Values
Chart Typology
Line Charts
Line Charts with pandas
Histograms
Bar Charts
Horizontal Bar Charts
Multiserial Bar Charts
Multiseries Bar Charts with pandas Dataframe
Multiseries Stacked Bar Charts
Stacked Bar Charts with a pandas Dataframe
Other Bar Chart Representations
Pie Charts
Pie Charts with a pandas Dataframe
Advanced Charts
Contour Plots
Polar Charts
The mplot3d Toolkit
3D Surfaces
Scatter Plots in 3D
Bar Charts in 3D
Multi-Panel Plots
Display Subplots Within Other Subplots
Grids of Subplots
Conclusions
Chapter 8: Machine Learning with scikit-learn
The scikit-learn Library
Machine Learning
Supervised and Unsupervised Learning
Supervised Learning
Unsupervised Learning
Training Set and Testing Set
Supervised Learning with scikit-learn
The Iris Flower Dataset
The PCA Decomposition
K-Nearest Neighbors Classifier
Diabetes Dataset
Linear Regression: The Least Square Regression
Support Vector Machines (SVMs)
Support Vector Classification (SVC)
Nonlinear SVC
Plotting Different SVM Classifiers Using the Iris Dataset
Support Vector Regression (SVR)
Conclusions
Chapter 9: Deep Learning with TensorFlow
Artificial Intelligence, Machine Learning, and Deep Learning
Artificial intelligence
Machine Learning Is a Branch of Artificial Intelligence
Deep Learning Is a Branch of Machine Learning
The Relationship Between Artificial Intelligence, Machine Learning, and Deep Learning
Deep Learning
Neural Networks and GPUs
Data Availability: Open Data Source, Internet of Things, and Big Data
Python
Deep Learning Python Frameworks
Artificial Neural Networks
How Artificial Neural Networks Are Structured
Single Layer Perceptron (SLP)
Multi Layer Perceptron (MLP)
Correspondence Between Artificial and Biological Neural Networks
TensorFlow
TensorFlow: Google’s Framework
TensorFlow: Data Flow Graph
Start Programming with TensorFlow
Installing TensorFlow
Programming with the IPython QtConsole
The Model and Sessions in TensorFlow
Tensors
Operation on Tensors
Single Layer Perceptron with TensorFlow
Before Starting
Data To Be Analyzed
The SLP Model Definition
Learning Phase
Test Phase and Accuracy Calculation
Multi Layer Perceptron (with One Hidden Layer) with TensorFlow
The MLP Model Definition
Learning Phase
Test Phase and Accuracy Calculation
Multi Layer Perceptron (with Two Hidden Layers) with TensorFlow
Test Phase and Accuracy Calculation
Evaluation of Experimental Data
Conclusions
Chapter 10: An Example—Meteorological Data
A Hypothesis to Be Tested: The Influence of the Proximity of the Sea
The System in the Study: The Adriatic Sea and the Po Valley
Finding the Data Source
Data Analysis on Jupyter Notebook
Analysis of Processed Meteorological Data
The RoseWind
Calculating the Mean Distribution of the Wind Speed
Conclusions
Chapter 11: Embedding the JavaScript D3 Library in the IPython Notebook
The Open Data Source for Demographics
The JavaScript D3 Library
Drawing a Clustered Bar Chart
The Choropleth Maps
The Choropleth Map of the U.S. Population in 2014
Conclusions
Chapter 12: Recognizing Handwritten Digits
Handwriting Recognition
Recognizing Handwritten Digits with scikit-learn
The Digits Dataset
Learning and Predicting
Recognizing Handwritten Digits with TensorFlow
Learning and Predicting
Conclusions
Chapter 13: Textual Data Analysis with NLTK
Text Analysis Techniques
The Natural Language Toolkit (NLTK)
Import the NLTK Library and the NLTK Downloader Tool
Search for a Word with NLTK
Analyze the Frequency of Words
Selection of Words from Text
Bigrams and Collocations
Use Text on the Network
Extract the Text from the HTML Pages
Sentimental Analysis
Conclusions
Chapter 14: Image Analysis and Computer Vision with OpenCV
Image Analysis and Computer Vision
OpenCV and Python
OpenCV and Deep Learning
Installing OpenCV
First Approaches to Image Processing and Analysis
Before Starting
Load and Display an Image
Working with Images
Save the New Image
Elementary Operations on Images
Image Blending
Image Analysis
Edge Detection and Image Gradient Analysis
Edge Detection
The Image Gradient Theory
A Practical Example of Edge Detection with the Image Gradient Analysis
A Deep Learning Example: The Face Detection
Conclusions
Appendix A: Writing Mathematical Expressions with LaTeX
With matplotlib
With IPython Notebook in a Markdown Cell
With IPython Notebook in a Python 2 Cell
Subscripts and Superscripts
Fractions, Binomials, and Stacked Numbers
Radicals
Fonts
Accents
Appendix B: Open Data Sources
Political and Government Data
Health Data
Social Data
Miscellaneous and Public Data Sets
Financial Data
Climatic Data
Sports Data
Publications, Newspapers, and Books
Musical Data
Index
Publication date (per publisher) | 27 September 2018 |
---|---|
Additional info | XIX, 569 p. 648 illus. |
Place of publication | Berkeley |
Language | English |
Subject areas | Mathematics / Computer Science ► Computer Science ► Databases |
 | Computer Science ► Programming Languages / Tools ► Python |
 | Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics |
Keywords | Data Analysis • Data Visualization • Deep learning • Image Analysis • Keras • machine learning • OpenCV • PyTorch • Social Media Analysis • social network analysis • tensorflow • Text Mining • Theano |
ISBN-10 | 1-4842-3913-X / 148423913X |
ISBN-13 | 978-1-4842-3913-1 / 9781484239131 |
Size: 14.4 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to its source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device but is only of limited use on small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Read online
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.