Data Science with Julia

Paul D. McNicholas, Peter Tait (Authors)

Book | Softcover

240 pages

2019
CRC Press (Publisher)
978-1-138-49998-0 (ISBN)

There is a dearth of resources for data scientists, statisticians, etc., wishing to learn about Julia. Using well known data science methods, this book will both motivate the reader and assuage any unease. The book will get readers up to speed on key features of the Julia language and illustrate some of its advantages for data science work.

"This book is a great way to both start learning data science through the promising Julia language and to become an efficient data scientist." - Professor Charles Bouveyron, INRIA Chair in Data Science, Université Côte d’Azur, Nice, France

Julia, an open-source programming language, was created to be as easy to use as languages such as R and Python while also being as fast as C and Fortran. An accessible, intuitive, and highly efficient base language, with speed that exceeds R and Python, makes Julia a formidable language for data science. Using well-known data science methods that will motivate the reader, Data Science with Julia will get readers up to speed on key features of the Julia language and illustrate its facilities for data science and machine learning work.

Features:

Covers the core components of Julia as well as packages relevant to the input, manipulation and representation of data.

Discusses several important topics in data science including supervised and unsupervised learning.

Reviews data visualization using the Gadfly package, which was designed to emulate the very popular ggplot2 package in R. Readers will learn how to make many common plots and how to visualize model results.

Presents how to optimize Julia code for performance.

Will be an ideal source for people who already know R and want to learn how to use Julia (though no previous knowledge of R or any other programming language is required).

The advantages of Julia for data science cannot be overstated. Besides speed and ease of use, there are already over 1,900 packages available, and Julia can interface (either directly or through packages) with libraries written in R, Python, Matlab, C, C++ or Fortran. The book is for senior undergraduates, beginning graduate students, or practicing data scientists who want to learn how to use Julia for data science.

Paul D. McNicholas is the Canada Research Chair in Computational Statistics at McMaster University, where he is a Professor in the Department of Mathematics and Statistics. Peter Tait is a Ph.D. student at the Department of Mathematics and Statistics at McMaster University. Prior to returning to academia, he worked as a data scientist in the software industry, where he gained extensive practical experience.

Chapter 1
Introduction

DATA SCIENCE

BIG DATA

JULIA

JULIA PACKAGES

R PACKAGES

DATASETS

Overview

Beer Data

Coffee Data

Leptograpsus Crabs Data

Food Preferences Data

x Data

Iris Data

OUTLINE OF THE CONTENTS OF THIS MONOGRAPH

Chapter 2
Core Julia

VARIABLE NAMES

TYPES

Numeric

Floats

Strings

Tuples

DATA STRUCTURES

Arrays

Dictionaries

CONTROL FLOW

Compound Expressions

Conditional Evaluation

Loops

Basics

Loop termination

Exception Handling

FUNCTIONS

Chapter 3
Working with Data

DATAFRAMES

CATEGORICAL DATA

IO

USEFUL DATAFRAME FUNCTIONS

SPLIT-APPLY-COMBINE STRATEGY

QUERY.JL

Chapter 4
Visualizing Data

GADFLY.JL

VISUALIZING UNIVARIATE DATA

DISTRIBUTIONS

VISUALIZING BIVARIATE DATA

ERROR BARS

FACETS

SAVING PLOTS

Chapter 5
Supervised Learning

INTRODUCTION

CROSS-VALIDATION

Overview

K-Fold Cross-Validation

K-NEAREST NEIGHBOURS CLASSIFICATION

CLASSIFICATION AND REGRESSION TREES

Overview

Classification Trees

Regression Trees

Comments

BOOTSTRAP

RANDOM FORESTS

GRADIENT BOOSTING

Overview

Beer Data

Food Data

COMMENTS

Chapter 6
Unsupervised Learning

INTRODUCTION

PRINCIPAL COMPONENTS ANALYSIS

PROBABILISTIC PRINCIPAL COMPONENTS ANALYSIS

EM ALGORITHM FOR PPCA

Background: EM Algorithm

E-step

M-step

Woodbury Identity

Initialization

Stopping Rule

Implementing the EM Algorithm for PPCA

K-MEANS CLUSTERING

MIXTURE OF PPCAS

Model

Parameter Estimation

Illustrative Example: Coffee Data

Chapter 7
R Interoperability

ACCESSING R DATASETS

INTERACTING WITH R

EXAMPLE: CLUSTERING AND DATA REDUCTION FOR THE COFFEE DATA

Coffee Data

PGMM Analysis

VSCC Analysis

EXAMPLE: FOOD DATA

Overview

Random Forests

Publication date	20 December 2018
Place of publication	London
Language	English
Dimensions	138 x 216 mm
Weight	336 g
Subject area	Mathematics / Computer Science ► Computer Science ► Databases
	Mathematics / Computer Science ► Computer Science ► Programming Languages / Tools
	Mathematics / Computer Science ► Mathematics ► Computer Programs / Computer Algebra
	Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics
ISBN-10	1-138-49998-6 / 1138499986
ISBN-13	978-1-138-49998-0 / 9781138499980
Condition	New