There is a dearth of resources for data scientists, statisticians, and others wishing to learn Julia. Using well-known data science methods, this book will both motivate the reader and assuage any unease. It gets readers up to speed on key features of the Julia language and illustrates some of its advantages for data science work.
Chapter 1
Introduction
DATA SCIENCE
BIG DATA
JULIA
JULIA PACKAGES
R PACKAGES
DATASETS
Overview
Beer Data
Coffee Data
Leptograpsus Crabs Data
Food Preferences Data
x Data
Iris Data
OUTLINE OF THE CONTENTS OF THIS MONOGRAPH
Chapter 2
Core Julia
VARIABLE NAMES
TYPES
Numeric
Floats
Strings
Tuples
DATA STRUCTURES
Arrays
Dictionaries
CONTROL FLOW
Compound Expressions
Conditional Evaluation
Loops
Basics
Loop termination
Exception Handling
FUNCTIONS
Chapter 3
Working With Data
DATAFRAMES
CATEGORICAL DATA
IO
USEFUL DATAFRAME FUNCTIONS
SPLIT-APPLY-COMBINE STRATEGY
QUERY.JL
Chapter 4
Visualizing Data
GADFLY.JL
VISUALIZING UNIVARIATE DATA
DISTRIBUTIONS
VISUALIZING BIVARIATE DATA
ERROR BARS
FACETS
SAVING PLOTS
Chapter 5
Supervised Learning
INTRODUCTION
CROSS-VALIDATION
Overview
K-Fold Cross-Validation
K-NEAREST NEIGHBOURS CLASSIFICATION
CLASSIFICATION AND REGRESSION TREES
Overview
Classification Trees
Regression Trees
Comments
BOOTSTRAP
RANDOM FORESTS
GRADIENT BOOSTING
Overview
Beer Data
Food Data
COMMENTS
Chapter 6
Unsupervised Learning
INTRODUCTION
PRINCIPAL COMPONENTS ANALYSIS
PROBABILISTIC PRINCIPAL COMPONENTS ANALYSIS
EM ALGORITHM FOR PPCA
Background: EM Algorithm
E-step
M-step
Woodbury Identity
Initialization
Stopping Rule
Implementing the EM Algorithm for PPCA
Comments
K-MEANS CLUSTERING
MIXTURE OF PPCAS
Model
Parameter Estimation
Illustrative Example: Coffee Data
Chapter 7
R Interoperability
ACCESSING R DATASETS
INTERACTING WITH R
EXAMPLE: CLUSTERING AND DATA REDUCTION FOR THE COFFEE DATA
Coffee Data
PGMM Analysis
VSCC Analysis
EXAMPLE: FOOD DATA
Overview
Random Forests