7.5 Hours of Video Instruction
Conceptual overviews and code-along sessions show you how to scale up your data science projects using Spark, Ray, and Python.
Overview
Machine learning is moving from futuristic AI projects to data analysis on your desk. To keep up, you need to go beyond following discussions and start coding machine learning tasks yourself. Spark, Ray, and Python for Scalable Data Science LiveLessons shows you how to scale machine learning and artificial intelligence projects using Python, Spark, and Ray.
Skill Level
Introduction
Lesson 1: Introduction to Distributed Computing in Python
Topics
1.1 Introduction and Materials
1.2 The Data Science Process
1.3 A Brief Historical Diversion
1.4 Distributed Systems Primer
1.5 Python Distributed Computing Frameworks
1.6 The What and Why of Spark
1.7 The Spark Platform
1.8 Spark versus Ray
Lesson 2: Scaling Data Processing with Spark
Topics
2.1 Course Coding Setup
2.2 Your First PySpark Job
2.3 Introduction to RDDs
2.4 Transformations versus Actions
2.5 RDD Deep Dive
2.6 The Spark Execution Context
2.7 Spark versus Hadoop
2.8 Spark Application Lifecycle
Lesson 3: Exploratory Data Analysis with PySpark
Topics
3.1 Introduction to Exploratory Data Analysis
3.2 A Quick Tour of Jupyter Notebooks
3.3 Parsing Data at Scale
3.4 Spark DataFrames: Integration into Existing Workflows
3.5 Scaling Exploratory Data Analysis with Spark
3.6 Making Sense of Data: Summary Statistics and Data Visualization
3.7 Working with Text: Introduction to NLP
3.8 Tokenization and Vectorization with MLlib
Lesson 4: Parallel Computing with Ray
Topics
4.1 The What and Why of Ray
4.2 The Ray Programming Model
4.3 Parallelizing Functions with Ray Tasks
4.4 Asynchronous Programming with Actors
4.5 Cellular Automata and the Game of Life
4.6 Distributed Agent-Based Models with Ray
Lesson 5: Scaling AI Applications with Ray
Topics
5.1 Introduction to Model Evaluation
5.2 Serializing Data for Machine Learning Applications
5.3 Cross Validation with scikit-learn
5.4 Strategies for Tuning Machine Learning Models
5.5 Grid Search in Python
5.6 Distributed Hyperparameter Optimization with Ray Tune
5.7 Resource Efficient Search with Principled Early Stopping
5.8 Diving Deeper into Ray's Internals
5.9 Serving Machine Learning Models
5.10 Deploying AI Applications with Ray Serve
5.11 Monitoring Model Performance in Production
Summary