- Spark Deployment Modes
- Preparing to Install Spark
- Installing Spark in Standalone Mode
- Exploring the Spark Install
- Deploying Spark on Hadoop
- Summary
- Q&A
- Workshop
- Exercises
Preparing to Install Spark
Spark is a cross-platform application that can be deployed on:

- Linux (all distributions)
- Windows
- Mac OS X
Although there are no specific hardware requirements, general hardware recommendations for a Spark instance are:

- 8 GB or more of memory
- Eight or more CPU cores
- 10 gigabit or greater network speed
- Four or more disks in JBOD configuration (JBOD stands for “Just a Bunch of Disks,” referring to independent hard disks that are not configured as a RAID, or Redundant Array of Independent Disks)
Spark is written in Scala and provides programming interfaces in Python (PySpark) and Scala. The following are software prerequisites for installing and running Spark (a quick command-line check is shown after the list):

- Java
- Python (if you intend to use PySpark)
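If Java and Python are already installed, a quick way to confirm them is from the command line. This is only a sketch; the minimum versions required depend on the Spark release you are installing:

```bash
# Confirm that a Java runtime is on the PATH and report its version
java -version

# Confirm the Python interpreter that PySpark will use
python --version

# Many Spark scripts use JAVA_HOME to locate Java; confirm it is set
echo $JAVA_HOME
```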
If you wish to use Spark with R (as I will discuss in Hour 15, “Getting Started with Spark and R”), you will need to install R as well. Git, Maven, or SBT may also be useful if you intend to build Spark from source or compile Spark programs.
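As a sketch of what building from source involves, the Spark source tree includes a bundled Maven wrapper script; the exact flags and build profiles vary by Spark version, so treat the following as an illustration rather than the definitive build procedure:

```bash
# Obtain the Spark source code (requires Git)
git clone https://github.com/apache/spark.git
cd spark

# Build Spark using the bundled Maven wrapper, skipping tests to save time
./build/mvn -DskipTests clean package
```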
If you are deploying Spark on YARN or Mesos, you will, of course, need a functioning YARN or Mesos cluster before installing and configuring Spark to work with these platforms.
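For reference, once a YARN cluster is in place and Spark can locate the Hadoop client configuration, running Spark against it is largely a matter of specifying the master. The configuration path and application name below are assumptions for illustration only; your values will differ:

```bash
# Point Spark at the Hadoop/YARN client configuration
# (example path; use your cluster's actual configuration directory)
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Launch an interactive PySpark shell on the YARN cluster
pyspark --master yarn

# Or submit an application to YARN (my_spark_app.py is a placeholder name)
spark-submit --master yarn --deploy-mode cluster my_spark_app.py
```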
I will cover installing Spark in Standalone mode on a single machine on each type of platform, including satisfying all of the dependencies and prerequisites.