- Spark Deployment Modes
- Preparing to Install Spark
- Installing Spark in Standalone Mode
- Exploring the Spark Install
- Deploying Spark on Hadoop
- Summary
- Q&A
- Workshop
- Exercises
Installing Spark in Standalone Mode
In this section, I will cover deploying Spark in Standalone mode on a single machine on various platforms. Feel free to follow the installation steps for whichever platform is most relevant to you.
Getting Spark
In the installation steps for Linux and Mac OS X, I will use the latest pre-built binary release of Spark. You could also download the Spark source code and build it yourself for your target platform using the build instructions provided on the official Spark website. In either case, your first step, regardless of the intended installation platform, is to download either the release or the source from http://spark.apache.org/downloads.html
This page allows you to download the latest release of Spark. In this example, the latest release is 1.5.2; your release will likely be newer (for example, 1.6.x or 2.x.x).
FIGURE 3.1 The Apache Spark downloads page.
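For example, on a Linux or Mac OS X system you could fetch and unpack a pre-built release from the command line. The mirror URL and file name below are only illustrative (they correspond to release 1.5.2 built for Hadoop 2.6); use the download link presented for your release, and choose your own installation directory.

# Download a pre-built Spark release (example file name; use the link from the downloads page)
wget http://archive.apache.org/dist/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz

# Unpack the archive and move it to an installation directory, for example /opt/spark
tar -xzf spark-1.5.2-bin-hadoop2.6.tgz
sudo mv spark-1.5.2-bin-hadoop2.6 /opt/spark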
Installing a Multi-node Spark Standalone Cluster
Using the steps outlined in this section for your preferred target platform, you will have installed a single-node Spark Standalone cluster. I will discuss Spark’s cluster architecture in more detail in Hour 4, “Understanding the Spark Runtime Architecture.” However, to create a multi-node cluster from a single-node system, you would need to do the following (sample configuration entries and commands for these steps follow the list):
Ensure all cluster nodes can resolve hostnames of other cluster members and are routable to one another (typically, nodes are on the same private subnet).
Enable passwordless SSH (Secure Shell) from the Spark master to the Spark slaves (this is required only so the master can log in to the slaves remotely to start and stop the slave daemons).
Configure the spark-defaults.conf file on all nodes with the URL of the Spark master node.
Configure the spark-env.sh file on all nodes with the hostname or IP address of the Spark master node.
Run the start-master.sh script from the sbin directory on the Spark master node.
Run the start-slave.sh script from the sbin directory on all of the Spark slave nodes.
Check the Spark master web UI (served by default on port 8080 of the master host). You should see each slave node listed in the Workers section.
Run a test Spark job to verify that the cluster is functioning (a sample spark-submit command follows this list).
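As a sketch of the configuration and daemon startup steps in the preceding list, the commands below assume an installation directory of /opt/spark and a master host named sparkmaster; substitute your own path and hostname. In the 1.x releases, the master host is set in spark-env.sh with SPARK_MASTER_IP (later releases use SPARK_MASTER_HOST instead).

# On every node: identify the Spark master in spark-env.sh and spark-defaults.conf
# (sparkmaster and /opt/spark are example values)
echo "SPARK_MASTER_IP=sparkmaster" >> /opt/spark/conf/spark-env.sh
echo "spark.master    spark://sparkmaster:7077" >> /opt/spark/conf/spark-defaults.conf

# On the master node: start the master daemon
/opt/spark/sbin/start-master.sh

# On each slave node: start a worker daemon and register it with the master
/opt/spark/sbin/start-slave.sh spark://sparkmaster:7077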
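For the test job, you could submit one of the example applications bundled with Spark, such as SparkPi. The jar name and location below are assumptions that vary by release (newer releases place the examples jar under examples/jars), so adjust the path to match your installation.

# Submit the bundled SparkPi example to the Standalone cluster
/opt/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://sparkmaster:7077 \
  /opt/spark/lib/spark-examples-1.5.2-hadoop2.6.0.jar 10

If the job completes and prints an approximation of pi to the console, the cluster is accepting and executing applications.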