- Spark Deployment Modes
- Preparing to Install Spark
- Installing Spark in Standalone Mode
- Exploring the Spark Install
- Deploying Spark on Hadoop
- Summary
- Q&A
- Workshop
- Exercises
Exploring the Spark Install
Now that you have Spark up and running, let’s take a closer look at the install and its various components.
If you followed the instructions in the previous section, “Installing Spark in Standalone Mode,” you should be able to browse the contents of $SPARK_HOME.
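For example, you can list the top-level contents of the installation from a shell. The exact listing varies by Spark version and platform (and will include a few additional files, such as README.md), but you should see subdirectories like these:

```bash
$ ls $SPARK_HOME
bin  conf  data  ec2  examples  lib  licenses  python  R  sbin
```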
In Table 3.1, I describe each subdirectory of the Spark installation.
TABLE 3.1 Spark Installation Subdirectories
| Directory | Description |
|-----------|-------------|
| bin | Contains all of the commands/scripts to run Spark applications interactively through shell programs such as pyspark, spark-shell, and sparkR, or in batch mode using spark-submit. |
| conf | Contains templates for Spark configuration files, which can be used to set Spark environment variables (spark-env.sh) and default configuration properties (spark-defaults.conf). |
| ec2 | Contains scripts to deploy Spark nodes and clusters on Amazon Web Services (AWS). |
| lib | Contains the main assemblies for Spark, including the main Spark library. |
| licenses | Includes license files covering other included projects, such as Scala and jQuery. |
| python | Contains all of the Python libraries required to run PySpark. You will generally not need to access these files directly. |
| sbin | Contains administrative scripts to start and stop master and slave services, locally or remotely. |
| data | Contains sample datasets used for testing MLlib (which we will discuss in more detail later in the book). |
| examples | Contains the source code for all of the examples included with Spark, in Java, Python, R, and Scala. |
| R | Contains the SparkR package and associated libraries and documentation. |
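A quick way to tie these directories together is to exercise a few of the scripts they contain. The following commands are a minimal sketch using the standard launchers from bin and sbin described in Table 3.1; they assume $SPARK_HOME is set as in the previous section:

```bash
# Run one of the bundled examples (source lives in the examples directory)
# using the run-example launcher from bin; the argument "10" is the number
# of partitions SparkPi uses to estimate pi.
$SPARK_HOME/bin/run-example SparkPi 10

# Create a working environment config from one of the templates in conf.
cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh

# Start and stop a standalone master using the administrative scripts in sbin.
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/stop-master.sh
```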