Workshop
The workshop contains quiz questions and exercises to help you solidify your understanding of the material covered. Try to answer all questions before looking at the “Answers” section that follows.
Quiz
True or false: A Spark Standalone cluster consists of a single node.
Which component is not a prerequisite for installing Spark?
A. Scala
B. Python
C. Java
Which of the following subdirectories of the Spark installation contains the scripts that start and stop the Spark master and slave node services?
A. bin
B. sbin
C. lib
Which of the following environment variables are required to run Spark on Hadoop/YARN?
A. HADOOP_CONF_DIR
B. YARN_CONF_DIR
C. Either HADOOP_CONF_DIR or YARN_CONF_DIR will work.
Answers
False. Standalone refers to the independent process scheduler for Spark, which can be deployed on a cluster of one or more nodes.
A. The Scala assembly is included with Spark; however, Java and Python must exist on the system prior to installation.
B. sbin contains administrative scripts to start and stop Spark services.
C. Either the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable must be set for Spark to use YARN.
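The last answer can be checked before a job is submitted. The following is a minimal sketch, not part of Spark itself: the helper name yarn_conf_dir is hypothetical, and the order in which the two variables are checked is an arbitrary choice for illustration rather than a statement about how Spark resolves them.

```python
import os


def yarn_conf_dir(env=None):
    """Return the configured Hadoop/YARN config directory, or None.

    Hypothetical helper: checks the two environment variables named in
    the quiz answer and returns the first one that has a non-empty value.
    """
    env = os.environ if env is None else env
    for name in ("HADOOP_CONF_DIR", "YARN_CONF_DIR"):
        value = env.get(name)
        if value:
            return value
    return None


if __name__ == "__main__":
    conf = yarn_conf_dir()
    if conf is None:
        print("Set HADOOP_CONF_DIR or YARN_CONF_DIR before using YARN.")
    else:
        print("Hadoop/YARN configuration directory:", conf)
```

Passing a dictionary in place of the real environment, for example yarn_conf_dir({"YARN_CONF_DIR": "/etc/hadoop/conf"}), makes it easy to try the check without changing any shell settings; the path shown is only a placeholder.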