
Installing Spark

Big data consultant Jeffrey Aven covers the basics of how Spark is deployed and installed. He also covers how to deploy Spark on Hadoop using the Hadoop scheduler, YARN.


Now that you’ve gotten through the heavy stuff in the last two hours, you can dive headfirst into Spark and get your hands dirty, so to speak.

This hour covers the basics of how Spark is deployed and how to install it. I will also cover how to deploy Spark on Hadoop using the Hadoop scheduler, YARN, which was discussed in Hour 2.

By the end of this hour, you’ll be up and running with an installation of Spark that you will use in subsequent hours.

Spark Deployment Modes

There are three primary deployment modes for Spark:

  • Spark Standalone

  • Spark on YARN (Hadoop)

  • Spark on Mesos

Spark Standalone refers to the built-in, or "standalone," scheduler. The term can be confusing because a single machine and a fully distributed multinode cluster can both run in Spark Standalone mode. "Standalone" simply means that Spark does not need an external scheduler.

With Spark Standalone, you can get up and running quickly with few dependencies or environmental considerations. Spark Standalone includes everything you need to get started.
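
To give you a feel for what this looks like in practice, here is a minimal PySpark sketch that points an application at a standalone master. It assumes a standalone master is already running on localhost at the default port, 7077; the hostname and application name are placeholders, and the job itself is just a sanity check.

  from pyspark import SparkConf, SparkContext

  # Point the application at the standalone (built-in) scheduler.
  # 7077 is the default port for a Spark Standalone master; replace
  # "localhost" with your master's hostname if it runs elsewhere.
  conf = SparkConf() \
      .setMaster("spark://localhost:7077") \
      .setAppName("standalone-example")

  sc = SparkContext(conf=conf)

  # A trivial job to confirm the cluster is reachable.
  print(sc.parallelize(range(100)).sum())

  sc.stop()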

Spark on YARN and Spark on Mesos are deployment modes that use the resource schedulers YARN and Mesos, respectively. In each case, you need to establish a working YARN or Mesos cluster before installing and configuring Spark. In the case of Spark on YARN, this typically means deploying Spark to an existing Hadoop cluster.
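
The application code itself does not change across these deployment modes; only the master setting does. The sketch below illustrates this, assuming a working YARN cluster whose configuration files Spark can find (HADOOP_CONF_DIR or YARN_CONF_DIR must point to them); the Mesos hostname in the comment is a placeholder, and 5050 is the default Mesos master port.

  from pyspark import SparkConf, SparkContext

  # On YARN, Spark discovers the cluster from the Hadoop configuration,
  # so in recent Spark releases the master is simply "yarn".
  conf = SparkConf().setMaster("yarn").setAppName("yarn-example")

  # On Mesos, the master is the Mesos master's URL instead, for example:
  # conf = SparkConf().setMaster("mesos://mesos-master:5050") \
  #                   .setAppName("mesos-example")

  sc = SparkContext(conf=conf)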

I will cover Spark Standalone and Spark on YARN installation examples in this hour because these are the most common deployment modes in use today.
