Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS

By Sam R. Alapati
Published Nov 29, 2016 by Addison-Wesley Professional. Part of the Addison-Wesley Data & Analytics Series.

Features

The comprehensive, up-to-date Apache Hadoop 2 administration handbook and reference
The only Hadoop 2 administration book written by a working Hadoop administrator!
Practical examples show how to perform key day-to-day administration tasks and rapidly troubleshoot Hadoop clusters
Demystifies complex Hadoop environments and management concepts, offering expert advice and best-practice recommendations



Description

Copyright 2017
Dimensions: 7" x 9-1/8"
Pages: 848
Edition: 1st

EPUB (Watermarked)
ISBN-10: 0-13-470338-3
ISBN-13: 978-0-13-470338-1

This is the eBook of the printed book and may not include any media, website access codes, or print supplements that may come packaged with the bound book.

The Comprehensive, Up-to-Date Apache Hadoop Administration Handbook and Reference

“Sam Alapati has worked with production Hadoop clusters for six years. His unique depth of experience has enabled him to write the go-to resource for all administrators looking to spec, size, expand, and secure production Hadoop clusters of any size.”

—Paul Dix, Series Editor

In Expert Hadoop® Administration, leading Hadoop administrator Sam R. Alapati brings together authoritative knowledge for creating, configuring, securing, managing, and optimizing production Hadoop clusters in any environment. Drawing on his experience with large-scale Hadoop administration, Alapati integrates action-oriented advice with carefully researched explanations of both problems and solutions. He covers an unmatched range of topics and offers an unparalleled collection of realistic examples.

Alapati demystifies complex Hadoop environments, helping you understand exactly what happens behind the scenes when you administer your cluster. You’ll gain unprecedented insight as you walk through building clusters from scratch and configuring high availability, performance, security, encryption, and other key attributes. The high-value administration skills you learn here will be indispensable no matter what Hadoop distribution you use or what Hadoop applications you run.

Understand Hadoop’s architecture from an administrator’s standpoint
Create simple and fully distributed clusters
Run MapReduce and Spark applications in a Hadoop cluster
Manage and protect Hadoop data, and configure high availability
Work with HDFS commands, file permissions, and storage management
Move data, and use YARN to allocate resources and schedule jobs
Manage job workflows with Oozie and Hue
Secure, monitor, log, and optimize Hadoop
Benchmark and troubleshoot Hadoop



Sample Content

Part I: Introduction to Hadoop—Architecture and Hadoop Clusters
Chapter 1: Introduction to Hadoop and Its Environment
Chapter 2: An Introduction to the Architecture of Hadoop
Chapter 3: Creating and Configuring a Simple Hadoop Cluster
Chapter 4: Planning for and Creating a Fully Distributed Cluster
Part II: Hadoop Application Frameworks
Chapter 5: Running Applications in a Cluster—The MapReduce Framework (and Hive and Pig)
Chapter 6: Running Applications in a Cluster—The Spark Framework
Chapter 7: Running Spark Applications
Part III: Managing and Protecting Hadoop Data and High Availability
Chapter 8: The Role of the NameNode and How HDFS Works
Chapter 9: HDFS Commands, HDFS Permissions and HDFS Storage
Chapter 10: Data Protection, File Formats and Accessing HDFS
Chapter 11: NameNode Operations, High Availability and Federation
Part IV: Moving Data, Allocating Resources, Scheduling Jobs and Security
Chapter 12: Moving Data Into and Out of Hadoop
Chapter 13: Resource Allocation in a Hadoop Cluster
Chapter 14: Working with Oozie to Manage Job Workflows
Chapter 15: Securing Hadoop
Part V: Monitoring, Optimization and Troubleshooting
Chapter 16: Managing Jobs, Using Hue and Performing Routine Tasks
Chapter 17: Monitoring, Metrics and Hadoop Logging
Chapter 18: Tuning the Cluster Resources, Optimizing MapReduce Jobs and Benchmarking
Chapter 19: Configuring and Tuning Apache Spark on YARN
Chapter 20: Optimizing Spark Applications
Chapter 21: Troubleshooting Hadoop—A Sampler
Chapter 22: Installing VirtualBox and Linux and Cloning the Virtual Machines