Summary and Additional Resources
Whatever the size of a Hadoop cluster, confirming and measuring its MapReduce performance is an important first step. Hadoop includes some simple applications and benchmarks that can be used for this purpose. The YARN ResourceManager web GUI is a good way to monitor the progress of any application. Jobs that run under the MapReduce framework report a large number of run-time metrics, along with their logs, directly back to the GUI, where they are presented to the user in a clear and coherent fashion. Should issues arise when running the examples and benchmarks, the mapred job command can be used to list running MapReduce jobs and to kill a job by its job ID (mapred job -kill followed by the job ID).
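The job-control step mentioned above can be sketched as a pair of small shell helpers. This is a minimal sketch, not a definitive procedure: it assumes the mapred command is on the PATH of a user allowed to manage jobs, the function names list_jobs and kill_job are hypothetical, and the job ID in the usage note is a made-up placeholder.

```shell
# Sketch: list MapReduce jobs and kill one by its job ID.
# Assumption: the `mapred` command from a Hadoop 2 installation is on PATH.
# The helper names below are hypothetical, chosen for illustration only.

# Show the jobs the cluster currently knows about, including their job IDs.
list_jobs() {
    mapred job -list
}

# Kill a single job. Refuses any argument that does not look like a job ID
# (job_<cluster-timestamp>_<sequence>) so a typo cannot reach the cluster.
kill_job() {
    job_id="$1"
    case "$job_id" in
        job_*_*)
            mapred job -kill "$job_id"
            ;;
        *)
            echo "usage: kill_job job_<cluster-timestamp>_<sequence>" >&2
            return 1
            ;;
    esac
}
```

After sourcing the helpers, run list_jobs to find the ID of the misbehaving job, then pass that ID to kill_job, for example kill_job job_1432667013445_0001 (a placeholder ID).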
Additional information and background on each of the examples and benchmarks can be found in the following resources:
Pi Benchmark
Terasort Benchmark
Benchmarking and Stress Testing an Hadoop Cluster
- http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench (written for Hadoop version 1, but also works with version 2)