WebHDFS
An application that runs inside the cluster, or on another machine on the same network, can access data stored in HDFS through the high-performance native protocol or the native Java API. But what if an external application needs to access or manage files in HDFS over the Internet using HTTP?
For this kind of requirement, an additional protocol called WebHDFS was developed. It defines a public HTTP REST API, an industry-standard mechanism that requires no Java bindings, and permits clients to access HDFS over the Web. It supports operations such as reading files, writing files, making directories, changing permissions, and renaming. Clients can use common tools such as curl or wget to access HDFS.
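To make the REST style concrete, the following minimal Python sketch builds the HTTP method and URL for the operations listed above. It assumes the standard WebHDFS URL layout, http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>. The function name webhdfs_request, the host name, the port, and the file paths are hypothetical and chosen only for illustration. The sketch constructs the request but does not send it.

```python
from urllib.parse import quote, urlencode

# HTTP method used by each WebHDFS operation mentioned in the text.
WEBHDFS_OPS = {
    "OPEN": "GET",            # read a file
    "CREATE": "PUT",          # write a file
    "MKDIRS": "PUT",          # make a directory
    "SETPERMISSION": "PUT",   # change permissions
    "RENAME": "PUT",          # rename a file or directory
}


def webhdfs_request(host, port, path, op, **params):
    """Return (http_method, url) for one WebHDFS operation.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    op = op.upper()
    method = WEBHDFS_OPS[op]
    # Operation-specific parameters (for example, permission or destination)
    # are appended to the query string after the op parameter.
    query = urlencode({"op": op, **params})
    url = f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"
    return method, url


if __name__ == "__main__":
    # Host, port, and paths below are assumptions for illustration only.
    print(webhdfs_request("namenode.example.com", 50070,
                          "/user/alice/data.txt", "OPEN"))
    print(webhdfs_request("namenode.example.com", 50070,
                          "/user/alice/reports", "MKDIRS", permission="755"))
```

A URL built this way can be handed to a tool such as curl. For a read, the name node normally answers with a redirect to a data node, so the client must follow redirects (for example, with curl's -L option).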
WebHDFS provides web services access to data stored in HDFS. At the same time, it retains the security that the native Hadoop protocol offers and uses parallelism for better throughput.
To enable WebHDFS (the REST API) on the name node and data nodes, set the dfs.webhdfs.enabled configuration property to true in the hdfs-site.xml configuration file, as shown in Figure 3.15.
FIGURE 3.15 WebHDFS-related configuration.
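As a quick way to verify the setting, the following Python sketch parses an hdfs-site.xml fragment and reports whether dfs.webhdfs.enabled is set to true. The embedded XML is an assumed minimal example in the usual Hadoop property layout, not a copy of the figure, and the function name webhdfs_enabled is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed minimal hdfs-site.xml content; only the dfs.webhdfs.enabled
# property name and its value of true come from the text.
HDFS_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""


def webhdfs_enabled(xml_text):
    """Return True if dfs.webhdfs.enabled is explicitly set to true."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "dfs.webhdfs.enabled":
            return prop.findtext("value", "").strip().lower() == "true"
    # Simplification: a missing property is reported as not enabled here.
    return False


if __name__ == "__main__":
    print(webhdfs_enabled(HDFS_SITE_XML))
```

Treating a missing property as disabled is a simplification made for this sketch. The actual default may differ between Hadoop releases, so check the defaults for the release in use.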