Using Spark's "Hadoop Free" Build

Spark uses Hadoop client libraries for HDFS and YARN. Starting in Spark 1.4, the project packages "Hadoop free" builds that let you more easily connect a single Spark binary to any Hadoop version. To use these builds, you need to modify SPARK_DIST_CLASSPATH to include Hadoop's package jars. The most convenient place to do this is by adding an entry in conf/spark-env.sh.
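If conf/spark-env.sh does not exist yet, a common way to create it is from the template that ships in the Spark conf/ directory (a minimal sketch; it assumes you are in the root of the Spark installation):

# conf/spark-env.sh is not created by default; start from the bundled template
cp conf/spark-env.sh.template conf/spark-env.sh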

This page describes how to connect Spark to Hadoop for different types of distributions.


Apache Hadoop

For Apache distributions, you can use Hadoop’s ‘classpath’ command. For instance:

### in conf/spark-env.sh ###

# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
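As a quick sanity check (a sketch, assuming a POSIX shell and that you run it from the Spark installation directory), you can source the file and list the Hadoop entries on the resulting classpath:

# source the env file, then print Hadoop entries from the colon-separated classpath
source conf/spark-env.sh
echo "$SPARK_DIST_CLASSPATH" | tr ':' '\n' | grep -i hadoop | head

If nothing is printed, the Hadoop jars were not picked up, and a "Hadoop free" Spark build will generally fail at startup because it cannot load Hadoop classes.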
