Building spark-atlas-connector
I. Downloading the Code
https://github.com/hortonworks-spark/spark-atlas-connector.git
After downloading, upload the source to the /opt/soft directory.
II. Build Preparation
1. The pom files in this copy of the code have already been modified, so no further version changes are needed; you can build directly.
2. Notes on the changes: this code was pulled straight from the spark-atlas-connector project on GitHub, and the following changes were made:
(1) Update the component version numbers in the pom file:
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <java.version>1.8</java.version>
    <spark.version>2.4.0-cdh6.1.1</spark.version>
    <atlas.version>2.1.0</atlas.version>
    <maven.version>3.6.3</maven.version>
    <scala.version>2.11.12</scala.version>
    <scala.binary.version>2.11</scala.binary.version>
    <kafka.version>2.2.1</kafka.version>
    <MaxPermGen>512m</MaxPermGen>
    <CodeCacheSize>512m</CodeCacheSize>
    <minJavaVersion>1.8</minJavaVersion>
    <maxJavaVersion>1.8</maxJavaVersion>
    <test.redirectToFile>true</test.redirectToFile>
    <scalatest.version>3.0.3</scalatest.version>
    <mockito.version>1.10.19</mockito.version>
    <integration.test.enabled>false</integration.test.enabled>
    <jersey.version>1.19</jersey.version>
    <scoverage.plugin.version>1.3.0</scoverage.plugin.version>
    <!-- jackson version pulled from atlas-intg -->
    <jackson.version>2.9.6</jackson.version>
</properties>
(2) Add the Aliyun mirror to the pom file:
<repositories>
    <repository>
        <id>aliyun</id>
        <name>aliyun</name>
        <url>https://maven.aliyun.com/repository/public</url>
    </repository>
</repositories>
(3) Update the Maven version number in the pom file:
<maven.version>3.6.3</maven.version>
(4) Comment out the following dependency:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive-thriftserver_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
(5) Modify spark-atlas-connector/src/main/scala/com/hortonworks/spark/atlas/utils/SparkUtils.scala
Removed:
import scala.util.control.NonFatal
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
Modified:
def currSessionUser(qe: QueryExecution): String = {
  currUser()
  /*
  val thriftServerListener = Option(HiveThriftServer2.listener)
  thriftServerListener match {
    case None => currUser()
  }
  */
}
III. Running the Build
mvn clean -DskipTests package -Pdist
IV. Checking the Result
cd /opt/soft/spark-atlas-connector/spark-atlas-connector-assembly/target
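A quick way to confirm the build succeeded is to check that the target directory above contains the assembly jar. The helper below is a minimal sketch; the directory path comes from the steps above, and the fallback message is just for convenience:

```shell
# Check whether the build produced the spark-atlas-connector assembly jar.
find_assembly_jar() {
  # $1: the target directory produced by the build
  ls "$1"/spark-atlas-connector-assembly-*.jar 2>/dev/null \
    || echo "no assembly jar found in $1"
}

find_assembly_jar /opt/soft/spark-atlas-connector/spark-atlas-connector-assembly/target
```

If the build succeeded, this prints the path of spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar.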
V. Usage
1. Upload spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar to the /opt/module/atlas/conf directory.
2. Distribute the Atlas conf directory to the CDH cluster.
Copy /opt/module/atlas/conf to every node in the CDH cluster, so that Spark jobs can point directly at the conf directory when they are launched.
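Step 2 can be scripted with scp. The sketch below is a dry run that only prints the copy commands; the hostnames cdh02 and cdh03 are placeholders, so substitute your own node names (and remove the echo to actually run the copies):

```shell
# Dry-run distribution of the Atlas conf directory to the CDH nodes.
# CLUSTER_HOSTS is an assumption -- replace with your actual hostnames.
CONF_DIR=/opt/module/atlas/conf
CLUSTER_HOSTS="cdh02 cdh03"
for host in $CLUSTER_HOSTS; do
  # Print the scp command instead of executing it
  echo scp -r "$CONF_DIR" "$host:$(dirname "$CONF_DIR")/"
done
```

This assumes passwordless SSH between the nodes, which a CDH cluster normally already has.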
3. Example launch command
spark-submit \
  --class com.yuange.spark.atlastest.StudentsAndTeachersTwo \
  --master yarn \
  --driver-java-options "-Datlas.conf=/opt/module/atlas/conf" \
  --jars /opt/module/atlas/conf/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
  /opt/program/spark/original-yuange-spark-1.0-SNAPSHOT.jar
Note: to enable the connector, simply add the following four options when launching any Spark job:
--driver-java-options "-Datlas.conf=/opt/module/atlas/conf" \
--jars /opt/module/atlas/conf/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
--conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
--conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
--conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker
For reference, the normal launch command for the same Spark job (without the connector) is:
spark-submit --class com.yuange.spark.atlastest.StudentsAndTeachersTwo --master yarn /opt/program/spark/original-yuange-spark-1.0-SNAPSHOT.jar
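Since the four Atlas options are the same for every job, it can be convenient to collect them in one variable. This is a sketch under the directory layout used in this guide (conf and jar under /opt/module/atlas/conf), not part of the connector itself:

```shell
# Collect the four Atlas options so any spark-submit can reuse them.
ATLAS_CONF=/opt/module/atlas/conf
SAC_JAR=$ATLAS_CONF/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar
ATLAS_OPTS="--driver-java-options -Datlas.conf=$ATLAS_CONF \
  --jars $SAC_JAR \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker"

echo "$ATLAS_OPTS"
```

Usage then becomes: spark-submit --class <main-class> --master yarn $ATLAS_OPTS <app-jar> (leave $ATLAS_OPTS unquoted so the shell splits it into separate options).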
