Writing Spark SQL data to Elasticsearch

I recently had a requirement to write Spark data into Elasticsearch. After some searching online and a round of testing, the task went through smoothly, so I am writing it down here.
Straight to the code.
The pom file:

<dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.3</version>
            <!--<scope>provided</scope>-->
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.1.3</version>
            <!--<scope>provided</scope>-->
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.8</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.36</version>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-spark-20_2.11</artifactId>
            <version>6.2.4</version>
        </dependency>
        <!-- For reading configuration files -->
        <dependency>
            <groupId>commons-configuration</groupId>
            <artifactId>commons-configuration</artifactId>
            <version>1.5</version>
        </dependency>
        <dependency>
            <groupId>commons-codec</groupId>
            <artifactId>commons-codec</artifactId>
            <version>1.10</version>
        </dependency>
    </dependencies>
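
For reference, if you build with sbt instead of Maven, roughly equivalent declarations in build.sbt would look like this (a sketch assuming the same versions as the pom above):

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.3",
  "org.apache.spark" %% "spark-sql"  % "2.1.3",
  "mysql"             % "mysql-connector-java"        % "5.1.36",
  "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "6.2.4"
)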

The code:

package cn.demo

import java.util.Properties

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.elasticsearch.spark.sql.EsSparkSQL

/**
  * author:Administrator
  * name:ESDemo
  */
object ESDemo {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName(ESDemo.getClass.getName).setMaster("local")
    // Elasticsearch connection settings picked up by the elasticsearch-spark connector
    sparkConf.set("es.nodes","192.168.0.61")
    sparkConf.set("es.port","9200")
    // Create the target index automatically if it does not exist yet
    sparkConf.set("es.index.auto.create", "true")
    // "index" writes documents, overwriting any existing document with the same id
    sparkConf.set("es.write.operation", "index")
    val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()

    // JDBC connection details for the source MySQL table
    val url: String = "jdbc:mysql://localhost:3306/testdb"
    val table: String = "courses"
    val properties: Properties = new Properties()
    properties.put("user","root")
    properties.put("password","123456")
    properties.put("driver","com.mysql.jdbc.Driver")
    // Load the MySQL table into a DataFrame
    val course: DataFrame = sparkSession.read.jdbc(url,table,properties)
    course.show()
    // Write the DataFrame to the "course" ES resource (index created automatically here)
    EsSparkSQL.saveToEs(course,"course")

    sparkSession.stop()
  }
}
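
To verify the write, the connector also registers itself as a Spark SQL data source, so the index can be read straight back into a DataFrame. A minimal sketch (run it before sparkSession.stop()):

// Read the "course" index back from ES to check the write
val fromEs: DataFrame = sparkSession.read
  .format("org.elasticsearch.spark.sql")
  .load("course")
fromEs.show()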

In this demo the index is created automatically when the data is written to ES, and the mapping is generated automatically as well. In real production work you would normally create the ES index and define its mapping up front, as sketched below.
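
Here is a minimal sketch of pre-creating the index and mapping over the REST API from Scala. The type name "doc" and the fields id/name are assumptions about the courses table, not something from the demo; adjust them to your actual schema.

import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}

object CreateCourseIndex {
  def main(args: Array[String]): Unit = {
    // ES 6.x allows one mapping type per index; "doc" and the fields below are assumed
    val mapping =
      """{
        |  "mappings": {
        |    "doc": {
        |      "properties": {
        |        "id":   { "type": "long" },
        |        "name": { "type": "keyword" }
        |      }
        |    }
        |  }
        |}""".stripMargin

    // PUT http://192.168.0.61:9200/course creates the index with this mapping
    val conn = new URL("http://192.168.0.61:9200/course")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("PUT")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    val out = new OutputStreamWriter(conn.getOutputStream, "UTF-8")
    out.write(mapping)
    out.close()
    println(s"create index => HTTP ${conn.getResponseCode}")
    conn.disconnect()
  }
}

With an explicit type in the mapping, the write resource on the 6.x connector is usually given as index/type, e.g. saveToEs(course, "course/doc").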
Elasticsearch provides support for Hadoop, Hive, Spark, and other big-data projects; see the official ES documentation for the details.
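
On the Spark side of that support, besides EsSparkSQL.saveToEs the same write can go through the standard DataFrame writer interface. A minimal sketch; es.mapping.id is an assumption that the courses table has an id column worth using as the document id:

// Equivalent write through the Spark SQL data source interface
course.write
  .format("org.elasticsearch.spark.sql")
  .option("es.mapping.id", "id") // assumed: use the "id" column as the ES document id
  .mode("append")                // the ES data source supports append mode
  .save("course")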
Reference: the official Elasticsearch documentation
