大数据学习——sparkSql

官网http://spark.apache.org/docs/1.6.2/sql-programming-guide.html

val sc: SparkContext // An existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.json("hdfs://mini1:9000/person.json")
1.在本地创建一个文件,有三列,分别是id、name、age,用空格分隔,然后上传到hdfs上
hdfs dfs -put person.json /

2.在spark shell执行下面命令,读取数据,将每一行的数据使用列分隔符分割
val lineRDD = sc.textFile("hdfs://mini1:9000/person.json").map(_.split(" ")) 

3.定义case class(相当于表的schema) case class Person(id:Int, name:String, age:Int)

4.将RDD和case class关联 val personRDD = lineRDD.map(x => Person(x(0).toInt, x(1), x(2).toInt))

5.将RDD转换成DataFrame val personDF = personRDD.toDF

6.对DataFrame进行处理 personDF.show

 

 DSL风格语法

 

 

 

 SQL风格语法 

scala> val dataRDD=sc.textFile("hdfs://mini1:9000/person.json")
dataRDD: org.apache.spark.rdd.RDD[String] = hdfs://mini1:9000/person.json MapPartitionsRDD[120] at textFile at <console>:27

scala> case class Person(id:Int ,name: String, age: Int)
defined class Person

scala> val personDF=dataRDD.map(_.split(" ")).map(x=> Person(x(0).toInt,x(1),x(2).toInt)).toDF()
scala>  personDF.registerTempTable("t_person")

 

 

 

SparkSqlTest
package org.apache.spark

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

/**
  * Created by Administrator on 2019/6/12.
  */
object SparkSqlTest {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("sparksql").setMaster("local[1]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val file: RDD[String] = sc.textFile("hdfs://mini1:9000/person.json")
    val personRDD = file.map(_.split(" ")).map(x => Person(x(0).toInt, x(1), x(2).toInt))
    import sqlContext.implicits._
    val personDF: DataFrame = personRDD.toDF()
    personDF.registerTempTable("t_person")
    sqlContext.sql("select * from t_person").show

  }
}

case class Person(id: Int, name: String, age: Int)
+---+--------+---+
| id| name|age|
+---+--------+---+
| 1|zhangsan| 23|
| 2| wangwu| 34|
| 3| lisi| 43|
+---+--------+---+

 

posted on 2019-06-12 19:45  o_0的园子  阅读(252)  评论(0编辑  收藏  举报