Spark SQL Data Sources: JSON Files

1. Startup command

[root@cdh1 ~]# spark-shell
22/05/24 20:24:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://cdh1:4040
Spark context available as 'sc' (master = local[*], app id = local-1653395123656).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/
         
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_311)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

# Spark SQL: inferring the schema using reflection

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@15c8c97
scala> import sqlContext._
import sqlContext._
scala> case class Demo(id: Int, name: String, age: Int)
defined class Demo
scala> val empl = sc.textFile("Demo.txt").map(_.split(",")).map(e => Demo(e(0).trim.toInt, e(1), e(2).trim.toInt)).toDF()
empl: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
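The `map` steps above turn each comma-separated line of `Demo.txt` into a `Demo` instance. The per-line logic can be sketched in plain Scala (the sample line below is hypothetical; note that only `id` and `age` are trimmed, so `name` keeps any space after the comma):

```scala
// Parse one line of Demo.txt into a Demo, mirroring the map chain above.
case class Demo(id: Int, name: String, age: Int)

def parseLine(line: String): Demo = {
  val e = line.split(",")
  // id and age are trimmed and converted; name is taken as-is,
  // so a leading space after the comma survives into the DataFrame.
  Demo(e(0).trim.toInt, e(1), e(2).trim.toInt)
}

// Hypothetical line in the same format as Demo.txt:
val d = parseLine("1201, satish, 25")
// d.id == 1201, d.name == " satish" (leading space kept), d.age == 25
```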
scala> empl.registerTempTable("Demo")
warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
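The deprecation warnings appear because, since Spark 2.0, `SparkSession` (available in the shell as `spark`) supersedes `SQLContext`, and `createOrReplaceTempView` replaces `registerTempTable`. A minimal sketch of the same steps with the newer API, assuming the same `Demo.txt` and `Demo` case class:

```scala
// Spark 2.0+ style inside spark-shell, which provides `spark` (a SparkSession).
import spark.implicits._

val empl = spark.sparkContext.textFile("Demo.txt")
  .map(_.split(","))
  .map(e => Demo(e(0).trim.toInt, e(1), e(2).trim.toInt))
  .toDF()

// createOrReplaceTempView is the non-deprecated replacement for registerTempTable.
empl.createOrReplaceTempView("Demo")
spark.sql("select * from Demo").show()
```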

scala> val allcolumn = sqlContext.sql("select * from Demo")
allcolumn: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]

scala> allcolumn.show()
+----+-------+------+                                                           
|  id|   name|   age|
+----+-------+------+
|1201| satish|251202|
+----+-------+------+
scala> val onecolumn = sqlContext.sql("select * from Demo where id > 1200")
onecolumn: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]

scala> onecolumn.show()
+----+-------+------+
|  id|   name|   age|
+----+-------+------+
|1201| satish|251202|
+----+-------+------+
scala> onecolumn.map(t=>"ID: "+t(0)).collect().foreach(println)
ID: 1201

scala> onecolumn.map(t=>"ID:"+t(0)+" NAME:"+t(1)).collect().foreach(println)
ID:1201 NAME: satish
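The stray space in `NAME: satish` comes from splitting on `","` without trimming the name field (only `id` and `age` were trimmed above). Trimming every field after the split fixes it; a plain-Scala sketch of the adjusted parse:

```scala
// Trim every field so names don't carry the space that follows the comma.
def parseFields(line: String): Array[String] =
  line.split(",").map(_.trim)

val fields = parseFields("1201, satish, 25")
// fields(1) == "satish" -- no leading space now.
```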

-- Inferring the schema using reflection (the original article contained a few errors)

posted @ 2022-05-25 22:56  不吃酸豆角