[spark@master ~]$ spark-shell --master yarn-client --jars /app/soft/hive/lib/mysql-connector-java-5.1.44-bin.jar
scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext
scala> val sqlContext = new SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@432a6a69
scala> val res = sqlContext.sql("select * from lb")
res: org.apache.spark.sql.DataFrame = [cookieid: string, createtime: string ... 1 more field]
scala> res.show()
+--------+----------+---+
|cookieid|createtime| pv|
+--------+----------+---+
| cookie1|2015-11-11| 1|
| cookie1|2015-11-12| 4|
| cookie1|2015-11-13| 5|
| cookie1|2015-11-14| 4|
| cookie2|2015-11-11| 7|
| cookie2|2015-11-12| 3|
| cookie2|2015-11-13| 8|
| cookie2|2015-11-14| 2|
+--------+----------+---+
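A side note on the deprecation warning above: in Spark 2.x the SparkSession (bound to `spark` in spark-shell) supersedes SQLContext, so the same query can be issued without constructing a SQLContext at all. A minimal sketch, assuming the same Hive table lb and a Hive-enabled build (the appName below is a hypothetical placeholder):

// In spark-shell, `spark` is already a SparkSession with Hive support (hive-site.xml on the classpath).
val res = spark.sql("select * from lb")
res.show()

// In a standalone application the session must be built explicitly:
import org.apache.spark.sql.SparkSession
val session = SparkSession.builder()
  .appName("hive-demo")
  .enableHiveSupport()
  .getOrCreate()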
Create a table (word count of a text file, saved as a Hive table):
scala> val path = "hdfs://master:9000/data/Romeo_and_Juliet.txt"
path: String = hdfs://master:9000/data/Romeo_and_Juliet.txt
scala> val df2 = spark.sparkContext.textFile(path).flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).toDF("word","count")
df2: org.apache.spark.sql.DataFrame = [word: string, count: int]
scala> df2.write.mode("overwrite").saveAsTable("badou.test_a")
18/01/28 08:15:10 WARN metastore.HiveMetaStore: Location: hdfs://master:9000/user/hive/warehouse/badou.db/test_a specified for non-external table:test_a
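The pipeline above splits each line on a single space, counts words with reduceByKey, and persists the result as the Hive table badou.test_a. A possible refinement, sketched below on the assumption that case-insensitive counts without empty tokens are wanted (df3 and badou.test_b are hypothetical names; in spark-shell the implicits needed for toDF are already imported, in an application add import spark.implicits._):

// Split on runs of whitespace and drop the empty tokens that a plain split(" ")
// produces for consecutive spaces or blank lines, then lowercase before counting.
val df3 = spark.sparkContext.textFile(path)
  .flatMap(_.split("\\s+"))
  .filter(_.nonEmpty)
  .map(w => (w.toLowerCase, 1))
  .reduceByKey(_ + _)
  .toDF("word", "count")
df3.write.mode("overwrite").saveAsTable("badou.test_b")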
--------------------
Verify the new table from the Hive CLI:
hive> use badou;
hive> show tables;
hive> select * from test_a order by count desc limit 10;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1516801273097_0045, Tracking URL = http://master:8088/proxy/application_1516801273097_0045/
Kill Command = /app/soft/hadoop/bin/hadoop job -kill job_1516801273097_0045
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-01-28 09:08:22,144 Stage-1 map = 0%, reduce = 0%
2018-01-28 09:08:29,615 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.37 sec
2018-01-28 09:08:37,987 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.18 sec
MapReduce Total cumulative CPU time: 3 seconds 180 msec
Ended Job = job_1516801273097_0045
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.18 sec HDFS Read: 54970 HDFS Write: 69 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 180 msec
OK
4132
the 614
I 531
and 462
to 449
a 392
of 364
my 313
is 290
in 282
Time taken: 28.159 seconds, Fetched: 10 row(s)
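The blank word at the top of the result (count 4132) is presumably the empty token that split(" ") emits for consecutive spaces and blank lines; the refinement sketched earlier would filter it out. The same top-10 query can also be run from the Spark shell instead of the Hive CLI, for example:

scala> spark.sql("select word, `count` from badou.test_a order by `count` desc limit 10").show()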