This session works through several comprehensive Spark case studies.
Case 1: find the Top N values
Task description: each input line is an order record with four comma-separated fields: orderid,userid,payment,productid.
file1.txt
1,1768,50,155
2,1218,600,211
3,2239,788,242
4,3101,28,599
5,4899,290,129
6,3110,54,1201
7,4436,259,877
8,2369,7890,27
file2.txt
100,4287,226,233
101,6562,489,124
102,1124,33,17
103,3267,159,179
104,4569,57,125
105,1438,37,116
Find the top N payment values.
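With N = 5, the five largest payment values in the sample records above are 7890, 788, 600, 489 and 290, so that is the output a correct Top-5 job should produce.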
First, start the cluster and create a local directory to hold the data files.
[atguigu@hadoop102 ~]$ cd /usr/local/spark/
[atguigu@hadoop102 spark]$ cd mycode/
drwxrwxr-x. 6 atguigu atguigu 97 1月 21 21:09 wordcount
[atguigu@hadoop102 mycode]$ mkdir rdd
Create the two data files:
[atguigu@hadoop102 rdd]$ vim file1.txt
[atguigu@hadoop102 rdd]$ vim file2.txt
Create an HDFS directory:
[atguigu@hadoop102 mycode]$ cd /opt/module/hadoop-3.1.3/
[atguigu@hadoop102 hadoop-3.1.3]$ cd bin/
[atguigu@hadoop102 bin]$ ll
总用量 996
-rwxr-xr-x. 1 atguigu atguigu 441936 9月 12 2019 container-executor
-rwxr-xr-x. 1 atguigu atguigu 8707 9月 12 2019 hadoop
-rwxr-xr-x. 1 atguigu atguigu 11265 9月 12 2019 hadoop.cmd
-rwxr-xr-x. 1 atguigu atguigu 11026 9月 12 2019 hdfs
-rwxr-xr-x. 1 atguigu atguigu 8081 9月 12 2019 hdfs.cmd
-rwxr-xr-x. 1 atguigu atguigu 6237 9月 12 2019 mapred
-rwxr-xr-x. 1 atguigu atguigu 6311 9月 12 2019 mapred.cmd
-rwxr-xr-x. 1 atguigu atguigu 483728 9月 12 2019 test-container-executor
-rwxr-xr-x. 1 atguigu atguigu 11888 9月 12 2019 yarn
-rwxr-xr-x. 1 atguigu atguigu 12840 9月 12 2019 yarn.cmd
[atguigu@hadoop102 bin]$ hdfs dfs -mkdir -p /user/hadoop/spark/mycode/rdd/examples
Upload the files to HDFS:
[atguigu@hadoop102 bin]$ hdfs dfs -put /usr/local/spark/mycode/rdd/file1.txt /user/hadoop/spark/mycode/rdd/examples
2024-01-22 18:16:10,917 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[atguigu@hadoop102 bin]$ hdfs dfs -put /usr/local/spark/mycode/rdd/file2.txt /user/hadoop/spark/mycode/rdd/examples
2024-01-22 18:16:17,260 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Check the upload:
[atguigu@hadoop102 bin]$ hdfs dfs -ls /user/hadoop/spark/mycode/rdd/examples
Found 2 items
-rw-r--r-- 3 atguigu supergroup 125 2024-01-22 18:16 /user/hadoop/spark/mycode/rdd/examples/file1.txt
-rw-r--r-- 3 atguigu supergroup 102 2024-01-22 18:16 /user/hadoop/spark/mycode/rdd/examples/file2.txt
View in the HDFS web UI:

simple.sbt:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.12.15"
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.2.0"
Directory structure:
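A note on the build file: Spark 3.2.0 is published for Scala 2.12, so scalaVersion must stay on the 2.12 line, and the %% operator makes sbt append that suffix to the artifact name automatically (the dependency resolves to spark-core_2.12).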
[atguigu@hadoop102 topscala]$ find .
.
./src
./src/main
./src/main/scala
./src/main/scala/SimpleApp.scala
./simple.sbt
Package with sbt:
[atguigu@hadoop102 topscala]$ /usr/local/sbt/sbt package
Run with spark-submit:
[atguigu@hadoop102 spark]$ ./bin/spark-submit --class "TopN" /usr/local/spark/mycode/topscala/target/scala-2.12/simple-project_2.12-1.0.jar
Result:

Code analysis:
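The TopN source (SimpleApp.scala) appears only as a screenshot in the original notes, so the following is a minimal sketch consistent with the build file and the --class "TopN" submit command above; the HDFS URI, setMaster("local") and N = 5 are assumptions, not taken from the original.

import org.apache.spark.{SparkConf, SparkContext}

object TopN {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TopN").setMaster("local")   // master is an assumption
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")
    // read both files under the examples directory uploaded earlier (URI is an assumption)
    val lines = sc.textFile("hdfs://hadoop102:8020/user/hadoop/spark/mycode/rdd/examples", 2)
    var num = 0
    lines.filter(line => line.trim.length > 0 && line.split(",").length == 4) // drop blank or malformed lines
      .map(line => line.split(",")(2))     // keep only the payment field
      .map(x => (x.toInt, ""))             // pair with a dummy value so sortByKey applies
      .sortByKey(false)                    // sort payments in descending order
      .map(x => x._1)
      .take(5)                             // bring the top 5 values to the driver
      .foreach(x => {
        num += 1
        println(num + "\t" + x)
      })
    sc.stop()
  }
}

The essential idea is to project each record onto its payment field, sort the resulting (payment, "") pairs in descending key order, and take the first N on the driver.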
Case 2: find the maximum and minimum values. Source code:

import org.apache.spark.{SparkConf, SparkContext}

object MaxAndMin {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MaxAndMin").setMaster("local")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")
    // read all files under the chapter5 directory on HDFS, using 2 partitions
    val lines = sc.textFile("hdfs://localhost:9000/user/hadoop/spark/chapter5", 2)
    // map every number onto the same key so groupByKey collects all values into one iterable,
    // then scan that iterable once to find the maximum and minimum
    val result = lines.filter(_.trim().length > 0)
      .map(line => ("key", line.trim.toInt))
      .groupByKey()
      .map(x => {
        var min = Integer.MAX_VALUE
        var max = Integer.MIN_VALUE
        for (num <- x._2) {
          if (num > max) { max = num }
          if (num < min) { min = num }
        }
        (max, min)
      })
      .collect
      .foreach(x => {
        println("max\t" + x._1)
        println("min\t" + x._2)
      })
  }
}

Package and run:
[atguigu@hadoop102 scala]$ cd ..
[atguigu@hadoop102 main]$ cd ..
[atguigu@hadoop102 src]$ cd ..
[atguigu@hadoop102 MaxAndMin]$ vim simple.sbt
[atguigu@hadoop102 MaxAndMin]$ find .
.
./src
./src/main
./src/main/scala
./src/main/scala/SimpleApp.scala
./simple.sbt
[atguigu@hadoop102 MaxAndMin]$ /usr/local/sbt/sbt package
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] Loading project definition from /usr/local/spark/mycode/MaxAndMin/MaxAndMin/project
[info] Loading settings for project maxandmin from simple.sbt ...
[info] Set current project to Simple Project (in build file:/usr/local/spark/mycode/MaxAndMin/MaxAndMin/)
[info] Compiling 1 Scala source to /usr/local/spark/mycode/MaxAndMin/MaxAndMin/target/scala-2.12/classes ...
[success] Total time: 4 s, completed 2024-1-22 21:35:51
[atguigu@hadoop102 MaxAndMin]$ ll
总用量 4
drwxrwxr-x. 3 atguigu atguigu 44 1月 22 21:33 project
-rw-rw-r--. 1 atguigu atguigu 141 1月 22 21:33 simple.sbt
drwxrwxr-x. 3 atguigu atguigu 18 1月 22 21:27 src
drwxrwxr-x. 4 atguigu atguigu 39 1月 22 21:34 target
[atguigu@hadoop102 MaxAndMin]$ cd target/
[atguigu@hadoop102 target]$ ll
总用量 0
drwxrwxr-x. 4 atguigu atguigu 70 1月 22 21:35 scala-2.12
drwxrwxr-x. 4 atguigu atguigu 36 1月 22 21:34 streams
[atguigu@hadoop102 target]$ cd scala-2.12/
[atguigu@hadoop102 scala-2.12]$ ll
总用量 4
drwxrwxr-x. 2 atguigu atguigu 53 1月 22 21:35 classes
-rw-rw-r--. 1 atguigu atguigu 3776 1月 22 21:35 simple-project_2.12-1.0.jar
drwxrwxr-x. 3 atguigu atguigu 31 1月 22 21:34 update
[atguigu@hadoop102 scala-2.12]$ pwd
/usr/local/spark/mycode/MaxAndMin/MaxAndMin/target/scala-2.12
[atguigu@hadoop102 scala-2.12]$ cd ..
[atguigu@hadoop102 target]$ cd ..
[atguigu@hadoop102 MaxAndMin]$ cd ..
[atguigu@hadoop102 MaxAndMin]$ cd ..
[atguigu@hadoop102 mycode]$ cd ..
[atguigu@hadoop102 spark]$ ./bin/spark-submit --class "MaxAndMin" /usr/local/spark/mycode/MaxAndMin/MaxAndMin/target/scala-2.12/simple-project_2.12-1.0.jar
Execution result:

Code analysis:
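How the program works: every number is mapped onto the same key ("key"), so groupByKey gathers all values into a single iterable on one task, and one pass over that iterable tracks the running maximum and minimum. That is fine for small inputs, but it funnels the entire dataset through a single key. As a sketch of an alternative over the same input path, the built-in max() and min() RDD actions give the same answer without the grouping step (MaxAndMin2 is an illustrative name, not from the original):

import org.apache.spark.{SparkConf, SparkContext}

object MaxAndMin2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MaxAndMin2").setMaster("local")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")
    // same input directory as the original program
    val nums = sc.textFile("hdfs://localhost:9000/user/hadoop/spark/chapter5", 2)
      .filter(_.trim.nonEmpty)
      .map(_.trim.toInt)
    // max() and min() are standard RDD actions, so no single-key shuffle is required
    println("max\t" + nums.max())
    println("min\t" + nums.min())
    sc.stop()
  }
}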
Case 3: sort the integers stored in a set of input files. Example input, file3.txt:
1
45
25
Program:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.HashPartitioner

object FileSort {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("FileSort")
    val sc = new SparkContext(conf)
    // read every file under the local data directory
    val dataFile = "file:///usr/local/spark/mycode/rdd/data/*"
    val lines = sc.textFile(dataFile, 3)
    var index = 0
    val result = lines.filter(_.trim().length > 0)
      .map(n => (n.trim.toInt, ""))          // key each line by its integer value
      .partitionBy(new HashPartitioner(1))   // force a single partition so one task sees all values
      .sortByKey()                           // ascending sort by value
      .map(t => {
        index += 1                           // running rank; correct only because there is one partition
        (index, t._1)
      })
    result.saveAsTextFile("file:///usr/local/spark/mycode/rdd/data/result")
  }
}

Execution process:
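Why the mutable counter works here: partitionBy(new HashPartitioner(1)) forces all data into a single partition, so one task owns the whole sorted sequence and its private copy of index yields a correct global ranking (and a single output part file). With more partitions, each task would get its own copy of index and the numbering would restart per partition. A sketch of a partition-independent variant using zipWithIndex (FileSort2 and the result2 output path are illustrative names, not from the original):

import org.apache.spark.{SparkConf, SparkContext}

object FileSort2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FileSort2")
    val sc = new SparkContext(conf)
    // same input glob as the original program
    val lines = sc.textFile("file:///usr/local/spark/mycode/rdd/data/*", 3)
    val result = lines.filter(_.trim.length > 0)
      .map(_.trim.toInt)
      .sortBy(identity)                                // global ascending sort
      .zipWithIndex()                                  // attach a 0-based position to each value
      .map { case (value, idx) => (idx + 1, value) }   // 1-based rank, same output format as above
    result.saveAsTextFile("file:///usr/local/spark/mycode/rdd/data/result2")
    sc.stop()
  }
}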
[atguigu@hadoop102 spark]$ cd mycode/rdd
[atguigu@hadoop102 rdd]$ ll
总用量 8
-rw-rw-r--. 1 atguigu atguigu 124 1月 22 18:37 file1.txt
-rw-rw-r--. 1 atguigu atguigu 102 1月 22 18:08 file2.txt
[atguigu@hadoop102 rdd]$ mkdir data
[atguigu@hadoop102 rdd]$ vim file1.txt
[atguigu@hadoop102 rdd]$ cd data/
[atguigu@hadoop102 data]$ vim file1.txt
[atguigu@hadoop102 data]$ vim file2.txt
[atguigu@hadoop102 data]$ vim file3.txt
[atguigu@hadoop102 data]$ cd ..cd
-bash: cd: ..cd: 没有那个文件或目录
[atguigu@hadoop102 data]$ cd ..
[atguigu@hadoop102 rdd]$ cd ..
[atguigu@hadoop102 mycode]$ mkdir -p filesort/src/main/scala
[atguigu@hadoop102 mycode]$ cd /s
sbin/  software/  srv/  sys/
[atguigu@hadoop102 mycode]$ cd filesort/src/main/scala/
[atguigu@hadoop102 scala]$ vim SimpleApp.scala
[atguigu@hadoop102 scala]$ cd ..
[atguigu@hadoop102 main]$ cd ..
[atguigu@hadoop102 src]$ cd ..
[atguigu@hadoop102 filesort]$ vim simple.sbt
[atguigu@hadoop102 filesort]$ find .
.
./src
./src/main
./src/main/scala
./src/main/scala/SimpleApp.scala
./simple.sbt
[atguigu@hadoop102 filesort]$ /usr/local/sbt/sbt package
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] Loading project definition from /usr/local/spark/mycode/filesort/project
[info] Loading settings for project filesort from simple.sbt ...
[info] Set current project to Simple Project (in build file:/usr/local/spark/mycode/filesort/)
[info] Compiling 1 Scala source to /usr/local/spark/mycode/filesort/target/scala-2.12/classes ...
[success] Total time: 4 s, completed 2024-1-22 22:14:06
[atguigu@hadoop102 filesort]$
The first run produced many errors; the remaining steps are the same as in the previous cases.
Main cause: when reading a local directory, append /* to the path:
val dataFile = "file:///usr/local/spark/mycode/rdd/data/*"
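One more thing to watch: saveAsTextFile fails if the output directory (.../data/result) already exists, and since result sits inside data/ it is also matched by the data/* input glob on a second run, so delete it (or write the output somewhere outside data/) before re-running the job.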