A Step-by-Step Guide to Setting Up a SparkSQL Development Environment in IDEA

1. Create a Maven project, install the Scala plugin in IDEA, and add the Scala SDK

https://www.cnblogs.com/bajiaotai/p/15381309.html

2. Add the required dependency JARs by configuring pom.xml

2.1 pom.xml example (Spark version: 3.0.0, Scala version: 2.12)

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.dxm.sparksql</groupId>
    <artifactId>sparksql</artifactId>
    <version>1.0-SNAPSHOT</version>

    <!-- Properties holding the Spark version and the Scala version -->
    <properties>
        <spark.version>3.0.0</spark.version>
        <scala.version>2.12</scala.version>
    </properties>

    <dependencies>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-yarn_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.27</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>

    </dependencies>


</project>

2.2 Matching the Spark version to the Scala version

# The link below lists which Scala version goes with each Spark version, along with the matching dependency configuration
https://www.cnblogs.com/bajiaotai/p/16270971.html

2.3 Checking the runtime Scala version from Scala code

println(util.Properties.versionString)
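
The check above can be taken one step further by comparing the runtime version with the Scala suffix used in the artifact IDs. The following is a minimal sketch, not part of the original post: the object name ScalaVersionCheck and its helper methods are hypothetical, and the expected value "2.12" is assumed to mirror the scala.version property in the pom.xml from section 2.1.

```scala
object ScalaVersionCheck {

  // Reduces a full version such as "2.12.10" to its binary version "2.12".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // True when the running Scala library matches the suffix used in the artifact IDs.
  def matches(expectedBinary: String, runtimeFull: String): Boolean =
    binaryVersion(runtimeFull) == expectedBinary

  def main(args: Array[String]): Unit = {
    // Assumption: "2.12" mirrors the <scala.version> property in pom.xml.
    val expected = "2.12"
    // versionNumberString gives the bare number, without the "version " prefix.
    val runtime = scala.util.Properties.versionNumberString
    if (matches(expected, runtime))
      println(s"OK: runtime Scala $runtime matches the $expected artifacts")
    else
      println(s"Mismatch: runtime Scala $runtime, but the dependencies target $expected")
  }
}
```

Running this once from IDEA shows whether the SDK attached to the project agrees with the spark-*_2.12 dependencies before any Spark code is involved.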

2.4 FAQ: errors caused by mismatched Spark and Scala versions

To be added.

3. Test code

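import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SparkSession}

// Case class backing the sample rows. The original post does not show its definition,
// so the field names here are assumed; toDF below renames the columns anyway.
case class Person(name: String, action: String)

object TestSparkSQLEnv extends App {

  // 1. Initialize the SparkSession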
  val spark = SparkSession
    .builder
    .master("local")
    //.appName("SparkSql Entrance Class SparkSession")
    //.config("spark.some.config.option", "some-value")
    .getOrCreate()

  // 2. Get the SparkContext from the SparkSession
  private val sc: SparkContext = spark.sparkContext

  // 3. Set the log level
  // Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
  // This overrides any user-defined log settings, such as those in log4j.properties
  sc.setLogLevel("ERROR")

  import spark.implicits._

  // 4. Create a DataFrame from an RDD of case-class instances
  private val rdd2DfByCaseClass: DataFrame = spark.sparkContext
    .makeRDD(Array(Person("疫情", "何时"), Person("结束", "呢")))
    .toDF("名称", "行动")
  rdd2DfByCaseClass.show()
  //  +----+----+
  //  |名称|行动|
  //  +----+----+
  //  |疫情|何时|
  //  |结束|  呢|
  //  +----+----+

  // 5. Release resources
  spark.stop()

}
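
The test above only builds and prints a DataFrame. To confirm that SQL queries run as well, something like the following sketch could be tried. It is not from the original post: the Employee case class, the object name SparkSqlQuerySketch, the view name employee and the sample rows are all hypothetical. It relies only on the spark-sql dependency from the pom.xml in section 2.1.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case class for this sketch; it does not appear in the original post.
case class Employee(name: String, dept: String, salary: Int)

object SparkSqlQuerySketch {

  // Registers the rows as a temporary view and aggregates them with plain SQL.
  def totalSalaryByDept(spark: SparkSession, rows: Seq[Employee]): Map[String, Long] = {
    import spark.implicits._
    val df: DataFrame = rows.toDF()
    df.createOrReplaceTempView("employee")
    spark
      .sql("SELECT dept, SUM(salary) AS total FROM employee GROUP BY dept")
      .collect()
      // SUM over an integer column comes back as a 64-bit value, hence getLong.
      .map(row => row.getString(0) -> row.getLong(1))
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("SparkSqlQuerySketch")
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")

    val rows = Seq(Employee("a", "dev", 10), Employee("b", "dev", 20), Employee("c", "ops", 5))
    println(totalSalaryByDept(spark, rows))

    spark.stop()
  }
}
```

If this prints per-department totals without an exception, both the DataFrame API and the SQL parser are working in the local environment.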

4. Setting the log level

4.1 Runtime log level (highest priority)

// Set the log level at runtime (affects only the submitted application)
spark.sparkContext.setLogLevel("INFO")

4.2 Adding a resources/log4j.properties configuration file

When no such file is provided, Spark falls back to its built-in profile and prints: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the console
log4j.rootCategory=info, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.sparkproject.jetty=WARN
log4j.logger.org.sparkproject.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs
# in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

# Parquet related logging
log4j.logger.org.apache.parquet.CorruptStatistics=ERROR
log4j.logger.parquet.CorruptStatistics=ERROR
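
To check whether a log4j.properties file like the one above is actually visible on the classpath, and which root level it sets, a small helper can be handy. The sketch below uses only the JDK's java.util.Properties. The object name Log4jConfigCheck and its methods are hypothetical, and it assumes the file sits at the classpath root, which is where Maven places files from the resources directory.

```scala
import java.io.StringReader
import java.util.Properties

object Log4jConfigCheck {

  // Extracts the root log level: the first comma-separated token of log4j.rootCategory.
  def rootLevel(props: Properties): Option[String] =
    Option(props.getProperty("log4j.rootCategory"))
      .flatMap(_.split(",").headOption)
      .map(_.trim.toUpperCase)
      .filter(_.nonEmpty)

  // Parses properties text; useful for trying rootLevel without a file.
  def parse(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text))
    props
  }

  def main(args: Array[String]): Unit = {
    // Looks for the file at the classpath root.
    val stream = getClass.getResourceAsStream("/log4j.properties")
    if (stream == null) {
      println("No log4j.properties on the classpath; Spark will use its default profile")
    } else {
      val props = new Properties()
      try props.load(stream) finally stream.close()
      val level = rootLevel(props).getOrElse("not set")
      println(s"Root log level: $level")
    }
  }
}
```

Keep in mind that a call to setLogLevel at runtime, as in section 4.1, still overrides whatever level this file declares.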

5. Closing remarks

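If the test code runs without errors, congratulations: your environment is set up correctly. If you run into problems, leave a comment and we can work through them together. If this post helped you, a like and a comment would be appreciated.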

 

posted @ 2022-05-14 18:13  学而不思则罔!