Resolving Spark ClassLoader Conflicts
Spark class-loading problems are mostly dependency conflicts. The core technique is to load user-supplied JARs ahead of Spark's bundled or system JARs, via `spark.driver.extraClassPath` or `--jars`, so that a conflicting library resolves to the user's version. When `extraClassPath` lists multiple entries, separate them with `:` (on Linux); giving the user JARs the highest load priority avoids `ClassNotFoundException` and related errors.
The Problem
With Spark 3.3.4, the `spark/jars` directory ships `netty-common-4.1.74.Final.jar`, but the project uses `netty-common-4.1.101.Final.jar`. Version 4.1.101.Final has the field `NetUtil.NETWORK_INTERFACES`, which does not exist in 4.1.74.Final. The project pins netty-common to a single version, so the fat JAR it builds depends on 4.1.101:
```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-common</artifactId>
            <version>4.1.101.Final</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```
The driver class is shown below (simplified); it compiles successfully:
```java
import io.netty.util.NetUtil;

public class TestApp {
    public static void main(String[] args) {
        System.out.println(" driverClassLoader: " + TestApp.class.getClassLoader());
        System.out.println(" Thread Context ClassLoader: " + Thread.currentThread().getContextClassLoader());
        // Compiles against 4.1.101.Final, where NETWORK_INTERFACES exists.
        System.out.println(NetUtil.NETWORK_INTERFACES);
    }
}
```
Predictably, it fails at runtime:
```
Exception in thread "main" java.lang.NoSuchFieldError: NETWORK_INTERFACES
	at demo.TestApp.start(TestApp.java:10)
	at spark.SparkApplication.main(SparkApplication.java:82)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
```
This can usually be solved with a few configuration options that Spark provides:
| Property | Default | Meaning | Since Version |
|---|---|---|---|
| spark.driver.userClassPathFirst | false | (Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only. | 1.3.0 |
| spark.executor.userClassPathFirst | false | (Experimental) Same functionality as spark.driver.userClassPathFirst, but applied to executor instances. | 1.3.0 |
spark.driver.userClassPathFirst / spark.executor.userClassPathFirst
https://spark.apache.org/docs/latest/configuration.html
- `--conf spark.driver.userClassPathFirst=true`: force the driver to load user JARs first (driver only).
- `--conf spark.executor.userClassPathFirst=true`: force executors to load user JARs first.
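As a sketch, a submission that forces user JARs to win on both driver and executor might look like this (the application JAR name and master settings are placeholders, not from the original project):

```shell
# Hypothetical submission: jar name, class name, and cluster settings are placeholders.
spark-submit \
  --class demo.TestApp \
  --deploy-mode cluster \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  app-fat.jar
```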
How It Works
To diagnose a class-loading problem, first record the driver class's class loader and the thread context class loader (`Thread.currentThread().getContextClassLoader()`):

- `--deploy-mode=client`: both the driver class loader and the thread context class loader are `org.apache.spark.util.MutableURLClassLoader`.
- `--deploy-mode=cluster`: the driver class loader is `sun.misc.Launcher$AppClassLoader`, while the thread context class loader is `org.apache.spark.util.MutableURLClassLoader`.
- With `spark.driver.userClassPathFirst=true`, both the driver class loader and the thread context class loader are `org.apache.spark.util.ChildFirstURLClassLoader`.
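As a standalone illustration (no Spark required), the delegation chain behind any of these loaders can be printed by walking `getParent()` until the bootstrap loader (`null`) is reached. The class name below is hypothetical; the exact loader names printed vary by JDK version:

```java
public class LoaderChain {
    public static void main(String[] args) {
        // Walk from this class's loader up to the bootstrap loader (null).
        ClassLoader cl = LoaderChain.class.getClassLoader();
        while (cl != null) {
            System.out.println(cl.getClass().getName());
            cl = cl.getParent();
        }
    }
}
```

On a stock JVM the first entry is the application class loader (`sun.misc.Launcher$AppClassLoader` on JDK 8, `jdk.internal.loader.ClassLoaders$AppClassLoader` on JDK 9+); under Spark it is one of the loaders listed above.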
```java
package demo;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.SparkSession;

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TestApp {
    public static void main(String[] args) {
        System.out.println(" [Driver] driverClassLoader: " + TestApp.class.getClassLoader());
        System.out.println(" [Driver] Thread Context ClassLoader: " + Thread.currentThread().getContextClassLoader());
        final SparkSession spark = SparkSession.builder()
                .appName("TestApp")
                .master(args.length == 0 ? "local[*]" : args[0])
                .config("spark.eventLog.enabled", "true") // enable the event log
                .config("spark.eventLog.dir", "hdfs://localhost:19000/spark-logs") // event log directory
                .config("spark.history.fs.logDirectory", "hdfs://localhost:19000/spark-logs")
                .config("spark.yarn.historyServer.address", "http://localhost:18080") // HistoryServer address
                .config("spark.sql.adaptive.enabled", "true")
                .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
        List<Integer> list = IntStream.range(1, 100).boxed().collect(Collectors.toList());
        JavaRDD<Integer> rdd = jsc.parallelize(list, 10);
        rdd.foreach(new VoidFunction<Integer>() {
            boolean flag = false;

            @Override
            public void call(Integer integer) throws Exception {
                if (!flag) {
                    System.out.println(this.getClass());
                    System.out.println(" [Executor] classLoader: " + this.getClass().getClassLoader());
                    System.out.println(" [Executor] Thread Context ClassLoader: " + Thread.currentThread().getContextClassLoader());
                    flag = true;
                }
            }
        });
        jsc.close();
    }
}
```
`ChildFirstURLClassLoader` breaks the parent-delegation model and loads classes from the user-specified classpath first:
```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;

public class ChildFirstURLClassLoader extends MutableURLClassLoader {
    private ParentClassLoader parent;

    public ChildFirstURLClassLoader(URL[] urls, ClassLoader parent) {
        // Pass null as the real parent so default parent-first delegation
        // is bypassed; the wrapped ParentClassLoader is only a fallback.
        super(urls, (ClassLoader) null);
        this.parent = new ParentClassLoader(parent);
    }

    public Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        try {
            // Try the user classpath first ...
            return super.loadClass(name, resolve);
        } catch (ClassNotFoundException e) {
            // ... and only then fall back to the parent.
            return this.parent.loadClass(name, resolve);
        }
    }

    public Enumeration<URL> getResources(String name) throws IOException {
        ArrayList<URL> urls = Collections.list(super.getResources(name));
        urls.addAll(Collections.list(this.parent.getResources(name)));
        return Collections.enumeration(urls);
    }

    public URL getResource(String name) {
        URL url = super.getResource(name);
        return url != null ? url : this.parent.getResource(name);
    }

    static {
        ClassLoader.registerAsParallelCapable();
    }
}
```
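The effect of passing `null` as the parent, as the constructor above does, can be reproduced with a plain `java.net.URLClassLoader`. This standalone sketch (the class name is hypothetical, not Spark code) shows that cutting off the parent hides application classes, which is why `ChildFirstURLClassLoader` keeps a `ParentClassLoader` around as a fallback:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class DelegationDemo {
    public static void main(String[] args) throws Exception {
        // With the default parent chain, the lookup delegates to the
        // application class loader and finds this very class.
        URLClassLoader parentFirst = new URLClassLoader(new URL[0]);
        System.out.println(parentFirst.loadClass("DelegationDemo") == DelegationDemo.class);

        // With a null parent and no URLs of its own, only bootstrap
        // classes remain visible, so the same lookup fails.
        URLClassLoader cutOff = new URLClassLoader(new URL[0], null);
        try {
            cutOff.loadClass("DelegationDemo");
            System.out.println("resolved via parent");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException without a parent fallback");
        }
    }
}
```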
spark.driver.extraClassPath / spark.executor.extraClassPath
The goal is for Spark to load user-submitted JARs ahead of the JVM/Spark defaults. Set at driver startup, this places the user JARs at the front of the classpath. The driver class loader is then `org.apache.spark.util.ChildFirstURLClassLoader`.
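A sketch of passing these options on submission (the JAR paths are placeholders; `extraClassPath` entries are separated with `:` on Linux):

```shell
# Hypothetical paths; separate multiple classpath entries with ':' on Linux.
spark-submit \
  --class demo.TestApp \
  --conf spark.driver.extraClassPath=/opt/libs/netty-common-4.1.101.Final.jar \
  --conf spark.executor.extraClassPath=/opt/libs/netty-common-4.1.101.Final.jar \
  app-fat.jar
```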
spark.jars.packages
- `spark.jars.packages`: pull dependencies by Maven coordinates.
- `spark.jars.excludes`: exclude certain artifacts when resolving the packages given in `spark.jars.packages`. The value takes the form `groupId:artifactId,groupId:artifactId`.
| Property | Meaning | Since Version |
|---|---|---|
| spark.jars.packages | Comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. The coordinates should be groupId:artifactId:version. If spark.jars.ivySettings is given artifacts will be resolved according to the configuration in the file, otherwise artifacts will be searched for in the local maven repo, then maven central and finally any additional remote repositories given by the command-line option --repositories. For more details, see Advanced Dependency Management. | 1.5.0 |
| spark.jars.excludes | Comma-separated list of groupId:artifactId, to exclude while resolving the dependencies provided in spark.jars.packages to avoid dependency conflicts. | 1.5.0 |
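As an illustration, pulling netty-common by coordinate while excluding a conflicting artifact might look like this (the coordinates and excluded artifact are examples, not taken from the original project):

```shell
# Illustrative coordinates; artifacts resolve from the local Maven repo,
# then Maven Central, then any repositories given via --repositories.
spark-submit \
  --class demo.TestApp \
  --conf spark.jars.packages=io.netty:netty-common:4.1.101.Final \
  --conf spark.jars.excludes=io.netty:netty-transport \
  app-fat.jar
```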