flink-connector-pulsar troubleshooting notes

The problem

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-pulsar_2.11</artifactId>
    <version>1.14.4</version>
</dependency>
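  • The job runs directly in the IDE, but only with the option Add dependencies with "provided" scope to classpath checked
  • After packaging it as a jar and submitting it, the submission succeeds but the job fails to start with this exception: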
2022-06-15 10:32:13
org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'Source: pulsar-source-test -> (Sink: Print to Std. Out, Map -> Filter -> Map -> Filter -> Map)' (operator 06f6b9841ee245744a878eedfd102524).
	at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:545)
	at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:223)
	at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.failJob(SourceCoordinatorContext.java:285)
	at org.apache.flink.runtime.source.coordinator.SourceCoordinator.start(SourceCoordinator.java:132)
	at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$DeferrableCoordinator.resetAndStart(RecreateOnResetOperatorCoordinator.java:381)
	at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator.lambda$resetToCheckpoint$6(RecreateOnResetOperatorCoordinator.java:136)
	at java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source)
	at org.apache.flink.runtime.operators.coordination.ComponentClosingUtils.lambda$closeAsyncWithTimeout$0(ComponentClosingUtils.java:71)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.pulsar.client.admin.internal.PulsarAdminImpl
	at org.apache.pulsar.client.admin.internal.PulsarAdminBuilderImpl.build(PulsarAdminBuilderImpl.java:47)
	at org.apache.flink.connector.pulsar.common.utils.PulsarExceptionUtils.sneaky(PulsarExceptionUtils.java:69)
	at org.apache.flink.connector.pulsar.common.utils.PulsarExceptionUtils.sneakyClient(PulsarExceptionUtils.java:46)
	at org.apache.flink.connector.pulsar.common.config.PulsarConfigUtils.createAdmin(PulsarConfigUtils.java:213)
	at org.apache.flink.connector.pulsar.source.enumerator.PulsarSourceEnumerator.<init>(PulsarSourceEnumerator.java:86)
	at org.apache.flink.connector.pulsar.source.PulsarSource.createEnumerator(PulsarSource.java:149)
	at org.apache.flink.runtime.source.coordinator.SourceCoordinator.start(SourceCoordinator.java:128)
	... 7 more
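
The wording of the root cause carries a clue. The JVM reports NoClassDefFoundError with "Could not initialize class" when the class file was found but an earlier attempt to run its static initializer failed, which is a different situation from the class being absent from the jar. A minimal sketch of that behavior follows, using a hypothetical Broken class (not part of Flink or Pulsar) whose static initializer always throws:

```java
import java.util.ArrayList;
import java.util.List;

/** Reproduces the two errors that a failing static initializer produces. */
final class InitFailureDemo {

    /** Hypothetical stand-in for a class whose static setup fails at runtime. */
    static final class Broken {
        static final Object VALUE = fail();

        private static Object fail() {
            throw new IllegalStateException("simulated failure in static initializer");
        }
    }

    /** Touches Broken twice and records the error type and message of each attempt. */
    static List<String> attempts() {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            try {
                Object ignored = Broken.VALUE; // first access triggers class initialization
                seen.add("no error");
            } catch (Throwable t) {
                seen.add(t.getClass().getSimpleName() + ": " + t.getMessage());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        attempts().forEach(System.out::println);
    }
}
```

The first access fails with ExceptionInInitializerError, which carries the real cause; every later access fails with NoClassDefFoundError: Could not initialize class. Applied to the trace above, this suggests looking in earlier logs for the first failure of PulsarAdminImpl rather than only checking whether the class is in the jar.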

Attempted fixes

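  • Opening the jar in IDEA shows that org.apache.pulsar.client.admin.internal.PulsarAdminImpl and related class files are present
  • Suspected a version compatibility problem. Every connector version from 1.14.0 to 1.14.4 produced the same error; switching to 1.15.0 even failed with "graph is cyclic", which points to the migration risk of jumping to a newer release line.
  • Tried to analyze the jar's dependencies with jdeps (jdeps target/vdf-1.0.0.jar), which threw: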
Exception in thread "main" java.lang.module.FindException: Module java.xml.bind not found, required by java.ws.rs
        at java.base/java.lang.module.Resolver.findFail(Resolver.java:877)
        at java.base/java.lang.module.Resolver.resolve(Resolver.java:191)
        at java.base/java.lang.module.Resolver.resolve(Resolver.java:140)
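  • Removing flink-connector-pulsar from the pom and repackaging the jar made the exception above disappear, and the dependency analysis ran normally.
  • Suspected a missing transitive dependency. Found flink-connector-pulsar/pom.xml in the connector's source repository and added the following dependency to the Flink job's pom: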
        <dependency>
            <groupId>org.apache.pulsar</groupId>
            <artifactId>pulsar-client-all</artifactId>
            <version>${pulsar.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>com.sun.activation</groupId>
                    <artifactId>javax.activation</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>jakarta.activation</groupId>
                    <artifactId>jakarta.activation-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>jakarta.ws.rs</groupId>
                    <artifactId>jakarta.ws.rs-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>jakarta.xml.bind</groupId>
                    <artifactId>jakarta.xml.bind-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>javax.validation</groupId>
                    <artifactId>validation-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>javax.xml.bind</groupId>
                    <artifactId>jaxb-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>net.jcip</groupId>
                    <artifactId>jcip-annotations</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.pulsar</groupId>
                    <artifactId>pulsar-package-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.beust</groupId>
                    <artifactId>jcommander</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
  • Repackaged and resubmitted. The earlier exception was gone and jdeps could analyze the dependencies, but a new exception appeared:
2022-06-15 16:32:33
org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'Source: pulsar-source-test -> (Sink: Print to Std. Out, Map -> Filter -> Map -> Filter -> Map)' (operator 06f6b9841ee245744a878eedfd102524).
	at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:545)
	at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:231)
	at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.failJob(SourceCoordinatorContext.java:287)
	at org.apache.flink.runtime.source.coordinator.SourceCoordinator.start(SourceCoordinator.java:128)
	at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$DeferrableCoordinator.resetAndStart(RecreateOnResetOperatorCoordinator.java:389)
	at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator.lambda$resetToCheckpoint$6(RecreateOnResetOperatorCoordinator.java:144)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source)
	at org.apache.flink.runtime.operators.coordination.ComponentClosingUtils.lambda$closeAsyncWithTimeout$0(ComponentClosingUtils.java:77)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IncompatibleClassChangeError: Method 'org.apache.pulsar.client.admin.PulsarAdminBuilder org.apache.pulsar.client.admin.PulsarAdmin.builder()' must be Methodref constant
	at org.apache.flink.connector.pulsar.common.config.PulsarConfigUtils.createAdmin(PulsarConfigUtils.java:176)
	at org.apache.flink.connector.pulsar.source.enumerator.PulsarSourceEnumerator.<init>(PulsarSourceEnumerator.java:86)
	at org.apache.flink.connector.pulsar.source.PulsarSource.createEnumerator(PulsarSource.java:149)
	at org.apache.flink.runtime.source.coordinator.SourceCoordinator.start(SourceCoordinator.java:124)
	... 8 more

  • Later inspection showed that this exception was a version problem: pulsar.version should be 2.8.0, not 2.7.0:
            <groupId>org.apache.pulsar</groupId>
            <artifactId>pulsar-client-all</artifactId>
            <version>${pulsar.version}</version>

The Maven repository confirms this: every connector version from 1.14.0 to 1.14.4 depends on Pulsar 2.8.0.
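
The "must be Methodref constant" message is what the JVM typically reports when the caller was compiled against an interface but finds a class of the same name at runtime. That would fit PulsarAdmin having changed kind between Pulsar 2.7.0 and 2.8.0, though this is an assumption that was not verified here. A small diagnostic sketch (the ClassOrigin helper is hypothetical, not part of Flink or Pulsar) prints whether a name resolves to an interface or a class and which jar it came from:

```java
import java.security.CodeSource;

/** Reports whether a class name resolves to an interface or a class, and from where. */
final class ClassOrigin {

    static String describe(String className) {
        Class<?> type;
        try {
            // initialize=false: inspect the class without running its static initializer
            type = Class.forName(className, false, ClassOrigin.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return "not on classpath: " + className;
        }
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String origin = (source == null || source.getLocation() == null)
                ? "JDK runtime (no code source)"
                : source.getLocation().toString();
        String kind = type.isInterface() ? "interface" : "class";
        return kind + " " + type.getName() + " <- " + origin;
    }

    public static void main(String[] args) {
        // Pass e.g. org.apache.pulsar.client.admin.PulsarAdmin when running inside the job's classpath
        String name = args.length > 0 ? args[0] : "java.lang.Runnable";
        System.out.println(describe(name));
    }
}
```

Running it with the job's classpath and the class name from the stack trace shows which pulsar jar actually supplies the class, which is quicker than comparing version properties by hand.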

Final solution

  • Removing the <exclude>org.slf4j:*</exclude> line from the maven-shade-plugin configuration is enough; no dependencies other than flink-connector-pulsar need to be added
  • The jdeps exception arises because the pulsar connector depends on pulsar-client, which uses java.ws.rs, which in turn requires java.xml.bind, and Java 11 removed the java.xml.bind module. Adding <exclusion> <groupId>jakarta.ws.rs</groupId> <artifactId>jakarta.ws.rs-api</artifactId> </exclusion> resolves it

Summary

  • Checking the pulsar connector's dependencies one by one showed that it needs org.slf4j:
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.15</version>
<scope>provided</scope>
</dependency>

Checking the project's own pom then turned up the exclude; removing it fixed the problem.
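
To confirm that a rebuilt jar really bundles a given package, the manual check in IDEA can be scripted. The sketch below is a hypothetical helper (JarPackageCheck is not an existing tool) that counts the .class entries under a package directory; the default jar path is the one used with jdeps earlier, and the package prefix is an assumption based on the excluded org.slf4j artifacts:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.JarFile;

/** Counts the class files a jar contains under a given package directory. */
final class JarPackageCheck {

    static long countClasses(String jarPath, String packageDir) {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(entry -> entry.getName())
                    .filter(name -> name.startsWith(packageDir) && name.endsWith(".class"))
                    .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String jarPath = args.length > 0 ? args[0] : "target/vdf-1.0.0.jar";
        String packageDir = args.length > 1 ? args[1] : "org/slf4j/";
        System.out.println(packageDir + " classes in " + jarPath + ": "
                + countClasses(jarPath, packageDir));
    }
}
```

A count of zero for org/slf4j/ before removing the exclude and a non-zero count afterwards would be consistent with the fix described above.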

posted @ 2022-06-16 13:26  略略略——