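DolphinScheduler cannot start workflows

DolphinScheduler version: 3.2.0
Reproduction scenarios:
  • Using any delete button in the DolphinScheduler web UI.
  • Clicking a workflow's Run button repeatedly while the web UI is lagging (suspected concurrency issue).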
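The second scenario is only suspected to be a concurrency problem. The stack trace below does show saveProcessInstanceTrigger calling queryByTypeAndJobId, i.e. a query happens on the write path. The Java sketch here is a hypothetical, simplified model of a "query first, insert if missing" routine, assuming the table has no unique index on (trigger_type, job_id); it is not DolphinScheduler's code. A CyclicBarrier forces both simulated requests to finish their check before either one inserts, so the race reproduces deterministically.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical, simplified "query first, insert if missing" routine.
// Assumption: no unique index on (trigger_type, job_id). NOT DolphinScheduler's code.
class CheckThenInsertRace {

    // In-memory stand-in for t_ds_trigger_relation; each entry is a "triggerType:jobId" key.
    static final List<String> table = Collections.synchronizedList(new ArrayList<>());

    static void saveIfAbsent(String key, CyclicBarrier barrier)
            throws InterruptedException, BrokenBarrierException {
        boolean exists = table.contains(key); // step 1: check for an existing row
        barrier.await();                      // wait until every caller has finished its check
        if (!exists) {
            table.add(key);                   // step 2: insert based on a possibly stale check
        }
    }

    // Two requests one after another: the second sees the first row and skips the insert.
    static int runTwoSequentialSaves(String key) {
        table.clear();
        CyclicBarrier noWait = new CyclicBarrier(1); // a 1-party barrier never blocks
        try {
            saveIfAbsent(key, noWait);
            saveIfAbsent(key, noWait);
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
        return Collections.frequency(table, key);
    }

    // Two overlapping requests: both checks see an empty table, so both insert.
    static int runTwoConcurrentSaves(String key) {
        table.clear();
        CyclicBarrier bothChecked = new CyclicBarrier(2);
        Runnable request = () -> {
            try {
                saveIfAbsent(key, bothChecked);
            } catch (InterruptedException | BrokenBarrierException e) {
                throw new IllegalStateException(e);
            }
        };
        Thread t1 = new Thread(request);
        Thread t2 = new Thread(request);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Collections.frequency(table, key);
    }

    public static void main(String[] args) {
        System.out.println(runTwoSequentialSaves("0:42")); // prints 1
        System.out.println(runTwoConcurrentSaves("0:42")); // prints 2
    }
}
```

Sequential calls leave one row; overlapping calls leave two, which is exactly the state that makes selectOne throw TooManyResultsException.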

The master node logs the following error:

[ERROR] 2024-03-13 12:16:07.468 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[154] - [WorkflowInstance-0][TaskInstance-0] - Master handle command 1 error 
org.apache.dolphinscheduler.server.master.exception.WorkflowCreateException: Create workflow execute runnable failed
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnableFactory.createWorkflowExecuteRunnable(WorkflowExecuteRunnableFactory.java:93)
    at org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap.lambda$run$0(MasterSchedulerBootstrap.java:137)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
    at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
    at org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap.run(MasterSchedulerBootstrap.java:134)
Caused by: org.mybatis.spring.MyBatisSystemException: nested exception is org.apache.ibatis.exceptions.TooManyResultsException: Expected one result (or null) to be returned by selectOne(), but found: 2
    at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:96)
    at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:441)
    at com.sun.proxy.$Proxy134.selectOne(Unknown Source)
    at org.mybatis.spring.SqlSessionTemplate.selectOne(SqlSessionTemplate.java:160)
    at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:89)
    at com.baomidou.mybatisplus.core.override.MybatisMapperProxy$PlainMethodInvoker.invoke(MybatisMapperProxy.java:148)
    at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:89)
    at com.sun.proxy.$Proxy175.queryByTypeAndJobId(Unknown Source)
    at org.apache.dolphinscheduler.service.process.TriggerRelationServiceImpl.queryByTypeAndJobId(TriggerRelationServiceImpl.java:50)
    at org.apache.dolphinscheduler.service.process.TriggerRelationServiceImpl.saveProcessInstanceTrigger(TriggerRelationServiceImpl.java:65)
    at org.apache.dolphinscheduler.service.process.ProcessServiceImpl.handleCommand(ProcessServiceImpl.java:342)
    at org.apache.dolphinscheduler.service.process.ProcessServiceImpl$$FastClassBySpringCGLIB$$9d3e18f9.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
    at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708)
    at org.apache.dolphinscheduler.service.process.ProcessServiceImpl$$EnhancerBySpringCGLIB$$ad745090.handleCommand(<generated>)
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteContextFactory.createWorkflowInstance(WorkflowExecuteContextFactory.java:81)
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteContextFactory.createWorkflowExecuteRunnableContext(WorkflowExecuteContextFactory.java:56)
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnableFactory.createWorkflowExecuteRunnable(WorkflowExecuteRunnableFactory.java:79)
    ... 15 common frames omitted
Caused by: org.apache.ibatis.exceptions.TooManyResultsException: Expected one result (or null) to be returned by selectOne(), but found: 2
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:80)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:427)
    ... 39 common frames omitted

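The stack trace pins the failure to the query called at line 50 of TriggerRelationServiceImpl (queryByTypeAndJobId), where selectOne throws. The corresponding SQL in the mapper is: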

    <sql id="baseSql">
        id, trigger_code, trigger_type, job_id, create_time, update_time
    </sql>

    <select id="queryByTypeAndJobId" resultType="org.apache.dolphinscheduler.dao.entity.TriggerRelation">
        select
        <include refid="baseSql"/>
        from t_ds_trigger_relation
        WHERE trigger_type = #{triggerType} and job_id = #{jobId}
    </select>
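Why two matching rows break this query: the mapper method is executed through selectOne, which, as the log message says, expects one result or null. The Java sketch below mirrors that contract on an in-memory list. SelectOneContract is a hypothetical name, and IllegalStateException stands in for MyBatis's TooManyResultsException so the example has no dependencies; the "code-a"/"code-b" values are placeholder trigger_code values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper mirroring the selectOne() contract described in the log:
// 0 rows -> null, 1 row -> that row, more rows -> exception.
// IllegalStateException stands in for org.apache.ibatis.exceptions.TooManyResultsException.
class SelectOneContract {

    static <T> T selectOne(List<T> rows) {
        if (rows.size() == 1) {
            return rows.get(0);
        }
        if (rows.size() > 1) {
            throw new IllegalStateException(
                "Expected one result (or null) to be returned by selectOne(), but found: " + rows.size());
        }
        return null;
    }

    public static void main(String[] args) {
        // Placeholder trigger_code values matched by one (trigger_type, job_id) pair
        System.out.println(selectOne(Collections.<String>emptyList())); // prints null
        System.out.println(selectOne(Arrays.asList("code-a")));         // prints code-a
        try {
            selectOne(Arrays.asList("code-a", "code-b"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // ends with "but found: 2"
        }
    }
}
```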

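Based on that SQL, I ran the following check against the metadata database and found (trigger_type, job_id) pairs that match more than one row: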

select * from (
    select trigger_type, job_id, count(1) as num
    from t_ds_trigger_relation
    group by trigger_type, job_id
) base
where num > 1
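The same duplicate check can be done over rows loaded into memory, for example after exporting the table for analysis. This is a sketch: TriggerRow and DuplicateFinder are hypothetical names, and the column types (int trigger_type, long job_id and trigger_code) are assumptions rather than the actual schema.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical row type; column types are assumptions, not the real schema.
class TriggerRow {
    final long id;
    final int triggerType;
    final long jobId;
    final long triggerCode;

    TriggerRow(long id, int triggerType, long jobId, long triggerCode) {
        this.id = id;
        this.triggerType = triggerType;
        this.jobId = jobId;
        this.triggerCode = triggerCode;
    }
}

class DuplicateFinder {

    // In-memory equivalent of the SQL above:
    // group by trigger_type, job_id, then keep only groups whose count is > 1.
    static Map<String, Long> findDuplicateKeys(List<TriggerRow> rows) {
        Map<String, Long> counts = rows.stream()
            .collect(Collectors.groupingBy(
                r -> r.triggerType + ":" + r.jobId, TreeMap::new, Collectors.counting()));
        counts.values().removeIf(n -> n <= 1); // the "num > 1" filter
        return counts;
    }

    public static void main(String[] args) {
        List<TriggerRow> rows = Arrays.asList(
            new TriggerRow(1, 0, 10, 100),
            new TriggerRow(2, 0, 10, 101), // same (trigger_type, job_id), different trigger_code
            new TriggerRow(3, 0, 11, 102));
        System.out.println(findDuplicateKeys(rows)); // prints {0:10=2}
    }
}
```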

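1. Back up all data in the t_ds_trigger_relation table.
2. Prepare to delete the duplicate rows.
3. Analyzing the full data set (small, since this was still the investigation stage) showed that although rows are duplicated on the two columns in the query's WHERE clause, every trigger_code value in the table is unique.
4. To be safe, and without understanding the underlying logic, I tried each trigger_code of each duplicate group in turn (keeping only that row) to see whether the workflow would run.
5. Testing showed that no matter which trigger_code was kept, selectOne still threw the error.
(The rest of the troubleshooting process is omitted here.)
6. More or less by accident, I truncated the t_ds_trigger_relation table, restarted the DolphinScheduler cluster services, and reran the workflow; no more errors appeared in the logs.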
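Step 2 (deleting duplicates) usually means keeping exactly one row per (trigger_type, job_id) pair. The sketch below only plans that deletion in memory, keeping the row with the smallest id; RelationRow and DedupPlanner are hypothetical names and the column types are assumed. Per step 5, keeping a single row per group did not by itself clear the error here, so this illustrates the selection logic, not the fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row type; column types are assumptions, not the real schema.
class RelationRow {
    final long id;
    final int triggerType;
    final long jobId;
    final long triggerCode;

    RelationRow(long id, int triggerType, long jobId, long triggerCode) {
        this.id = id;
        this.triggerType = triggerType;
        this.jobId = jobId;
        this.triggerCode = triggerCode;
    }
}

class DedupPlanner {

    // Returns the ids to delete so that each (triggerType, jobId) pair keeps one row:
    // the one with the smallest id. Nothing is deleted here; it only computes the plan.
    static List<Long> idsToDelete(List<RelationRow> rows) {
        Map<String, List<RelationRow>> groups = new LinkedHashMap<>();
        for (RelationRow r : rows) {
            groups.computeIfAbsent(r.triggerType + ":" + r.jobId, k -> new ArrayList<>()).add(r);
        }
        List<Long> result = new ArrayList<>();
        for (List<RelationRow> group : groups.values()) {
            group.sort(Comparator.comparingLong((RelationRow r) -> r.id));
            for (int i = 1; i < group.size(); i++) { // skip index 0: the row we keep
                result.add(group.get(i).id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<RelationRow> rows = Arrays.asList(
            new RelationRow(1, 0, 10, 100),
            new RelationRow(2, 0, 10, 101),
            new RelationRow(3, 0, 11, 102),
            new RelationRow(4, 0, 10, 103));
        System.out.println(idsToDelete(rows)); // prints [2, 4]
    }
}
```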

posted @ 2024-03-14 15:59  踩坑臭皮匠