DolphinScheduler cannot start workflows
DolphinScheduler version: 3.2.0
Steps to reproduce:
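- Clicking any delete button in the DolphinScheduler web UI.
- Clicking the workflow Run button several times in a row while the DolphinScheduler web UI is lagging (suspected concurrency issue).
The master node logs the following error: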

[ERROR] 2024-03-13 12:16:07.468 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[154] - [WorkflowInstance-0][TaskInstance-0] - Master handle command 1 error
org.apache.dolphinscheduler.server.master.exception.WorkflowCreateException: Create workflow execute runnable failed
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnableFactory.createWorkflowExecuteRunnable(WorkflowExecuteRunnableFactory.java:93)
    at org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap.lambda$run$0(MasterSchedulerBootstrap.java:137)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
    at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
    at org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap.run(MasterSchedulerBootstrap.java:134)
Caused by: org.mybatis.spring.MyBatisSystemException: nested exception is org.apache.ibatis.exceptions.TooManyResultsException: Expected one result (or null) to be returned by selectOne(), but found: 2
    at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:96)
    at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:441)
    at com.sun.proxy.$Proxy134.selectOne(Unknown Source)
    at org.mybatis.spring.SqlSessionTemplate.selectOne(SqlSessionTemplate.java:160)
    at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:89)
    at com.baomidou.mybatisplus.core.override.MybatisMapperProxy$PlainMethodInvoker.invoke(MybatisMapperProxy.java:148)
    at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:89)
    at com.sun.proxy.$Proxy175.queryByTypeAndJobId(Unknown Source)
    at org.apache.dolphinscheduler.service.process.TriggerRelationServiceImpl.queryByTypeAndJobId(TriggerRelationServiceImpl.java:50)
    at org.apache.dolphinscheduler.service.process.TriggerRelationServiceImpl.saveProcessInstanceTrigger(TriggerRelationServiceImpl.java:65)
    at org.apache.dolphinscheduler.service.process.ProcessServiceImpl.handleCommand(ProcessServiceImpl.java:342)
    at org.apache.dolphinscheduler.service.process.ProcessServiceImpl$$FastClassBySpringCGLIB$$9d3e18f9.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
    at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708)
    at org.apache.dolphinscheduler.service.process.ProcessServiceImpl$$EnhancerBySpringCGLIB$$ad745090.handleCommand(<generated>)
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteContextFactory.createWorkflowInstance(WorkflowExecuteContextFactory.java:81)
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteContextFactory.createWorkflowExecuteRunnableContext(WorkflowExecuteContextFactory.java:56)
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnableFactory.createWorkflowExecuteRunnable(WorkflowExecuteRunnableFactory.java:79)
    ... 15 common frames omitted
Caused by: org.apache.ibatis.exceptions.TooManyResultsException: Expected one result (or null) to be returned by selectOne(), but found: 2
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:80)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:427)
    ... 39 common frames omitted
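The stack trace pins the failure to the query issued at line 50 of TriggerRelationServiceImpl: it returns more than one row, so selectOne throws. The mapper SQL for that query is: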

<sql id="baseSql">
    id, trigger_code, trigger_type, job_id, create_time, update_time
</sql>

<select id="queryByTypeAndJobId" resultType="org.apache.dolphinscheduler.dao.entity.TriggerRelation">
    select
    <include refid="baseSql"/>
    from t_ds_trigger_relation
    WHERE trigger_type = #{triggerType} and job_id = #{jobId}
</select>
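To make the failure mode concrete, here is a minimal sketch that mimics the "one row or null, otherwise an error" contract of selectOne. It assumes an in-memory SQLite table as a stand-in for the real metadata database and a simplified schema; `TooManyResults`, `select_one`, `make_table` and the sample values are hypothetical names used only for illustration, not DolphinScheduler or MyBatis code.

```python
import sqlite3


class TooManyResults(Exception):
    """Hypothetical stand-in for MyBatis' TooManyResultsException."""


# Same filter as the mapper's queryByTypeAndJobId (timestamp columns omitted).
QUERY = (
    "SELECT id, trigger_code, trigger_type, job_id "
    "FROM t_ds_trigger_relation "
    "WHERE trigger_type = ? AND job_id = ?"
)

INSERT = (
    "INSERT INTO t_ds_trigger_relation (trigger_code, trigger_type, job_id) "
    "VALUES (?, ?, ?)"
)


def make_table():
    """Create an in-memory table with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t_ds_trigger_relation ("
        " id INTEGER PRIMARY KEY,"
        " trigger_code INTEGER,"
        " trigger_type INTEGER,"
        " job_id INTEGER)"
    )
    return conn


def select_one(conn, sql, params):
    """Return the single matching row, None for no rows, and raise for more than one."""
    rows = conn.execute(sql, params).fetchall()
    if len(rows) > 1:
        raise TooManyResults(
            f"Expected one result (or null), but found: {len(rows)}"
        )
    return rows[0] if rows else None


conn = make_table()
conn.execute(INSERT, (1001, 0, 42))
print("one row:", select_one(conn, QUERY, (0, 42)))

# A second row with the same (trigger_type, job_id) but a different trigger_code.
conn.execute(INSERT, (1002, 0, 42))
try:
    select_one(conn, QUERY, (0, 42))
except TooManyResults as exc:
    print("error:", exc)
```

The second lookup fails even though the two rows have different trigger_code values, because the query filters only on trigger_type and job_id.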
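Running a query based on this SQL against the metadata database shows that some (trigger_type, job_id) pairs have more than one row: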

select * from (
    select trigger_type, job_id, count(1) as num
    from t_ds_trigger_relation
    group by trigger_type, job_id
) base
where num > 1
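As an illustration of what this check returns, the sketch below runs the same statement against a few made-up rows. SQLite stands in for the real metadata database, the schema is simplified, and `find_duplicates` plus all sample values are hypothetical.

```python
import sqlite3

# The duplicate check from the text, unchanged apart from layout.
DUPLICATE_SQL = """
select * from (
    select trigger_type, job_id, count(1) as num
    from t_ds_trigger_relation
    group by trigger_type, job_id
) base
where num > 1
"""


def find_duplicates(conn):
    """Return (trigger_type, job_id, num) for every pair that has more than one row."""
    return conn.execute(DUPLICATE_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_ds_trigger_relation ("
    " id INTEGER PRIMARY KEY,"
    " trigger_code INTEGER,"
    " trigger_type INTEGER,"
    " job_id INTEGER)"
)
conn.executemany(
    "INSERT INTO t_ds_trigger_relation (trigger_code, trigger_type, job_id) "
    "VALUES (?, ?, ?)",
    [
        (1001, 0, 42),  # duplicated pair, first row
        (1002, 0, 42),  # duplicated pair, second row
        (1003, 0, 43),  # unique pair, not reported
    ],
)

for trigger_type, job_id, num in find_duplicates(conn):
    print(f"trigger_type={trigger_type} job_id={job_id} rows={num}")
```

Only the pair that appears twice is reported. Any pair listed by this query would make the selectOne lookup for that job fail.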
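1. Back up the full contents of the t_ds_trigger_relation table.
2. Prepare to delete the duplicate rows.
3. Analyzing the full data set (small, since this was still the evaluation stage) showed that although rows are duplicated on the two columns the source query filters on, every trigger_code value in the table is unique.
4. To be safe, and because the underlying logic was not understood, each trigger_code in every group of duplicates was tried in turn to see whether the workflow could run with only that row kept.
5. Testing showed that selectOne still failed no matter which trigger_code was kept.
The details of that investigation are omitted here.
6. More or less by accident, clearing all data from the t_ds_trigger_relation table, restarting the DolphinScheduler cluster services, and running the workflow again produced no error logs.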
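Steps 1 and 6 can be sketched as a "back up, verify, then clear" routine. This is only an illustration under assumptions: SQLite stands in for the real metadata database, the schema is simplified, and the backup table name `t_ds_trigger_relation_bak` and the helper names are hypothetical. The source does not explain why clearing the table resolved the error, so the backup is what makes the step reversible.

```python
import sqlite3

TABLE = "t_ds_trigger_relation"
BACKUP = "t_ds_trigger_relation_bak"  # hypothetical backup table name


def row_count(conn, table):
    """Count the rows in one of the two fixed tables above."""
    return conn.execute(f"SELECT count(*) FROM {table}").fetchone()[0]


def backup_and_clear(conn):
    """Copy every row into the backup table, verify the copy, then empty the original."""
    conn.execute(f"CREATE TABLE {BACKUP} AS SELECT * FROM {TABLE}")
    original = row_count(conn, TABLE)
    copied = row_count(conn, BACKUP)
    if original != copied:
        # Refuse to delete anything if the backup is incomplete.
        raise RuntimeError(f"backup has {copied} rows, expected {original}")
    conn.execute(f"DELETE FROM {TABLE}")
    conn.commit()
    return copied


conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE {TABLE} ("
    " id INTEGER PRIMARY KEY,"
    " trigger_code INTEGER,"
    " trigger_type INTEGER,"
    " job_id INTEGER)"
)
conn.executemany(
    f"INSERT INTO {TABLE} (trigger_code, trigger_type, job_id) VALUES (?, ?, ?)",
    [(1001, 0, 42), (1002, 0, 42), (1003, 0, 43)],
)

saved = backup_and_clear(conn)
print("rows backed up:", saved)
print("rows left in original table:", row_count(conn, TABLE))
```

On a real deployment the equivalent statements would be run with the metadata database's own client, followed by the cluster restart described in step 6.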