ZhangZhihui's Blog  

 

"""
Example DAG demonstrating the usage of ``@task.branch`` TaskFlow API decorator with depends_on_past=True,
where tasks may be run or skipped on alternating runs.
"""

from __future__ import annotations

import pendulum

from airflow.providers.standard.operators.empty import EmptyOperator
from airflow.sdk import DAG, task


@task.branch()
def should_run(**kwargs) -> str:
    """
    Determine which empty_task should be run based on if the logical date minute is even or odd.
    
    :param dict kwargs: Context
    :return: Id of the task to run
    :rtype: str
    """
    print(f"--------------exec dttm = {kwargs['logical_date']} and minute = {kwargs['logical_date'].minute}")
    if kwargs['logical_date'].minute % 2 == 1:
        return 'empty_task_1'
    return 'empty_task_2'


with DAG(
    dag_id='example_branch_dop_operator_v3',
    schedule='*/1 * * * *',
    start_date=pendulum.datetime(2021, 1, 1, tz='UTC'),
    catchup=False,
    default_args={'depends_on_past': True},
    tags=['example']
) as dag:
    cond = should_run()

    empty_task_1 = EmptyOperator(task_id='empty_task_1')
    empty_task_2 = EmptyOperator(task_id='empty_task_2')
    cond >> [empty_task_1, empty_task_2]

In the above code snippet, what is the purpose of {'depends_on_past': True}?

 

In Airflow, setting depends_on_past=True in default_args means:

Each task instance will wait for the previous run’s instance of the same task to succeed before it can run.

So for this DAG:

  • empty_task_1 on 2021-01-01 00:05 will only run if empty_task_1 on 2021-01-01 00:04 succeeded.

  • empty_task_2 on 2021-01-01 00:06 will only run if empty_task_2 on 2021-01-01 00:05 succeeded.

Why this matters with Branching

You are using a branching task (@task.branch) that returns either:

  • 'empty_task_1' or

  • 'empty_task_2'

Because of the schedule (*/1 * * * *), the DAG runs every minute.
Some minutes return branch A, others return branch B.
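
The alternation can be checked outside Airflow. The sketch below mirrors the minute test in should_run using only the standard library; the helper name pick_branch is hypothetical and is not part of the DAG.

```python
from datetime import datetime, timezone


def pick_branch(logical_date: datetime) -> str:
    """Mirror should_run: odd minute -> empty_task_1, even minute -> empty_task_2."""
    if logical_date.minute % 2 == 1:
        return "empty_task_1"
    return "empty_task_2"


# Five consecutive one-minute runs, as the */1 * * * * schedule would produce.
for minute in range(5):
    run = datetime(2021, 1, 1, 0, minute, tzinfo=timezone.utc)
    print(run.strftime("%H:%M"), "->", pick_branch(run))
```

Consecutive runs alternate between the two task ids, so each task is selected on every second run and skipped on the runs in between.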

❗ Problem created by depends_on_past=True

If at minute 05 Airflow picks empty_task_1, and at minute 06 it picks empty_task_2, then:

  • At minute 06, Airflow tries to run empty_task_2.

  • But empty_task_2 did not run at minute 05 (it was skipped due to branching).

  • Therefore the minute-06 instance depends on a skipped previous instance.

  • Airflow treats skipped as not satisfying depends_on_past, so the run gets stuck.

Summary

depends_on_past=True enforces sequential dependency between runs of each task, which is often incompatible with branching unless carefully controlled.

In short:

It forces each task to wait for the previous run of itself, which can break a branching DAG because only one branch runs per execution.

 
You said:
I manually ran the DAG, the result was not as you said. Task empty_task_1 ran at each odd minute and task empty_task_2 ran at each even minute, neither of them got stuck.
 
ChatGPT said:

You're right to question it: in your DAG, depends_on_past=True does not cause stuck tasks. It blocks a task only when that task's previous instance failed or has not run yet, and a branch skip is neither of those.

Let’s clarify why your case works fine.


✅ Why depends_on_past=True does NOT block your branching DAG

Key point:

Airflow considers a skipped task instance as a successful task for the purpose of depends_on_past.

Airflow’s rule:

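depends_on_past=True requires the previous instance of the same task to have succeeded or been skipped.
It blocks when the previous instance is in any other state, for example failed, up_for_retry, or not yet run.

The Airflow documentation for the depends_on_past argument describes this: task instances run sequentially, and only if the previous instance has succeeded or has been skipped.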
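
As a rough illustration, the rule can be written as a small predicate. This is a simplified model under stated assumptions, not Airflow's actual implementation: the function name depends_on_past_met, the is_first_run flag, and the use of plain strings for states are all hypothetical.

```python
from typing import Optional

# States of the previous instance that satisfy depends_on_past in this simplified model.
ALLOWED_PREVIOUS_STATES = {"success", "skipped"}


def depends_on_past_met(previous_state: Optional[str], is_first_run: bool = False) -> bool:
    """Return True if a task instance may run, given its previous instance's state.

    previous_state is None when the previous instance has not run yet.
    """
    if is_first_run:
        # No earlier run exists, so there is nothing to wait for.
        return True
    return previous_state in ALLOWED_PREVIOUS_STATES


for state in ("success", "skipped", "failed", "up_for_retry", None):
    print(state, "->", depends_on_past_met(state))
```

In this model only "success" and "skipped" let the next instance proceed, which is why a task skipped by branching does not hold up its own next run.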


🔍 What happens in your DAG

Your branching task (should_run) chooses one downstream task:

  • Odd minute → run empty_task_1, skip empty_task_2

  • Even minute → run empty_task_2, skip empty_task_1

Thus each minute, one task runs, and the other is cleanly skipped.

Example timeline for empty_task_1

Minute  Branch result  empty_task_1 state
00      even           skipped
01      odd            success
02      even           skipped
03      odd            success

Now apply the rule:
A task instance may run if its previous instance is "successful" OR "skipped".

So at 01, Airflow checks minute 00:

  • Previous state = skipped → ✔ OK

At 03, it checks minute 02:

  • Previous state = skipped → ✔ OK

Same logic applies to empty_task_2.
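
The timeline for both tasks can be reproduced with a short simulation. It is a sketch under simplifying assumptions (every selected task succeeds, states are plain strings, and the function name simulate is hypothetical); it does not call Airflow.

```python
from __future__ import annotations


def simulate(minutes: int) -> list[tuple[int, str, str, bool]]:
    """Simulate consecutive runs; return (minute, task_id, state, dependency_met) rows."""
    tasks = ("empty_task_1", "empty_task_2")
    previous: dict[str, str | None] = {task: None for task in tasks}  # None: no earlier run
    rows = []
    for minute in range(minutes):
        # Same test as should_run: odd minute selects empty_task_1.
        chosen = "empty_task_1" if minute % 2 == 1 else "empty_task_2"
        for task in tasks:
            # Simplified depends_on_past rule: first run, or previous success/skipped.
            met = previous[task] is None or previous[task] in {"success", "skipped"}
            if not met:
                state = "blocked"
            elif task == chosen:
                state = "success"
            else:
                state = "skipped"
            rows.append((minute, task, state, met))
            previous[task] = state
    return rows


for minute, task, state, met in simulate(4):
    print(f"{minute:02d} {task:<13} {state:<8} depends_on_past met: {met}")
```

Under these assumptions no row ever reaches the "blocked" state, because each task's previous instance is always either success or skipped.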


🎯 The result:

Everything runs normally:

  • empty_task_1 runs every odd minute

  • empty_task_2 runs every even minute

  • None are blocked by depends_on_past=True

This matches what you observed.


📝 Summary — why it works

depends_on_past=True blocks a task instance only when its previous instance is in a state other than success or skipped, for example failed or not yet run.
Branching marks the unselected task as skipped, and Airflow accepts that state.

So in your DAG:

✔ Tasks run on their designated minutes
✔ No task gets stuck
✔ depends_on_past=True enforces run order, but doesn’t conflict with branching because “skipped” is permitted

 


 

posted on 2025-12-11 11:09  ZhangZhihuiAAA