ZhangZhihui's Blog  
In Airflow, what's the difference between TaskMarker and TaskSensor?
 

In Airflow, the task sensor and task marker (concretely, ExternalTaskSensor and ExternalTaskMarker) both relate to dependencies between DAGs, but they serve very different purposes and point in opposite directions of the dependency.

Below is a clear, practical explanation.


High-level difference

Aspect             | TaskSensor                        | TaskMarker
Purpose            | Waits for another task to finish  | Declares a dependency target for another DAG
Direction          | Downstream waits on upstream      | Upstream advertises itself
Is it a real task? | ✅ Yes (runs, pokes, can wait)     | Runs, but only as an instant no-op that carries metadata
Scheduling impact  | Can block a DAG run               | Never waits or blocks
Typical use        | Cross-DAG waiting                 | Cross-DAG dependency declaration

TaskSensor

What it is

A sensor that waits until a specific task in another DAG reaches a desired state.

The concrete class:

  • ExternalTaskSensor (often paired with ExternalTaskMarker, which is covered below and is not itself a sensor)

What it does

  • Periodically checks the metadata DB

  • Waits for:

    • Another DAG

    • Another task

    • A specific execution date

  • Blocks until condition is met (or times out)

Example

from airflow.providers.standard.sensors.external_task import ExternalTaskSensor

wait_for_upstream = ExternalTaskSensor(
    task_id="wait_for_upstream",
    external_dag_id="upstream_dag",   # DAG to watch
    external_task_id="final_task",    # task in that DAG to wait for
    allowed_states=["success"],
    failed_states=["failed"],
)

This means:

“Don’t run until upstream_dag.final_task succeeds.”


Characteristics

  • Consumes scheduler resources while waiting

  • Can slow down pipelines if misused

  • Enforces hard runtime dependency
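
One common mitigation for that scheduler cost is reschedule mode, which frees the worker slot between checks. A minimal sketch with illustrative DAG and task names (mode, poke_interval, and timeout come from Airflow's base sensor class):

from airflow.providers.standard.sensors.external_task import ExternalTaskSensor

wait_for_upstream = ExternalTaskSensor(
    task_id="wait_for_upstream",
    external_dag_id="upstream_dag",
    external_task_id="final_task",
    mode="reschedule",  # release the worker slot between checks
    poke_interval=60,   # check once per minute
    timeout=60 * 60,    # give up after an hour of waiting
)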


TaskMarker

What it is

A metadata marker rather than a working task: it completes instantly, does nothing at runtime, and exists for the dependency metadata it records.

Specifically:

  • ExternalTaskMarker

What it does

  • Declares:

    “This task is depended on by another DAG”

  • Helps Airflow:

    • Track cross-DAG lineage

    • Clear tasks safely across DAG boundaries

    • Avoid broken dependency chains

Example

from airflow.providers.standard.sensors.external_task import ExternalTaskMarker

mark_dependency = ExternalTaskMarker(
    task_id="mark_for_downstream",
    external_dag_id="downstream_dag",      # DAG that depends on this task
    external_task_id="wait_for_upstream",  # sensor task in that DAG
)

This means:

“A task in downstream_dag depends on me.”


Characteristics

  • Runs only as an instant no-op (no real work)

  • Never waits on anything

  • Exists only for:

    • Dependency visualization

    • Clearing behavior

    • DAG integrity


Why both exist (and why one isn’t enough)

Without TaskMarker

If you use only ExternalTaskSensor:

  • Clearing the upstream DAG won’t clear downstream DAGs

  • Lineage is invisible to Airflow

  • Risk of inconsistent state

With TaskMarker + TaskSensor

Airflow can:

  • Track dependencies in both directions

  • Automatically clear downstream tasks

  • Maintain DAG consistency

This is why Airflow recommends using both together.


Typical paired usage (recommended)

Upstream DAG

end = ExternalTaskMarker(
    task_id="end",
    external_dag_id="downstream_dag",
    external_task_id="wait_for_upstream",
)

Downstream DAG

wait = ExternalTaskSensor(
    task_id="wait_for_upstream",
    external_dag_id="upstream_dag",
    external_task_id="end",
)

This declares the dependency in both directions, giving Airflow a safe, fully tracked cross-DAG link.


Mental model

  • TaskSensor:

    “I will WAIT until something happens elsewhere.”

  • TaskMarker:

    “Someone else depends on me — remember that.”


Key takeaways

  • TaskSensor blocks execution

  • TaskMarker never waits or blocks (it is an instant no-op)

  • Sensors enforce runtime ordering

  • Markers enable safe dependency tracking

  • Use both for cross-DAG dependencies


If you want, I can also explain:

  • Why sensors can cause scheduler bottlenecks

  • Alternatives like Dataset-based scheduling

  • How clearing behaves with and without markers

  • When you should not use cross-DAG sensors

 
 
You said 'If you use only ExternalTaskSensor, clearing the upstream DAG won’t clear downstream DAGs'. What does clearing mean?
 

Great follow-up — “clearing” is a very specific (and very important) concept in Airflow.


What does “clearing” mean in Airflow?

Clearing a task or DAG run means:

Resetting task instance state so Airflow will run it again

When you clear something, Airflow:

  • Sets the task instance state back to none

  • Keeps the DAG run and execution date

  • Allows the scheduler to re-execute the task

Clearing is not deleting.


How clearing is used in practice

Clearing is commonly used when:

  • A task failed and you fixed the issue

  • Upstream data changed

  • You want to re-run part of a pipeline

  • You want to re-process historical data

Example (UI):

  • Select a task → Clear

  • Select a DAG run → Clear DAG Run

  • Use “Downstream / Recursive” options
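
The same is possible from the CLI via airflow tasks clear; a sketch with a hypothetical DAG and task name (verify the flags against your Airflow version):

airflow tasks clear my_dag \
  --task-regex "^transform$" \
  --start-date 2025-01-01 \
  --end-date 2025-01-02 \
  --downstream \
  --yes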


Clearing vs deleting (important difference)

Action         | What happens
Clear task     | Task state reset → runs again
Clear DAG run  | All tasks in the run reset → run again
Delete DAG run | DAG run removed permanently

Clearing with cross-DAG dependencies

This is where ExternalTaskSensor vs ExternalTaskMarker matters.

Scenario

  • DAG A (upstream)

  • DAG B (downstream)

  • DAG B waits on DAG A using only ExternalTaskSensor


What happens when you clear DAG A?

  1. DAG A tasks are cleared and re-run

  2. DAG B:

    • Does not get cleared

    • Still thinks it already ran successfully

    • Becomes out of sync

This creates a data consistency problem:

  • DAG A produces new data

  • DAG B still has old results


How ExternalTaskMarker fixes this

When DAG A uses an ExternalTaskMarker:

ExternalTaskMarker(
    task_id="end",
    external_dag_id="dag_b",
    external_task_id="wait_for_a",
)

Airflow now knows:

“This task is depended on by another DAG.”

So when you clear DAG A:

  • Airflow automatically finds dependent tasks in DAG B

  • Clears them too (if you choose recursive clearing)

This keeps pipelines consistent.


What “recursive clearing” means

Recursive clearing means:

“Clear this task and everything that depends on it — even across DAGs.”

This only works if:

  • Dependencies are explicitly declared

  • That’s what ExternalTaskMarker provides
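
For reference, ExternalTaskMarker also takes a recursion_depth parameter (default 10) that bounds how many levels of transitive cross-DAG dependencies a recursive clear will follow. A minimal sketch reusing the DAG names from above:

from airflow.providers.standard.sensors.external_task import ExternalTaskMarker

end = ExternalTaskMarker(
    task_id="end",
    external_dag_id="dag_b",
    external_task_id="wait_for_a",
    recursion_depth=10,  # how many levels of dependent external tasks a clear may traverse
)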


Visual example

Without marker

DAG A        DAG B
  A1   --->  B1(sensor)

Clear A1:

  • A1 reruns

  • B1 remains success


With marker + sensor

DAG A                   DAG B
  A1 --(marker)--> B1(sensor)

Clear A1:

  • A1 reruns

  • B1 is cleared too ✅


Why this matters in production

Without proper clearing:

  • Downstream DAGs can silently use stale data

  • Backfills become unsafe

  • Debugging becomes very difficult

This is why Airflow strongly recommends:

Always pair ExternalTaskSensor with ExternalTaskMarker.


Key takeaway

  • Clearing = resetting task state to rerun

  • It is essential for recovery and reprocessing

  • Sensors alone do not propagate clears

  • Markers enable safe, recursive clearing across DAGs

 

 

from __future__ import annotations

import pendulum

from airflow.providers.standard.operators.empty import EmptyOperator
from airflow.providers.standard.sensors.external_task import ExternalTaskMarker, ExternalTaskSensor
from airflow.sdk import DAG

start_date = pendulum.datetime(2021, 1, 1, tz="UTC")

with DAG(
    dag_id="example_external_task_marker_parent",
    start_date=start_date,
    catchup=False,
    schedule=None,
    tags=["example2"],
) as parent_dag:
    # [START howto_operator_external_task_marker]
    parent_task = ExternalTaskMarker(
        task_id="parent_task",
        external_dag_id="example_external_task_marker_child",
        external_task_id="child_task1",
    )
    # [END howto_operator_external_task_marker]

with DAG(
    dag_id="example_external_task_marker_child",
    start_date=start_date,
    schedule=None,
    catchup=False,
    tags=["example2"],
) as child_dag:
    # [START howto_operator_external_task_sensor]
    child_task1 = ExternalTaskSensor(
        task_id="child_task1",
        external_dag_id=parent_dag.dag_id,
        external_task_id=parent_task.task_id,
        timeout=600,
        allowed_states=["success"],
        failed_states=["failed", "skipped"],
        mode="reschedule",
    )
    # [END howto_operator_external_task_sensor]

    # [START howto_operator_external_task_sensor_with_task_group]
    child_task2 = ExternalTaskSensor(
        task_id="child_task2",
        external_dag_id=parent_dag.dag_id,
    external_task_group_id="parent_dag_task_group_id",  # assumes the parent DAG defines a TaskGroup with this group_id
        timeout=600,
        allowed_states=["success"],
        failed_states=["failed", "skipped"],
        mode="reschedule",
    )
    # [END howto_operator_external_task_sensor_with_task_group]

    child_task3 = EmptyOperator(task_id="child_task3")
    child_task1 >> child_task2 >> child_task3

For the above code, I manually ran the two DAGs. The parent_task in the first DAG succeeded, but the state of child_task1 in the second DAG became 'Up for Reschedule'. Why?

 

Key concepts that explain everything

Before looking at each task, there are two important Airflow behaviors involved:

1. ExternalTaskSensor + execution dates

An ExternalTaskSensor does not just look for a successful task in another DAG.
It looks for a task with the same logical date (execution date).

If the parent DAG and child DAG were manually triggered at different times, their logical dates will not match → the sensor will never find the external task.

This is the most common source of confusion with ExternalTaskSensor.
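
If the two DAGs intentionally run on offset schedules, ExternalTaskSensor accepts execution_delta (or execution_date_fn) to translate this run's logical date into the upstream one. A minimal sketch, assuming the parent's run starts one hour before the child's (DAG and task names are illustrative):

from datetime import timedelta

from airflow.providers.standard.sensors.external_task import ExternalTaskSensor

wait = ExternalTaskSensor(
    task_id="wait_for_parent",
    external_dag_id="parent_dag",
    external_task_id="end",
    execution_delta=timedelta(hours=1),  # look for the parent run dated one hour earlier
)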


2. mode='reschedule'

When a sensor runs in reschedule mode:

  • It checks once

  • If the condition is not met:

    • It releases the worker

    • Its state becomes Up for Reschedule

    • It will try again later until timeout is reached

This is not a failure.

Putting the two behaviors together: your manual triggers gave the parent and child runs different logical dates, so the sensor's check found no matching parent task instance, and because it runs in reschedule mode it released its worker slot and was marked Up for Reschedule until its next check (repeating until the timeout).

 

How do I make the logical dates the same when manually running the two DAGs?
 

To make the logical date (execution date) the same when manually running two DAGs, you must explicitly control the logical date at trigger time. Simply clicking “Trigger DAG” in the UI is not enough, because each click gets its own logical date.

Below are the correct and supported ways to do this.


✅ Best Practice: Trigger one DAG from the other (recommended)

Use TriggerDagRunOperator and pass the parent's logical date through its templated logical_date parameter, so the child run is created with the same logical date.

Parent DAG

from airflow.providers.standard.operators.trigger_dagrun import TriggerDagRunOperator  # Airflow 2: airflow.operators.trigger_dagrun

trigger_child = TriggerDagRunOperator(
    task_id="trigger_child",
    trigger_dag_id="example_external_task_marker_child",
    logical_date="{{ logical_date }}",  # propagate this run's logical date to the child
    wait_for_completion=False,
)

✔ The child DAG run will have the exact same logical date
✔ This is the safest and cleanest solution
✔ Works perfectly with ExternalTaskSensor


✅ Manual Trigger via Airflow UI (advanced but works)

When triggering manually in the UI:

  1. Click Trigger DAG

  2. Open Advanced options

  3. Set Logical date explicitly
    Example:

    2025-01-01T00:00:00+00:00
  4. Trigger both DAGs using the exact same value

⚠️ Even a one-second difference will break ExternalTaskSensor.


✅ Trigger via CLI (most precise manual method)

airflow dags trigger \
  example_external_task_marker_parent \
  --logical-date "2025-01-01T00:00:00+00:00"

airflow dags trigger \
  example_external_task_marker_child \
  --logical-date "2025-01-01T00:00:00+00:00"

✔ Guaranteed match
✔ Ideal for testing


🚫 What does NOT work

Method                       | Why it fails
Clicking “Trigger DAG” twice | Each click gets its own logical date
Relying on start_date        | Only used for scheduling, not manual runs
execution_delta alone        | Still requires aligned (just offset) logical dates
Sensors without date control | Will wait forever

🔎 How to verify logical dates match

In the Airflow UI:

  • Open Graph View

  • Click the DAG run

  • Check Logical date

Or via CLI:

airflow dags list-runs -d example_external_task_marker_parent

🧠 Key mental model

ExternalTaskSensor waits for:

  • Same external_dag_id

  • Same external_task_id

  • Same logical date

  • Desired task state

If any of those differ → sensor waits.

 


posted on 2025-12-16 11:24  ZhangZhihuiAAA