    @task.sensor(poke_interval=60, timeout=3600, mode='reschedule')
    def wait_for_upstream() -> PokeReturnValue:
        return PokeReturnValue(is_done=True, xcom_value='xcom_value')

Please help me understand this Airflow task.
Absolutely! Let’s break down this Airflow task step by step. Your example uses the TaskFlow API with a sensor task.
1️⃣ The Decorator: @task.sensor
    @task.sensor(poke_interval=60, timeout=3600, mode='reschedule')

The @task.sensor decorator has been available since Airflow 2.5. It is the TaskFlow counterpart of PythonSensor, which is built on BaseSensorOperator, so the same sensor arguments apply.
- poke_interval=60
  - How often the sensor checks (or “pokes”) the condition.
  - Here, every 60 seconds.
- timeout=3600
  - Maximum time to wait before failing the task.
  - Here, 3600 seconds = 1 hour.
- mode='reschedule'
  - Controls sensor behavior:
    - 'poke' → task occupies a worker slot the whole time.
    - 'reschedule' → frees the worker slot between pokes, allowing other tasks to run.
  - 'reschedule' is preferred for long-running sensors.
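To make the difference between the two modes concrete, here is a rough back-of-the-envelope model in plain Python. It is not Airflow code: slot_seconds, wait_seconds, and poke_duration are hypothetical names, the one-second poke duration is an assumption chosen for illustration, and scheduler overhead is ignored.

```python
import math


def slot_seconds(mode: str, wait_seconds: float, poke_interval: float,
                 poke_duration: float = 1.0) -> float:
    """Roughly estimate how long a sensor holds a worker slot.

    Assumes the condition becomes true after `wait_seconds` and that each
    poke takes `poke_duration` seconds (both are illustrative assumptions).
    """
    # One poke at t=0, then one per interval until the condition is met.
    pokes = math.ceil(wait_seconds / poke_interval) + 1
    if mode == "poke":
        # The slot is held for the whole wait, sleeping between pokes.
        return (pokes - 1) * poke_interval + poke_duration
    if mode == "reschedule":
        # The slot is held only while a poke is actually running.
        return pokes * poke_duration
    raise ValueError(f"unknown mode: {mode!r}")


# A condition that takes the full hour to become true, poked every 60 seconds.
poke_cost = slot_seconds("poke", wait_seconds=3600, poke_interval=60)
reschedule_cost = slot_seconds("reschedule", wait_seconds=3600, poke_interval=60)
print(poke_cost, reschedule_cost)
```

Under these assumptions the 'poke' sensor ties up a slot for about an hour, while the 'reschedule' sensor uses it for about a minute in total, which is why 'reschedule' suits long waits with long intervals.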
2️⃣ The Function: wait_for_upstream
    def wait_for_upstream() -> PokeReturnValue:
        return PokeReturnValue(is_done=True, xcom_value='xcom_value')

- This is the Python callable the sensor runs on every poke.
- Here it returns a PokeReturnValue object. A plain boolean also works when there is no XCom value to push.

PokeReturnValue has two key fields:
| Field | Purpose |
|---|---|
| is_done | Boolean indicating whether the sensor condition is met. True → sensor succeeds, False → keep waiting. |
| xcom_value | Optional value that will be pushed to XCom when the sensor succeeds. |
So here:
- is_done=True → sensor will succeed immediately.
- xcom_value='xcom_value' → this string will be stored in XCom for downstream tasks.
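For comparison, here is a minimal sketch of a sensor whose is_done depends on a real condition, in this case a file appearing on disk. The helper names file_is_ready and make_file_sensor, and the choice to push the file size as the XCom value, are assumptions made for illustration. The Airflow imports sit inside make_file_sensor so that the check itself can be tried without Airflow installed.

```python
import os
from typing import Optional, Tuple


def file_is_ready(path: str) -> Tuple[bool, Optional[int]]:
    """Return (is_done, size_in_bytes); the size is None while the file is missing."""
    if os.path.isfile(path):
        return True, os.path.getsize(path)
    return False, None


def make_file_sensor(path: str):
    """Wrap the check in a TaskFlow sensor (hypothetical helper, needs Airflow)."""
    from airflow.decorators import task
    from airflow.sensors.base import PokeReturnValue

    @task.sensor(poke_interval=60, timeout=3600, mode="reschedule")
    def wait_for_file() -> PokeReturnValue:
        done, size = file_is_ready(path)
        # While done is False the sensor keeps waiting; once it is True,
        # the size is pushed to XCom for downstream tasks.
        return PokeReturnValue(is_done=done, xcom_value=size)

    return wait_for_file
```

Inside a DAG definition, calling make_file_sensor(path)() would declare the sensor task, and its result could be passed to a downstream task like any other TaskFlow output.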
3️⃣ What This Task Does
- Airflow schedules the wait_for_upstream sensor.
- Every poke_interval (60s), it calls the function to check if the condition is met.
- Since is_done=True is returned immediately:
  - The sensor completes on the first poke.
  - The value 'xcom_value' is stored in XCom.
- If is_done=False had been returned:
  - Airflow would wait up to timeout seconds, poking every 60 seconds.
- Uses 'reschedule' mode → worker slot is freed between pokes.
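The sequence above can be imitated with a small simulation on a fake clock. This is a simplified model, not Airflow's implementation: PokeResult, SensorTimeout, and run_sensor are hypothetical stand-ins, and the exact moment at which a real sensor times out may differ from this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class PokeResult:
    """Hypothetical stand-in for PokeReturnValue (same two fields)."""
    is_done: bool
    xcom_value: Optional[Any] = None


class SensorTimeout(Exception):
    """Hypothetical stand-in for the failure raised when a sensor times out."""


def run_sensor(poke: Callable[[float], PokeResult],
               poke_interval: float, timeout: float) -> Tuple[Any, int]:
    """Poke on a simulated clock until done or until the timeout is used up.

    Returns (xcom_value, number_of_pokes). No real sleeping happens.
    """
    elapsed = 0.0
    pokes = 0
    while True:
        pokes += 1
        result = poke(elapsed)
        if result.is_done:
            return result.xcom_value, pokes
        if elapsed + poke_interval > timeout:
            # The next poke would fall outside the allowed waiting time.
            raise SensorTimeout(f"condition not met after {elapsed:.0f}s")
        elapsed += poke_interval


# Condition already met: finishes on the first poke, like the example task.
value, pokes = run_sensor(lambda t: PokeResult(True, "xcom_value"), 60, 3600)

# Condition met after 150 simulated seconds: pokes at t=0, 60, 120, 180.
late_value, late_pokes = run_sensor(lambda t: PokeResult(t >= 150, "ready"), 60, 3600)
print(pokes, late_pokes)
```

A poke function that never returns is_done=True makes run_sensor raise SensorTimeout, which mirrors the task failing once timeout is exceeded.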
4️⃣ XCom Usage
You can pull the XCom value in downstream tasks:
    @task
    def downstream(value):
        print(f"Got XCom value: {value}")

    downstream(wait_for_upstream())

Output:
    Got XCom value: xcom_value

5️⃣ Key Points
- TaskFlow sensors are thin wrappers around traditional sensor operators.
- poke_interval controls how often the sensor checks, and timeout controls how long it keeps trying.
- mode='reschedule' helps with resource efficiency.
- Returning PokeReturnValue allows custom logic and XCom pushing in one step.
🔹 Quick Analogy
Think of it like a door sensor:
- poke_interval=60 → check the door every 60 seconds.
- timeout=3600 → give up after 1 hour if the door is still closed.
- is_done=True → the door is open, you can go in.
- xcom_value → leave a note inside the room for the next person.
If you want, I can also explain how to make this sensor actually wait for real conditions (like a file appearing, a database row existing, or an API returning success), instead of always succeeding immediately. This is a common real-world pattern.
Do you want me to do that?
You said:
ChatGPT said:
Ah, this is a subtle but important point about Airflow TaskFlow API! Let’s clarify exactly what’s happening.
1️⃣ TaskFlow XCom automatic handling
In the TaskFlow API (using @task or @task.sensor), you do not need to call xcom_pull() manually. Here’s why:
    @task.sensor(...)
    def wait_for_upstream() -> PokeReturnValue:
        return PokeReturnValue(is_done=True, xcom_value='xcom_value')

    @task
    def downstream(value):
        print(f"Got XCom value: {value}")

    downstream(wait_for_upstream())

- wait_for_upstream() returns a PokeReturnValue with xcom_value='xcom_value'.
- When the task finishes, Airflow automatically pushes xcom_value to XCom.
- When you call downstream(wait_for_upstream()), TaskFlow treats the function call as a dependency and automatically passes the XCom value as value.
✅ The TaskFlow API hides XCom pulling behind Python function calls. This is why you never explicitly see xcom_pull() here.
2️⃣ What would it look like with classic operators
If you weren’t using TaskFlow, you’d have to do something like:
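    from airflow.operators.python import PythonOperator

    def wait_for_upstream(ti):
        # Push the value under an explicit key.
        ti.xcom_push(key='value', value='xcom_value')

    def downstream(ti):
        # Pull it back by task id and key.
        value = ti.xcom_pull(task_ids='wait_task', key='value')
        print(value)

    # Inside a `with DAG(...)` block:
    wait_task = PythonOperator(task_id='wait_task', python_callable=wait_for_upstream)
    downstream_task = PythonOperator(task_id='downstream_task', python_callable=downstream)
    wait_task >> downstream_task
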
- TaskFlow API is just syntactic sugar over this.
- It automatically manages the XCom push/pull for you.
3️⃣ How TaskFlow passes the value internally
- When you call downstream(wait_for_upstream()) while the DAG file is being parsed, the function bodies do not run; each call only declares a task.
- Airflow builds the DAG dependency graph and records that the output of wait_for_upstream should become the input of downstream.
- At runtime:
  - wait_for_upstream executes and pushes xcom_value to XCom.
  - downstream retrieves it automatically and receives it as the value argument.
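The idea of declaring first and running later can be shown with a toy model in plain Python. This is only an analogy, not how Airflow is implemented: toy_task, TaskCall, run_all, and the xcom_store dictionary are all hypothetical names, and tasks simply run in the order they were declared.

```python
from typing import Any, Callable, Dict, List

xcom_store: Dict[str, Any] = {}        # toy stand-in for the XCom backend
registered: List["TaskCall"] = []      # tasks in the order they were declared


class TaskCall:
    """Placeholder returned at declaration time instead of a real result."""

    def __init__(self, fn: Callable[..., Any], args: tuple):
        self.fn = fn
        self.args = args
        self.task_id = fn.__name__


def toy_task(fn: Callable[..., Any]) -> Callable[..., TaskCall]:
    """Hypothetical decorator: calling the function only records it."""
    def declare(*args: Any) -> TaskCall:
        call = TaskCall(fn, args)
        registered.append(call)
        return call
    return declare


def run_all() -> None:
    """Run tasks in declaration order, resolving placeholders from the store."""
    for call in registered:
        resolved = [
            xcom_store[arg.task_id] if isinstance(arg, TaskCall) else arg
            for arg in call.args
        ]
        # "Push" the return value under the task's id.
        xcom_store[call.task_id] = call.fn(*resolved)


@toy_task
def wait_for_upstream():
    return "xcom_value"


@toy_task
def downstream(value):
    return f"Got XCom value: {value}"


downstream(wait_for_upstream())   # nothing runs yet; two placeholders are recorded
declared_before_run = dict(xcom_store)   # still empty at this point
run_all()
print(xcom_store["downstream"])   # prints "Got XCom value: xcom_value"
```

The placeholder object plays the role of the reference that TaskFlow hands back at declaration time, and the dictionary lookup in run_all plays the role of the automatic XCom pull.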
4️⃣ Key Takeaways
| Classic Airflow | TaskFlow API |
|---|---|
| Must push XCom manually (ti.xcom_push) | Return a value (or xcom_value) and Airflow pushes it automatically |
| Must pull XCom manually (ti.xcom_pull) | Pass the output of one task into another as a Python function argument |
| Functions look like def func(**kwargs) | Functions look like normal Python functions |
So in your example, the xcom_value='xcom_value' is automatically passed as the value argument to downstream without ever explicitly calling xcom_pull().
