ZhangZhihui's Blog  

 

    @task.sensor(
        poke_interval=60,
        timeout=3600,
        mode='reschedule'
    )
    def wait_for_upstream() -> PokeReturnValue:
        return PokeReturnValue(is_done=True, xcom_value='xcom_value')

Please help me understand this Airflow task.

 

Absolutely! Let’s break down this Airflow task step by step. Your example uses the TaskFlow API with a sensor task.


1️⃣ The Decorator: @task.sensor

@task.sensor(
    poke_interval=60,
    timeout=3600,
    mode='reschedule'
)

The @task.sensor decorator (available since Airflow 2.5) turns a Python function into a sensor. It behaves like a classic sensor built on BaseSensorOperator (similar to PythonSensor), but uses TaskFlow syntax.

  • poke_interval=60

    • How often the sensor checks (or “pokes”) the condition.

    • Here, every 60 seconds.

  • timeout=3600

    • Maximum time to wait before failing the task.

    • Here, 3600 seconds = 1 hour.

  • mode='reschedule'

    • Controls sensor behavior:

      • 'poke' → task occupies a worker slot the whole time.

      • 'reschedule' → frees the worker between pokes, allowing other tasks to run.

    • 'reschedule' is preferred for long-running sensors.
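Here is a back-of-the-envelope model of the 'poke' vs 'reschedule' difference. It is plain Python, not Airflow code, and estimates how long a worker slot is held until the sensor succeeds. The numbers are hypothetical: each poke takes 2 seconds and the condition becomes true on the 10th poke. Scheduler overhead and the time a rescheduled task spends queued are ignored.

```python
def slot_seconds(mode: str, pokes: int, poke_interval: int, poke_duration: int) -> int:
    """Approximate seconds a worker slot is held until the sensor succeeds.

    Simplified model: 'poke' mode holds the slot for every poke plus every
    sleep between pokes; 'reschedule' mode releases it between pokes, so
    only the pokes themselves count.
    """
    if mode == "poke":
        return pokes * poke_duration + (pokes - 1) * poke_interval
    if mode == "reschedule":
        return pokes * poke_duration
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical numbers: 10 pokes, 60 s apart, each taking 2 s.
print(slot_seconds("poke", pokes=10, poke_interval=60, poke_duration=2))        # 560
print(slot_seconds("reschedule", pokes=10, poke_interval=60, poke_duration=2))  # 20
```

The gap grows with the waiting time. That is why 'reschedule' suits sensors that may wait a long time, while 'poke' can be acceptable for very short waits.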


2️⃣ The Function: wait_for_upstream

def wait_for_upstream() -> PokeReturnValue:
    return PokeReturnValue(is_done=True, xcom_value='xcom_value')

  • This is the Python callable the sensor uses.

  • It returns either a plain bool or a PokeReturnValue object. Returning PokeReturnValue additionally lets it push a value to XCom.

PokeReturnValue has two key fields:

Field      | Purpose
---------- | -------
is_done    | Boolean indicating whether the sensor condition is met. True → sensor succeeds; False → keep waiting.
xcom_value | Optional value that will be pushed to XCom when the sensor succeeds.

So here:

  • is_done=True → sensor will succeed immediately.

  • xcom_value='xcom_value' → this string will be stored in XCom for downstream tasks.
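The sketch below shows how one poke result can be interpreted in plain Python. The result may be a plain bool or a PokeReturnValue-like object. FakePokeReturnValue is a hypothetical stand-in so the snippet runs without Airflow. Its truthiness follows is_done, which, as far as I know, mirrors Airflow's real PokeReturnValue. Treat this as a simplified model, not Airflow's implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple, Union


@dataclass
class FakePokeReturnValue:
    """Hypothetical stand-in for airflow.sensors.base.PokeReturnValue."""
    is_done: bool
    xcom_value: Optional[Any] = None

    def __bool__(self) -> bool:
        # Truthiness follows is_done, so "if result:" means "condition met".
        return self.is_done


def interpret_poke(result: Union[bool, FakePokeReturnValue]) -> Tuple[bool, Optional[Any]]:
    """Return (done, value_to_push) for one poke result."""
    if isinstance(result, FakePokeReturnValue):
        # The value is only pushed once the condition is met.
        return result.is_done, (result.xcom_value if result.is_done else None)
    # A plain bool signals done / not done but carries no XCom value.
    return bool(result), None


print(interpret_poke(FakePokeReturnValue(is_done=True, xcom_value="xcom_value")))  # (True, 'xcom_value')
print(interpret_poke(FakePokeReturnValue(is_done=False, xcom_value="ignored")))    # (False, None)
print(interpret_poke(True))                                                         # (True, None)
```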


3️⃣ What This Task Does

  1. Airflow schedules the wait_for_upstream sensor.

  2. It calls the function right away to check whether the condition is met, and keeps calling it every poke_interval (60 s) until it is.

  3. Since is_done=True is returned immediately:

    • The sensor completes on the first poke.

    • The value 'xcom_value' is stored in XCom.

  4. If is_done=False had been returned:

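    • Airflow would keep poking every 60 seconds for up to timeout (3600) seconds. If the condition were still not met, it would fail the task with a sensor timeout error (AirflowSensorTimeout).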

  5. Uses 'reschedule' mode → worker slot is freed between pokes.
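The steps above can be simulated on a fake clock in plain Python. This is a simplified model, not Airflow's code. SensorTimeout is a hypothetical stand-in for Airflow's timeout error, and Airflow's exact boundary handling near the deadline may differ by a poke. The model pokes at t=0, then every poke_interval seconds, and stops when the check succeeds or the elapsed time reaches timeout.

```python
from typing import Any, Callable, List, Tuple


class SensorTimeout(Exception):
    """Hypothetical stand-in for Airflow's sensor timeout error."""


def simulate_sensor(
    check: Callable[[int], Tuple[bool, Any]],
    poke_interval: int = 60,
    timeout: int = 3600,
) -> Tuple[Any, List[int]]:
    """Simulate the poke schedule on a fake clock (seconds since first poke).

    check(now) returns (is_done, xcom_value). Returns the xcom_value of the
    successful poke together with the times at which pokes happened.
    """
    now = 0
    poke_times: List[int] = []
    while True:
        poke_times.append(now)
        is_done, xcom_value = check(now)
        if is_done:
            return xcom_value, poke_times
        if now >= timeout:
            # Simplification: give up once the elapsed time reaches timeout.
            raise SensorTimeout(f"not done after {now} s ({len(poke_times)} pokes)")
        # Wait for the next poke; in 'reschedule' mode no worker slot is held here.
        now += poke_interval


# Hypothetical condition: the upstream data "arrives" 150 s after the first poke.
value, pokes = simulate_sensor(lambda now: (now >= 150, "xcom_value"))
print(value, pokes)  # xcom_value [0, 60, 120, 180]
```

With the original function, which always returns is_done=True, the same model finishes after a single poke at t=0.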


4️⃣ XCom Usage

You can pull the XCom value in downstream tasks:

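from airflow.decorators import task

@task
def downstream(value):
    print(f"Got XCom value: {value}")

# Inside the DAG definition: the sensor's output becomes downstream's argument
downstream(wait_for_upstream())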

Output:

Got XCom value: xcom_value

5️⃣ Key Points

  • TaskFlow sensors are elegant wrappers around traditional sensor operators (subclasses of BaseSensorOperator).

  • poke_interval and timeout control how long and how often the sensor checks.

  • mode='reschedule' helps with resource efficiency.

  • Returning PokeReturnValue allows custom logic and XCom pushing in one step.


🔹 Quick Analogy

Think of it like a door sensor:

  • poke_interval=60 → check the door every 60 seconds.

  • timeout=3600 → give up after 1 hour if the door is still closed.

  • is_done=True → the door is open, you can go in.

  • xcom_value → leave a note inside the room for the next person.


If you want, I can also explain how to make this sensor actually wait for real conditions (like a file appearing, a database row existing, or an API returning success), instead of always succeeding immediately. This is a common real-world pattern.

Do you want me to do that?

 
You said:
In the XCom usage example code you gave, I didn't see xcom.pull_value(), so how was the value 'xcom_value' obtained?
 
ChatGPT said:

Ah, this is a subtle but important point about the Airflow TaskFlow API! Let’s clarify exactly what’s happening.


1️⃣ TaskFlow XCom automatic handling

In the TaskFlow API (using @task or @task.sensor), you do not need to call xcom_pull() manually. Here’s why:

@task.sensor(...)
def wait_for_upstream() -> PokeReturnValue:
    return PokeReturnValue(is_done=True, xcom_value='xcom_value')

@task
def downstream(value):
    print(f"Got XCom value: {value}")

downstream(wait_for_upstream())

  • At runtime, the sensor's function returns a PokeReturnValue with xcom_value='xcom_value'.

  • When the task finishes, Airflow automatically pushes xcom_value to XCom.

  • In the DAG file, calling wait_for_upstream() does not run the sensor. It returns an XComArg, a reference to the task's future output. Passing it to downstream() makes wait_for_upstream an upstream dependency and tells Airflow to pull that XCom into value at runtime.

✅ The TaskFlow API hides XCom pulling behind Python function calls. This is why you never explicitly see xcom_pull() here.


2️⃣ What would it look like with classic operators

If you weren’t using TaskFlow, you’d have to do something like:

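from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def wait_for_upstream(ti):
    ti.xcom_push(key='value', value='xcom_value')  # explicit push

def downstream(ti):
    value = ti.xcom_pull(task_ids='wait_task', key='value')  # explicit pull
    print(value)

with DAG('classic_xcom_example', start_date=datetime(2025, 1, 1), schedule=None):
    wait_task = PythonOperator(task_id='wait_task', python_callable=wait_for_upstream)
    downstream_task = PythonOperator(task_id='downstream', python_callable=downstream)
    wait_task >> downstream_task  # the dependency must also be declared by hand

  • The TaskFlow API is essentially syntactic sugar over this.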

  • It automatically manages the XCom push/pull for you.
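Both styles end up in the same XCom storage. Each entry is keyed by the task that produced it and by a key. The toy in-memory store below illustrates that model. ToyXComStore is a hypothetical class, not Airflow's API. The one real Airflow detail it borrows is that a returned value (including a sensor's xcom_value) is stored under the default key 'return_value'.

```python
from typing import Any, Dict, Tuple

RETURN_VALUE_KEY = "return_value"  # default key Airflow uses for returned values


class ToyXComStore:
    """Hypothetical in-memory stand-in for the XCom table."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Any] = {}

    def push(self, task_id: str, key: str, value: Any) -> None:
        self._rows[(task_id, key)] = value

    def pull(self, task_id: str, key: str = RETURN_VALUE_KEY) -> Any:
        # Missing entries come back as None, like xcom_pull when nothing was pushed.
        return self._rows.get((task_id, key))


store = ToyXComStore()
# Classic style: the author picks an explicit key.
store.push("wait_task", "value", "xcom_value")
# TaskFlow style: the returned value lands under the default key.
store.push("wait_for_upstream", RETURN_VALUE_KEY, "xcom_value")

print(store.pull("wait_task", key="value"))  # xcom_value
print(store.pull("wait_for_upstream"))       # xcom_value
print(store.pull("wait_task"))               # None (nothing under 'return_value')
```

The default key also lets a classic PythonOperator read a TaskFlow task's output with ti.xcom_pull(task_ids='wait_for_upstream'). Because xcom_pull defaults to 'return_value', no key argument is needed.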


3️⃣ How TaskFlow passes the value internally

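  • When the DAG file calls downstream(wait_for_upstream()), neither function body runs yet. At DAG-parse time each decorated call only creates a task and returns an XComArg, a reference to that task's future output.

  • Airflow adds both tasks to the DAG's dependency graph and records that the output of wait_for_upstream should become the input of downstream.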

  • At runtime:

    1. wait_for_upstream executes and pushes xcom_value to XCom.

    2. downstream retrieves it automatically and passes it as the value argument.
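The deferred-call idea can be modeled in a few lines of plain Python. In this toy, every name is hypothetical. Airflow's real mechanism uses operators and XComArg objects and is far more involved. Calling a decorated function at definition time only records a task and returns a placeholder. A separate run step then executes the tasks in order, stores each result, and replaces placeholders with the stored values.

```python
from typing import Any, Callable, Dict, List, Tuple


class Placeholder:
    """Stands in for a task's future output (compare Airflow's XComArg)."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id


# (task_id, function, args) recorded at definition time
TASKS: List[Tuple[str, Callable, tuple]] = []


def toy_task(fn: Callable) -> Callable:
    def register(*args: Any) -> Placeholder:
        # Definition time: do NOT run fn, just record it and its inputs.
        TASKS.append((fn.__name__, fn, args))
        return Placeholder(fn.__name__)
    return register


def run_all() -> Dict[str, Any]:
    xcoms: Dict[str, Any] = {}
    # Recording order is already a valid dependency order in this toy.
    for task_id, fn, args in TASKS:
        # Runtime: swap each placeholder for the upstream task's stored result.
        resolved = [xcoms[a.task_id] if isinstance(a, Placeholder) else a for a in args]
        xcoms[task_id] = fn(*resolved)  # "push" the return value
    return xcoms


@toy_task
def wait_for_upstream() -> str:
    return "xcom_value"


@toy_task
def downstream(value: str) -> None:
    print(f"Got XCom value: {value}")


downstream(wait_for_upstream())  # builds the graph; prints nothing yet
run_all()                        # prints: Got XCom value: xcom_value
```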


4️⃣ Key Takeaways

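Classic Airflow | TaskFlow API
--------------- | ------------
Push XCom explicitly with ti.xcom_push (a PythonOperator's return value is also pushed, under the key 'return_value') | Return a value (or a sensor's xcom_value) and Airflow pushes it automatically
Pull XCom explicitly with ti.xcom_pull | Pass the output of one task into another as a Python function argument
Callables take context arguments, e.g. def func(ti) or def func(**kwargs) | Functions look like normal Python functions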

So in your example, the xcom_value='xcom_value' is automatically passed as the value argument to downstream without ever explicitly calling xcom_pull().

 

posted on 2025-12-12 11:49  ZhangZhihuiAAA