ZhangZhihui's Blog  

 

"""
Example DAG demonstrating the usage of dynamic task mapping with non-TaskFlow operators.
"""

from __future__ import annotations

from datetime import datetime

from airflow.sdk import DAG, BaseOperator


class AddOneOperator(BaseOperator):
    """A custom operator that adds one to the input."""

    def __init__(self, value, **kwargs):
        super().__init__(**kwargs)
        self.value = value
    
    def execute(self, context):
        return self.value + 1


class SumItOperator(BaseOperator):
    """A custom operator that sums the input."""

    template_fields = ('values',)

    def __init__(self, values, **kwargs):
        super().__init__(**kwargs)
        self.values = values
    
    def execute(self, context):
        total = sum(self.values)
        print(f'Total was {total}')
        return total


with DAG(
    dag_id='example_dynamic_task_mapping_with_no_taskflow_operators',
    start_date=datetime(2022, 3, 4),
    catchup=False,
    schedule=None
):
    add_one_task = AddOneOperator.partial(task_id='add_one').expand(value=[1, 2, 3])

    sum_it_task = SumItOperator(task_id='sum_it', values=add_one_task.output)

In the above code, there is 'add_one_task.output'. I'd like to know whether every task has the 'output' attribute or only dynamically mapped tasks have it.

 

Short answer:
output is not specific to dynamic task mapping.
In modern Airflow, every operator task has an .output attribute, but it becomes especially important and visible when used with dynamic task mapping.

Let’s unpack this carefully.


What task.output actually is

task.output is an XComArg object.

  • It represents the return value of a task

  • It is a reference, not the actual value

  • Airflow resolves it at runtime

This applies to:

  • TaskFlow tasks (@task)

  • Classic operators (BaseOperator)

  • Mapped tasks

  • Non-mapped tasks

So yes: every task has .output, not just dynamically mapped ones.
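A plain-Python sketch can make "a reference, not the actual value" concrete. It does not import Airflow: FakeXComArg and xcom_store are hypothetical stand-ins, and the real XComArg is resolved by Airflow's own machinery rather than by code like this. The one detail borrowed from Airflow is that a task's return value is stored under the XCom key return_value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeXComArg:
    """Hypothetical stand-in for Airflow's XComArg: a pointer, not a value."""

    task_id: str
    key: str = "return_value"  # Airflow stores a task's return value under this XCom key

    def resolve(self, xcom_store: dict) -> object:
        # Only at "runtime" is the pointer looked up in a (hypothetical) XCom store.
        return xcom_store.get((self.task_id, self.key))


ref = FakeXComArg("t1")  # created while the DAG file is parsed: holds no data
print(ref)  # FakeXComArg(task_id='t1', key='return_value')

xcom_store = {("t1", "return_value"): "5"}  # filled in after t1 has run
print(ref.resolve(xcom_store))  # 5
```

The object created at parse time carries only a task_id and a key; the value appears once the upstream task has run and its result is in the store.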


Why it exists for classic operators

Historically, only TaskFlow tasks had a clean return-value interface.
Starting in Airflow 2.x, operators gained .output to unify behavior.

For example:

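# Airflow 3 import paths (standard provider); on Airflow 2 use
# airflow.operators.bash and airflow.operators.python instead.
from airflow.providers.standard.operators.bash import BashOperator
from airflow.providers.standard.operators.python import PythonOperator

# (inside a `with DAG(...):` block)
t1 = BashOperator(
    task_id="t1",
    bash_command="echo 5",
    do_xcom_push=True,  # the default: pushes the last line of stdout ("5", a string)
)

t2 = PythonOperator(
    task_id="t2",
    python_callable=lambda x: print(x),
    op_args=[t1.output],  # op_args is a template field, so this resolves at runtime
)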

Here, t1.output is a valid XCom reference even though no mapping is involved.


What makes mapped tasks special

Dynamic mapping changes what .output represents, not whether it exists.

Non-mapped task

t = AddOneOperator(task_id="add_one", value=1)
  • t.output → a single XCom value

  • Example: 2


Mapped task

t = AddOneOperator.partial(task_id="add_one").expand(value=[1, 2, 3])
  • t.output → a mapped XCom

  • Conceptually:

    [2, 3, 4]
  • When a non-mapped task consumes it downstream, Airflow gathers all mapped outputs, in map-index order, into a lazy list-like sequence

This is why sum_it_task can receive values=add_one_task.output and iterate over it like a list.
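The fan-out/fan-in shape can be modelled without Airflow. In this hypothetical sketch, run_mapped produces one result per map index and collect plays the role of the downstream consumer; in real Airflow the consumer receives a lazy sequence rather than the plain list built here.

```python
def add_one(value: int) -> int:
    return value + 1


def run_mapped(func, expand_values: list) -> dict:
    # One "task instance" per input element, keyed by its map index.
    return {map_index: func(v) for map_index, v in enumerate(expand_values)}


def collect(mapped_results: dict) -> list:
    # What a non-mapped downstream task sees: all outputs, ordered by map index.
    return [mapped_results[i] for i in sorted(mapped_results)]


results = run_mapped(add_one, [1, 2, 3])
print(results)                # {0: 2, 1: 3, 2: 4}
print(collect(results))       # [2, 3, 4]
print(sum(collect(results)))  # 9
```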


Why SumItOperator needs template_fields

template_fields = ('values',)

Because:

  • values receives an XComArg

  • Airflow must render it at runtime

  • Templating tells Airflow:
    “This field may contain runtime-resolved values.”

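This requirement has nothing to do with mapping: SumItOperator would need it even if add_one were an ordinary, non-mapped task. The two simply tend to appear together.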


Summary table

Scenario            | .output exists? | What it represents
TaskFlow task       | Yes             | Return value
Classic operator    | Yes             | Return value
Mapped task         | Yes             | List of mapped outputs
Non-mapped task     | Yes             | Single output
Task with no return | Yes             | None

Key takeaway

    • Every task has .output

    • Dynamic mapping only changes the shape of the output

    • .output is always an XComArg

    • Mapping makes it a collection instead of a single value

 

Please help me understand template_fields = ('values',). What are template fields? Can 'values' be some other string?
 

Great question — this is a core Airflow concept that often feels mysterious at first.


What are template_fields?

In Airflow, template fields are operator attributes whose values are rendered at runtime using Airflow’s templating engine (Jinja).

In simple terms:

A template field tells Airflow:
“This attribute may contain values that must be resolved when the task actually runs.”

This includes:

  • Jinja expressions like {{ ds }}

  • XComArg references (like add_one_task.output)

  • Values that depend on upstream tasks


Why template fields exist

Airflow parses DAGs before tasks run.
At parse time:

  • Task outputs do not exist yet

  • Execution dates are unknown

  • Mapped task results are unknown

So Airflow needs a mechanism to delay evaluation until runtime.

That mechanism is template_fields.
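The "delay until runtime" idea can be shown with the standard library alone. This sketch is an assumption-laden stand-in: Airflow uses Jinja syntax such as {{ ds }}, while here string.Template's $ds placeholder plays that role so no extra packages are needed, and render_at_runtime is a hypothetical helper.

```python
from string import Template

# Parse time: the command is only text with a placeholder in it.
# Airflow would write "echo {{ ds }}" and render it with Jinja; string.Template's
# "$ds" syntax is a stdlib stand-in so the sketch needs no extra packages.
bash_command = "echo $ds"


def render_at_runtime(text: str, context: dict) -> str:
    # Substitution can only happen once the run's context is known.
    return Template(text).substitute(context)


print(render_at_runtime(bash_command, {"ds": "2022-03-04"}))  # echo 2022-03-04
```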


Your specific example

class SumItOperator(BaseOperator):
    template_fields = ('values',)

    def __init__(self, values, **kwargs):
        super().__init__(**kwargs)
        self.values = values

Here’s what happens:

During DAG parsing

values = add_one_task.output
  • values is an XComArg

  • It does NOT contain real data yet

  • Airflow stores it as a placeholder

At task execution time

Airflow:

  1. Looks at template_fields

  2. Finds "values"

  3. Renders it using the execution context

  4. Replaces values with the actual list:

    [2, 3, 4]

Now execute() runs with real data.
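The four steps can be mimicked in a few lines of plain Python. This is a toy model under stated assumptions: FakeXComArg, render and render_fields are hypothetical names, the upstream results are passed in a dict instead of being read from Airflow's XCom backend, and the class does not inherit from BaseOperator.

```python
class FakeXComArg:
    """Hypothetical placeholder for an upstream task's return value."""

    def __init__(self, task_id: str):
        self.task_id = task_id


def render(value, context: dict):
    # Swap a placeholder for the upstream result; leave other values alone.
    if isinstance(value, FakeXComArg):
        return context["xcom"][value.task_id]
    return value


class SumItOperator:
    template_fields = ("values",)

    def __init__(self, values):
        self.values = values

    def render_fields(self, context: dict) -> None:
        # Steps 1-4: read template_fields, find "values", render, replace in place.
        for name in self.template_fields:
            setattr(self, name, render(getattr(self, name), context))

    def execute(self, context: dict) -> int:
        return sum(self.values)


op = SumItOperator(values=FakeXComArg("add_one"))  # parse time: placeholder only
context = {"xcom": {"add_one": [2, 3, 4]}}          # runtime: upstream results exist
op.render_fields(context)
print(op.values)            # [2, 3, 4]
print(op.execute(context))  # 9
```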


What happens if you don’t declare template_fields

If you remove this line:

template_fields = ('values',)

Then inside execute():

sum(self.values)

You will get something like:

TypeError: 'XComArg' object is not iterable

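Because nothing ever resolved the placeholder. Airflow also looks for XComArgs only in template fields when it infers upstream dependencies, so sum_it would no longer automatically wait for add_one.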
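The failure can be reproduced with a toy placeholder. FakeXComArg below is hypothetical; its __iter__ simply raises the same kind of TypeError quoted above, so sum() fails as soon as it tries to iterate the unrendered value.

```python
class FakeXComArg:
    """Hypothetical placeholder that refuses iteration, like the error quoted above."""

    def __init__(self, task_id: str):
        self.task_id = task_id

    def __iter__(self):
        raise TypeError(f"{type(self).__name__!r} object is not iterable")


class SumItWithoutTemplates:
    # No template_fields: nothing ever swaps the placeholder for real data.
    def __init__(self, values):
        self.values = values

    def execute(self, context: dict) -> int:
        return sum(self.values)  # sum() tries to iterate the placeholder


op = SumItWithoutTemplates(values=FakeXComArg("add_one"))
error_message = None
try:
    op.execute({})
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'FakeXComArg' object is not iterable
```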


Can 'values' be any string?

Yes, as long as the string matches the name of an attribute on the operator.

These are all valid examples:

template_fields = ('values',)
template_fields = ('input_data',)
template_fields = ('numbers', 'threshold')

But this must be true:

self.<name> = ...

Example

class MyOperator(BaseOperator):
    template_fields = ('numbers',)

    def __init__(self, numbers, **kwargs):
        super().__init__(**kwargs)
        self.numbers = numbers

✅ Works

template_fields = ('values',)  # but no self.values

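❌ Fails: when Airflow renders the task's template fields, it raises an AttributeError because the operator has no values attribute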
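A toy renderer shows both cases. In this sketch (hypothetical names; str.format stands in for Jinja), any attribute name works, but a name with no matching attribute fails loudly because getattr() raises AttributeError instead of silently skipping it.

```python
def render_fields(op, context: dict) -> None:
    # Hypothetical renderer; str.format stands in for Jinja.
    for name in op.template_fields:
        value = getattr(op, name)  # AttributeError if no such attribute exists
        setattr(op, name, value.format(**context))


class Works:
    template_fields = ("numbers",)  # any attribute name is fine...

    def __init__(self, numbers):
        self.numbers = numbers  # ...as long as the attribute is actually set


class Broken:
    template_fields = ("values",)  # names an attribute that is never set

    def __init__(self, numbers):
        self.numbers = numbers


ok = Works("{ds}")
render_fields(ok, {"ds": "2022-03-04"})
print(ok.numbers)  # 2022-03-04

error_message = None
try:
    render_fields(Broken("{ds}"), {"ds": "2022-03-04"})
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```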


Template fields are not variable names

This is important:

template_fields = ('values',)
  • ❌ Not a Python variable

  • ❌ Not the __init__ parameter name (the two are often identical, but only the attribute name counts)

  • ✅ It is the attribute name on the operator instance
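To see that only the attribute name matters, here is a hypothetical operator whose __init__ parameter (limit) differs from the attribute it stores (threshold). The renderer, again a stand-in using str.format, looks up template_fields entries with getattr(), so it finds threshold and never consults the parameter name.

```python
class ThresholdOperator:
    # Hypothetical operator: the entry names the *attribute*, not the parameter.
    template_fields = ("threshold",)

    def __init__(self, limit):
        self.threshold = limit  # parameter "limit" is stored as self.threshold


def render_fields(op, context: dict) -> None:
    # Hypothetical renderer; str.format stands in for Jinja.
    for name in op.template_fields:
        setattr(op, name, getattr(op, name).format(**context))


op = ThresholdOperator(limit="{max_rows}")
render_fields(op, {"max_rows": "100"})
print(op.threshold)          # 100
print(hasattr(op, "limit"))  # False
```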


What kinds of values can be templated?

Template fields can contain:

1. Jinja templates

bash_command="echo {{ ds }}"

2. XComArgs

values=add_one_task.output

3. Lists and dicts containing templates

values=["{{ ds }}", "{{ dag_run.run_id }}"]

Airflow recursively renders them.
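Recursive rendering can be sketched as a function that walks containers and renders every string it finds. This is an assumption-laden miniature, not Airflow's templater: str.format placeholders like {ds} replace Jinja's {{ ds }}, only list, tuple and dict are handled, and dict keys are left untouched.

```python
def render(value, context: dict):
    """Hypothetical recursive renderer; str.format stands in for Jinja."""
    if isinstance(value, str):
        return value.format(**context)
    if isinstance(value, list):
        return [render(item, context) for item in value]
    if isinstance(value, tuple):
        return tuple(render(item, context) for item in value)
    if isinstance(value, dict):
        # Only values are rendered here; keys are left as they are.
        return {key: render(item, context) for key, item in value.items()}
    return value  # ints, None, etc. pass through unchanged


context = {"ds": "2022-03-04", "run_id": "manual__2022-03-04"}
values = ["{ds}", {"run": "{run_id}", "limit": 3}]
print(render(values, context))
# ['2022-03-04', {'run': 'manual__2022-03-04', 'limit': 3}]
```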


How Airflow uses template fields internally

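At runtime, Airflow effectively does the following (render() here stands in for its internal Jinja and XCom resolution):

for field in operator.template_fields:
    value = getattr(operator, field)
    setattr(operator, field, render(value, context))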

This happens before execute() is called.


Key differences from @task (TaskFlow API)

With @task:

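  • Airflow automatically templates inputs: the function arguments travel through op_args/op_kwargs, which the decorator's operator already declares as template fields

  • You don’t need template_fields

With classic operators:

  • Built-in operators already declare theirs (for example, BashOperator templates bash_command)

  • In your own custom operator, you must declare them explicitly

That’s why your custom operator needs it.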
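A much-simplified model shows why TaskFlow users never write template_fields. Real Airflow's @task builds an operator that already declares op_args and op_kwargs as template fields; the sketch below models only op_args, and fake_task, FakeTaskFlowOperator and FakeXComArg are hypothetical stand-ins, not Airflow classes.

```python
class FakeXComArg:
    """Hypothetical placeholder for an upstream task's return value."""

    def __init__(self, task_id: str):
        self.task_id = task_id


class FakeTaskFlowOperator:
    # The decorator's operator declares the field, so the DAG author never does.
    template_fields = ("op_args",)

    def __init__(self, python_callable, op_args: list):
        self.python_callable = python_callable
        self.op_args = op_args

    def run(self, xcom: dict):
        # Render each declared field (here, a list of call arguments) ...
        for name in self.template_fields:
            rendered = [
                xcom[arg.task_id] if isinstance(arg, FakeXComArg) else arg
                for arg in getattr(self, name)
            ]
            setattr(self, name, rendered)
        # ... then call the user's function with the real values.
        return self.python_callable(*self.op_args)


def fake_task(func):
    # Hypothetical, much-simplified @task: calling the function builds an operator.
    def build(*args):
        return FakeTaskFlowOperator(func, list(args))

    return build


@fake_task
def sum_it(values):
    return sum(values)


op = sum_it(FakeXComArg("add_one"))  # no template_fields written by the user
print(op.run({"add_one": [2, 3, 4]}))  # 9
```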


Summary

  • Template fields are operator attributes rendered at runtime

  • They allow access to:

    • XCom outputs

    • Execution context

    • Jinja macros

  • 'values' can be any attribute name

  • It must match an attribute on the operator

  • Needed in custom classic operators for any attribute that should accept Jinja or XComArgs; with TaskFlow, Airflow handles it for you

 

Is template_fields a fixed name, or can I use another variable name?
 

template_fields is a fixed, special class attribute name in Airflow.

You cannot rename it.


Why template_fields must be exactly this name

Airflow’s operator machinery explicitly looks for an attribute called template_fields on each operator class.

Internally, Airflow does something equivalent to:

fields = operator.template_fields  # looked up by this exact name
for field_name in fields:
    render(getattr(operator, field_name))

If you use any other name, Airflow will ignore it completely.
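The lookup-by-name behaviour is easy to model. In this hypothetical sketch, FakeBaseOperator mirrors one real detail (BaseOperator defines template_fields as an empty tuple), and fields_to_render only ever reads the attribute called template_fields, so a misspelled attribute is never consulted.

```python
class FakeBaseOperator:
    # Like Airflow's BaseOperator, the base class declares an empty default.
    template_fields: tuple = ()


class RightName(FakeBaseOperator):
    template_fields = ("values",)


class WrongName(FakeBaseOperator):
    templated_fields = ("values",)  # typo: never looked at


def fields_to_render(op_class) -> tuple:
    # The renderer only ever asks for this exact attribute name.
    return tuple(op_class.template_fields)


print(fields_to_render(RightName))  # ('values',)
print(fields_to_render(WrongName))  # () : inherited default, nothing rendered
```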


What can change vs what cannot

❌ Cannot change

template_fields   # must be this exact name

These will NOT work:

templated_fields
template_field
templates
render_fields

✅ Can change

The contents of template_fields:

template_fields = ('values',)
template_fields = ('numbers', 'threshold')
template_fields = ('input', 'output_path')

As long as:

  • Each string matches an attribute on the operator

  • Those attributes exist at runtime


Example of correct usage

 
class MyOperator(BaseOperator):
    template_fields = ('input_data',)

    def __init__(self, input_data, **kwargs):
        super().__init__(**kwargs)
        self.input_data = input_data

Airflow will template self.input_data.


Example of incorrect usage

class MyOperator(BaseOperator):
    templated_fields = ('values',)  # ❌ ignored

    def __init__(self, values, **kwargs):
        super().__init__(**kwargs)
        self.values = values

Result:

  • values will NOT be rendered

  • XComArgs will remain unresolved

  • Task will fail at runtime once execute() tries to use the unresolved XComArg


Mental model

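Think of template_fields as a reserved hook. BaseOperator already defines it as an empty tuple, so every operator has one:

“If your operator overrides it, Airflow renders each attribute listed there.
If not, the empty default applies and nothing is rendered.”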


Final takeaway

  • template_fields is a reserved, fixed name

  • You must use it exactly

  • Only the field names inside it are customizable

 

posted on 2025-12-16 10:07  ZhangZhihuiAAA