#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
"""Example DAG demonstrating the usage of DAG params to model a trigger UI with a user form.

This example DAG generates greetings to a list of provided names in selected languages in the logs.
"""

from __future__ import annotations

import datetime
from pathlib import Path

from airflow.sdk import DAG, Param, TriggerRule, task

# [START params_trigger]
with DAG(
    dag_id=Path(__file__).stem,
    dag_display_name="Params Trigger UI",
    description=__doc__.partition(".")[0],
    doc_md=__doc__,
    schedule=None,
    start_date=datetime.datetime(2022, 3, 4),
    catchup=False,
    tags=["example", "params"],
    params={
        "names": Param(
            ["Linda", "Martha", "Thomas"],
            type="array",
            description="Define the list of names for which greetings should be generated in the logs."
            " Please have one name per line.",
            title="Names to greet",
        ),
        "english": Param(True, type="boolean", title="English"),
        "german": Param(True, type="boolean", title="German (Formal)"),
        "french": Param(True, type="boolean", title="French"),
    },
) as dag:

    @task(task_id="get_names", task_display_name="Get names")
    def get_names(**kwargs) -> list[str]:
        params = kwargs["params"]
        if "names" not in params:
            print("Uuups, no names given, was no UI used to trigger?")
            return []
        return params["names"]

    @task.branch(task_id="select_languages", task_display_name="Select languages")
    def select_languages(**kwargs) -> list[str]:
        params = kwargs["params"]
        selected_languages = []
        for lang in ["english", "german", "french"]:
            if params[lang]:
                selected_languages.append(f"generate_{lang}_greeting")
        return selected_languages

    @task(task_id="generate_english_greeting", task_display_name="Generate English greeting")
    def generate_english_greeting(name: str) -> str:
        return f"Hello {name}!"

    @task(task_id="generate_german_greeting", task_display_name="Erzeuge Deutsche Begrüßung")
    def generate_german_greeting(name: str) -> str:
        return f"Sehr geehrter Herr/Frau {name}."

    @task(task_id="generate_french_greeting", task_display_name="Produire un message d'accueil en français")
    def generate_french_greeting(name: str) -> str:
        return f"Bonjour {name}!"

    @task(task_id="print_greetings", task_display_name="Print greetings", trigger_rule=TriggerRule.ALL_DONE)
    def print_greetings(greetings1, greetings2, greetings3) -> None:
        for g in greetings1 or []:
            print(g)
        for g in greetings2 or []:
            print(g)
        for g in greetings3 or []:
            print(g)
        if not (greetings1 or greetings2 or greetings3):
            print("sad, nobody to greet :-(")

    lang_select = select_languages()
    names = get_names()
    english_greetings = generate_english_greeting.expand(name=names)
    german_greetings = generate_german_greeting.expand(name=names)
    french_greetings = generate_french_greeting.expand(name=names)
    lang_select >> [english_greetings, german_greetings, french_greetings]
    results_print = print_greetings(english_greetings, german_greetings, french_greetings)
# [END params_trigger]
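
The branching step only decides which downstream task IDs should run. As a minimal sketch outside Airflow, the same selection logic can be exercised as a plain function. `select_language_task_ids` is a hypothetical stand-in for the body of `select_languages`, and it takes the params as an ordinary dict rather than from the task context.

```python
def select_language_task_ids(params: dict) -> list[str]:
    """Plain-Python copy of the selection logic in select_languages (illustration only)."""
    selected = []
    for lang in ["english", "german", "french"]:
        # Each boolean param switches the matching greeting task on or off.
        if params[lang]:
            selected.append(f"generate_{lang}_greeting")
    return selected


# Example: the user unticked "German (Formal)" in the trigger form.
chosen = select_language_task_ids({"english": True, "german": False, "french": True})
print(chosen)
```

Task IDs that are not returned are the ones the branch task causes to be skipped.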

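When the output of one task is passed as an argument to another task, the first task automatically becomes an upstream dependency of the second.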

In Apache Airflow, `expand()` is part of the dynamic task mapping feature.
It is not a method on a task instance (`TaskInstance`). You call it on a TaskFlow-decorated task function, or on the object returned by an operator class's `partial()`, while the DAG is being defined and before anything is scheduled.

✅ What `expand()` Does

expand() allows you to dynamically create multiple task instances at runtime from a single task definition.

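Instead of writing a separate task call for each input value, you define the task once and let Airflow create one mapped task instance per element of the input.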

Why it exists

Dynamic task mapping lets you create tasks based on data that may not be known until the DAG run starts, such as a list of files, API responses, or partitions.

🔍 How `expand()` Works

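1. You write a task using the TaskFlow API, for example a `@task`-decorated function `process(value)`.

2. You call `expand()` at DAG parse time, for example `process.expand(value=[10, 20, 30])`.

3. At runtime, Airflow generates one task instance per element: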

process[0] → value = 10
process[1] → value = 20
process[2] → value = 30

These appear in the Airflow UI as mapped tasks.
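
The fan-out in the three steps above can be modelled in plain Python. This is a toy model under stated assumptions, not Airflow's implementation: `expand_model` is a hypothetical helper that only mimics how one mapped argument turns into indexed task instances.

```python
def expand_model(task_id: str, **mapped) -> dict[str, dict]:
    """Toy model of expand(): one entry per element of the single mapped argument."""
    if len(mapped) != 1:
        raise ValueError("this sketch supports exactly one mapped argument")
    (arg_name, values), = mapped.items()
    # The map index is simply the element's position in the input list.
    return {f"{task_id}[{i}]": {arg_name: value} for i, value in enumerate(values)}


instances = expand_model("process", value=[10, 20, 30])
for label, kwargs in instances.items():
    print(label, "->", kwargs)
```

In real Airflow the list often comes from an upstream task, so the number of instances is only known once that task has finished.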

🧩 How `expand()` differs from `partial()`

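`partial()`

Sets arguments that stay fixed for every mapped task instance, for example `upload_to_s3.partial(bucket="my-data")`.

`expand()`

Provides the arguments that vary per mapped task instance, for example `.expand(filename=["a.csv", "b.csv", "c.csv"])`.

You can combine them by chaining: `upload_to_s3.partial(bucket="my-data").expand(filename=["a.csv", "b.csv", "c.csv"])`.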

🔧 Example With `partial() + expand()`

You often want some arguments fixed and others dynamic. With `bucket` fixed through `partial()` and `filename` mapped through `expand()`, Airflow creates:

upload_to_s3[0] → bucket="my-data", filename="a.csv"
upload_to_s3[1] → bucket="my-data", filename="b.csv"
upload_to_s3[2] → bucket="my-data", filename="c.csv"
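
How the fixed and the varying arguments merge can be sketched in plain Python. Again this is only a model: `partial_expand_model` is a hypothetical helper, and nothing is uploaded anywhere.

```python
def partial_expand_model(task_id: str, fixed: dict, mapped_arg: str, values: list) -> list[tuple[str, dict]]:
    """Toy model: merge the fixed kwargs with one varying kwarg per map index."""
    return [
        (f"{task_id}[{i}]", {**fixed, mapped_arg: value})
        for i, value in enumerate(values)
    ]


calls = partial_expand_model(
    "upload_to_s3",
    fixed={"bucket": "my-data"},         # plays the role of partial(bucket="my-data")
    mapped_arg="filename",               # plays the role of expand(filename=[...])
    values=["a.csv", "b.csv", "c.csv"],
)
for label, kwargs in calls:
    print(label, kwargs)
```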

partial() and expand() are Airflow-provided methods, not built-in Python functions.

Be careful not to confuse:

Airflow’s partial() (for fixing the arguments shared by all mapped task instances)
Python’s functools.partial() (for partial application of any callable)

Airflow intentionally uses the same name because the idea (pre-filling fixed arguments) is similar, but the implementation is completely different.
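
For contrast, here is what the standard library's `functools.partial` does with the same idea. The `upload` function is a hypothetical helper that only builds a path string, so the example runs without any S3 access.

```python
from functools import partial


def upload(bucket: str, filename: str) -> str:
    """Hypothetical helper: return the object path instead of uploading anything."""
    return f"s3://{bucket}/{filename}"


# functools.partial returns a new callable with `bucket` already bound.
upload_to_my_data = partial(upload, bucket="my-data")

# The varying argument is supplied by an ordinary loop that runs immediately.
paths = [upload_to_my_data(filename=name) for name in ["a.csv", "b.csv", "c.csv"]]
print(paths)
```

Here everything executes at once in a single Python process. Airflow's `partial()` and `expand()` instead describe task instances that are created and run later.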

In the above code snippet, why not use generate_english_greeting(name=names) directly instead of generate_english_greeting.expand(name=names) ?

Great question — and it's a core concept of Airflow’s TaskFlow dynamic task mapping.

✅ Short Answer

Because:

➤ `generate_english_greeting(name=names)`

would pass the entire list of names as one argument, producing one task instance.

➤ `generate_english_greeting.expand(name=names)`

creates one mapped task instance per name, producing multiple parallel tasks, one per item in the list.

✅ Detailed Explanation

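Suppose `names` resolves at runtime to `["Linda", "Martha", "Thomas"]`.

❌ If you write `generate_english_greeting(name=names)`:

Airflow treats names as a single list argument, and the function receives `name=["Linda", "Martha", "Thomas"]`.

So inside the task, name would be that entire list.

You would get 1 task instance, `generate_english_greeting`.

And it returns something like `Hello ['Linda', 'Martha', 'Thomas']!`.

Not what you want.

✅ When you write `generate_english_greeting.expand(name=names)`:

Airflow dynamically maps the task and produces 3 separate task instances: map index 0 with `name="Linda"`, map index 1 with `name="Martha"`, and map index 2 with `name="Thomas"`.

Each task returns a greeting for a single name.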
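
The difference between the two calls can be reproduced without Airflow. The sketch below reuses the body of `generate_english_greeting` as an undecorated function and uses a list comprehension as a stand-in for what mapping achieves conceptually.

```python
def generate_english_greeting(name: str) -> str:
    # Same body as the task in the DAG, without the @task decorator.
    return f"Hello {name}!"


names = ["Linda", "Martha", "Thomas"]

# Normal call: the whole list lands in `name`, so there is a single result.
single_result = generate_english_greeting(name=names)

# What expand() achieves conceptually: one call per element.
mapped_results = [generate_english_greeting(name=n) for n in names]

print(single_result)
print(mapped_results)
```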

🧠 Why dynamic task mapping matters here

Dynamic task mapping is used because:

The user can provide any number of names from the UI form
You want one greeting per name
You want them to run independently, so each greeting produces its own XCom result
The downstream print_greetings() expects lists of greetings (one list per language)

If you don’t use expand(), you lose all of that.

🧩 What the DAG expects downstream

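Later, Airflow automatically gathers the mapped results, so `print_greetings(english_greetings, german_greetings, french_greetings)` receives one list-like sequence of greeting strings per language.

If you used generate_english_greeting(name=names) without .expand, then:

The function would output a single greeting string for the whole list
print_greetings() would receive a string instead of a list, so its `for g in greetings1` loop would print one character per line
The task UI would show only one task instance, not one per name
Branching (lang_select >> ...) would still work, because `select_languages` returns task IDs and those do not change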
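
The downstream behaviour can be checked with a plain-Python version of the aggregation. `collect_greetings` is a hypothetical variant of `print_greetings` that returns the lines instead of printing them. The example assumes that a branch which did not run contributes `None`, which is what the `or []` guards in the DAG suggest.

```python
def collect_greetings(greetings1, greetings2, greetings3) -> list[str]:
    """Same logic as print_greetings, but returning the lines instead of printing."""
    lines = []
    for greetings in (greetings1, greetings2, greetings3):
        # `or []` tolerates a missing value from a branch that did not run.
        lines.extend(greetings or [])
    if not lines:
        lines.append("sad, nobody to greet :-(")
    return lines


# Mapped results arrive as one sequence per language; German is assumed skipped.
ok = collect_greetings(["Hello Linda!", "Hello Martha!"], None, ["Bonjour Linda!", "Bonjour Martha!"])

# Without expand(), a single string would arrive and be iterated character by character.
wrong = collect_greetings("Hi!", None, None)

print(ok)
print(wrong)
```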

⭐ Summary

| Usage | Meaning | Result |
| --- | --- | --- |
| `generate_english_greeting(name=names)` | treat list as single argument | 1 task instance |
| `generate_english_greeting.expand(name=names)` | map task over list | multiple task instances, one per name |

You need .expand() here to produce dynamic fan-out and get one greeting per name.

But according to the definition of function generate_english_greeting(name: str), name is a string, not a list.

You're absolutely right — generate_english_greeting(name: str) declares name as a string, so it is supposed to receive one name at a time, not a list.

And that is exactly why you must use .expand() instead of calling the function normally.

Let’s break it down very cleanly.

✅ Key Concept

**If you call a TaskFlow task normally, you are not calling the Python function. You are calling its Airflow wrapper**, which accepts anything as arguments (string, list, dict, XCom, etc.) because Airflow doesn't enforce type hints.

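So `generate_english_greeting(name=names)` would not cause a Python type error.

Airflow would simply store `name=names` as the task's argument and pass the resolved list to the function at runtime.
Then Python would evaluate `f"Hello {name}!"` with `name` bound to the whole list, which becomes `Hello ['Linda', 'Martha', 'Thomas']!`.

Not what we want.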

❌ Why the type hint does NOT help you here

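The `name: str` annotation in `def generate_english_greeting(name: str) -> str` is a Python type hint.
Airflow does not enforce type hints for tasks.

At runtime, Airflow effectively calls the function with whatever values the arguments resolve to, along the lines of `python_callable(*op_args, **op_kwargs)`, without checking them against the annotations.

So the type hint is for human readers, IDEs, and static type checkers, not Airflow.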
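
That annotations are only metadata can be verified in plain Python. In the sketch below, `strict_greeting` is a hypothetical variant added purely for contrast. It shows that an explicit check is needed if a wrong type should fail loudly.

```python
import typing


def generate_english_greeting(name: str) -> str:
    return f"Hello {name}!"


def strict_greeting(name: str) -> str:
    # Hypothetical variant: only an explicit check rejects a list.
    if not isinstance(name, str):
        raise TypeError(f"name must be str, got {type(name).__name__}")
    return f"Hello {name}!"


# The annotations are stored as metadata on the function object.
hints = typing.get_type_hints(generate_english_greeting)

# Calling with a list does not raise, because nothing consults the annotations.
unchecked = generate_english_greeting(name=["Linda", "Martha", "Thomas"])

try:
    strict_greeting(name=["Linda", "Martha", "Thomas"])
    rejected = False
except TypeError:
    rejected = True

print(hints, unchecked, rejected)
```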

⭐ What `.expand()` does differently

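When you use `generate_english_greeting.expand(name=names)`, Airflow records a single mapped task at DAG parse time.

At runtime, once `get_names` has returned its list, Airflow expands that mapped task into one task instance per element: `name="Linda"`, `name="Martha"`, and `name="Thomas"`.

So expand produces values that match the function's declared type,
while a normal call would not.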

🧩 Put simply

If you don’t use expand:

The task receives name=['Linda', 'Martha', 'Thomas']
Your type hints don't protect you
Your function runs once
Greetings are wrong
Mapping doesn't happen

If you use expand:

Each task receives a proper string name
You get per-name greetings
You get parallelism
You get correct XCom output lists

🔥 Final Summary

| Call | What it does | What `name` becomes |
| --- | --- | --- |
| `generate_english_greeting(name=names)` | 1 task instance, wrong behavior | a list |
| `generate_english_greeting.expand(name=names)` | multiple mapped tasks, correct behavior | a single string per task instance |

Posted on 2025-12-10 16:54 by ZhangZhihuiAAA
