ZhangZhihui's Blog  

 

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
"""Example DAG demonstrating the usage DAG params to model a trigger UI with a user form.

This example DAG generates greetings to a list of provided names in selected languages in the logs.
"""

from __future__ import annotations

import datetime
from pathlib import Path

from airflow.sdk import DAG, Param, TriggerRule, task

# [START params_trigger]
with DAG(
    dag_id=Path(__file__).stem,
    dag_display_name="Params Trigger UI",
    description=__doc__.partition(".")[0],
    doc_md=__doc__,
    schedule=None,
    start_date=datetime.datetime(2022, 3, 4),
    catchup=False,
    tags=["example", "params"],
    params={
        "names": Param(
            ["Linda", "Martha", "Thomas"],
            type="array",
            description="Define the list of names for which greetings should be generated in the logs."
            " Please have one name per line.",
            title="Names to greet",
        ),
        "english": Param(True, type="boolean", title="English"),
        "german": Param(True, type="boolean", title="German (Formal)"),
        "french": Param(True, type="boolean", title="French"),
    },
) as dag:

    @task(task_id="get_names", task_display_name="Get names")
    def get_names(**kwargs) -> list[str]:
        params = kwargs["params"]
        if "names" not in params:
            print("Uuups, no names given, was no UI used to trigger?")
            return []
        return params["names"]

    @task.branch(task_id="select_languages", task_display_name="Select languages")
    def select_languages(**kwargs) -> list[str]:
        params = kwargs["params"]
        selected_languages = []
        for lang in ["english", "german", "french"]:
            if params[lang]:
                selected_languages.append(f"generate_{lang}_greeting")
        return selected_languages

    @task(task_id="generate_english_greeting", task_display_name="Generate English greeting")
    def generate_english_greeting(name: str) -> str:
        return f"Hello {name}!"

    @task(task_id="generate_german_greeting", task_display_name="Erzeuge Deutsche Begrüßung")
    def generate_german_greeting(name: str) -> str:
        return f"Sehr geehrter Herr/Frau {name}."

    @task(task_id="generate_french_greeting", task_display_name="Produire un message d'accueil en français")
    def generate_french_greeting(name: str) -> str:
        return f"Bonjour {name}!"

    @task(task_id="print_greetings", task_display_name="Print greetings", trigger_rule=TriggerRule.ALL_DONE)
    def print_greetings(greetings1, greetings2, greetings3) -> None:
        for g in greetings1 or []:
            print(g)
        for g in greetings2 or []:
            print(g)
        for g in greetings3 or []:
            print(g)
        if not (greetings1 or greetings2 or greetings3):
            print("sad, nobody to greet :-(")

    lang_select = select_languages()
    names = get_names()
    english_greetings = generate_english_greeting.expand(name=names)
    german_greetings = generate_german_greeting.expand(name=names)
    french_greetings = generate_french_greeting.expand(name=names)
    lang_select >> [english_greetings, german_greetings, french_greetings]
    results_print = print_greetings(english_greetings, german_greetings, french_greetings)
# [END params_trigger]
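The branch decision in select_languages is ordinary Python, so it can be tried outside Airflow. The sketch below is an adaptation, not the DAG's code: it takes the params dictionary directly instead of reading it from **kwargs, and the sample flag values are made up for illustration.

```python
def select_languages(params: dict) -> list[str]:
    """Return the task IDs to follow, one per enabled language flag."""
    selected = []
    for lang in ["english", "german", "french"]:
        if params[lang]:
            selected.append(f"generate_{lang}_greeting")
    return selected


# Hypothetical form input: German was unticked in the trigger UI.
chosen = select_languages({"english": True, "german": False, "french": True})
print(chosen)
```

The returned strings match the task_id values of the greeting tasks, which is how a branch task tells Airflow which downstream tasks to run.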

 


When one task is passed as an argument to another task, it automatically becomes an upstream dependency of that task.

 

In Apache Airflow, expand() is the entry point to dynamic task mapping.
It is not a method on a task instance (TaskInstance). You call it on a @task-decorated function (or, for classic operators, on the result of Operator.partial()) while the DAG is being defined, before any task has run.

✅ What expand() Does

expand() allows you to dynamically create multiple task instances at runtime from a single task definition.

Instead of writing:

task_a = process(value=1)
task_b = process(value=2)
task_c = process(value=3)

You can write:

process.expand(value=[1, 2, 3])

Airflow will create three mapped task instances, one per item.

Why it exists

Dynamic task mapping lets you create tasks based on data that may not be known until the DAG run starts—e.g., list of files, API responses, partitions, etc.
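To make the fan-out idea concrete, here is a toy sketch in plain Python. It does not use Airflow, and expand_sketch is a hypothetical helper invented for this illustration; it only mimics the "one call per list item" behavior described above.

```python
def process(value):
    return value * 2


def expand_sketch(func, **mapped):
    """Toy stand-in for dynamic task mapping: call func once per list item."""
    # This sketch supports exactly one mapped keyword argument.
    arg_name, values = next(iter(mapped.items()))
    return {
        f"{func.__name__}[{index}]": func(**{arg_name: item})
        for index, item in enumerate(values)
    }


results = expand_sketch(process, value=[1, 2, 3])
print(results)
```

The keys imitate the map-index labels that mapped task instances get; the real scheduling, retries, and XCom handling are not modeled.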


🔍 How expand() Works

1. You write a task using the TaskFlow API

@task
def process(value):
    print(value)

2. You call expand() at DAG parse time

process.expand(value=[10, 20, 30])

3. At runtime, Airflow generates task instances:

  • process[0] → value = 10

  • process[1] → value = 20

  • process[2] → value = 30

These appear in the Airflow UI as mapped tasks.
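The split between "declare the mapping while the DAG is parsed" and "create the instances at runtime" can be sketched without Airflow. MappedSketch and load_values below are hypothetical names for this illustration; the point is only that the list is read when run() is called, not when the mapping is declared.

```python
class MappedSketch:
    """Toy model: remember the mapping now, expand it later."""

    def __init__(self, func, arg_name, source):
        self.func = func
        self.arg_name = arg_name
        self.source = source  # callable that supplies the values at run time

    def run(self):
        values = self.source()  # the list only becomes known here
        return {
            f"{self.func.__name__}[{index}]": self.func(**{self.arg_name: item})
            for index, item in enumerate(values)
        }


def process(value):
    return value


def load_values():
    # Hypothetical upstream step whose result is unknown at declaration time.
    return [10, 20, 30]


mapped = MappedSketch(process, "value", load_values)  # "parse time": nothing runs yet
instances = mapped.run()  # "run time": one call per value
print(instances)
```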


🧩 How expand() differs from partial()

partial()

Sets fixed arguments:

download = fetch.partial(path="/data")

expand()

Provides arguments that vary per mapped task:

download.expand(filename=["a.csv", "b.csv", "c.csv"])

You can combine them:

fetch.partial(path="/data").expand(filename=["a.csv", "b.csv"])

🔧 Example With partial() + expand()

You often want some arguments fixed and others dynamic:

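@task
def upload_to_s3(bucket, filename):
    print(f"Uploading {filename} → {bucket}")

# Example input; in practice this could come from an upstream task.
file_list = ["a.csv", "b.csv", "c.csv"]
upload_to_s3.partial(bucket="my-data").expand(filename=file_list)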

This produces:

  • upload_to_s3[0] → bucket="my-data", filename="a.csv"

  • upload_to_s3[1] → bucket="my-data", filename="b.csv"

  • upload_to_s3[2] → bucket="my-data", filename="c.csv"
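The way fixed and mapped arguments combine can be imitated in plain Python. This is a sketch only: partial_expand_sketch is a hypothetical helper, not an Airflow function, and upload_to_s3 returns its message here instead of printing it so the result can be inspected.

```python
def upload_to_s3(bucket, filename):
    return f"Uploading {filename} -> {bucket}"


def partial_expand_sketch(func, fixed, mapped_name, values):
    """Call func once per value, passing the same fixed kwargs every time."""
    return [func(**fixed, **{mapped_name: value}) for value in values]


messages = partial_expand_sketch(
    upload_to_s3,
    fixed={"bucket": "my-data"},
    mapped_name="filename",
    values=["a.csv", "b.csv", "c.csv"],
)
for message in messages:
    print(message)
```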

 

partial() and expand() are Airflow-provided methods, not built-in Python functions.

Be careful not to confuse:

  • Airflow’s partial() (for fixing the task or operator arguments shared by every mapped instance)

  • Python’s functools.partial() (for pre-filling arguments of any callable, i.e. partial application)

Airflow intentionally uses the same name because the idea (pre-filling fixed arguments) is similar, but the implementation is completely different.
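For comparison, this is what Python's own functools.partial does. The fetch function is a made-up example; the point is that the object returned by functools.partial is an ordinary callable that runs immediately when called, whereas Airflow's partial() returns something that still needs .expand() before it becomes a task.

```python
from functools import partial


def fetch(path, filename):
    return f"{path}/{filename}"


# Pre-fill the path argument; the result is a new callable.
fetch_from_data = partial(fetch, path="/data")

# Calling it supplies the remaining argument and runs fetch right away.
result = fetch_from_data(filename="a.csv")
print(result)
```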

 

In the above code snippet, why not use generate_english_greeting(name=names) directly instead of generate_english_greeting.expand(name=names)?
 
ChatGPT said:

Great question — and it's a core concept of Airflow’s TaskFlow dynamic task mapping.


Short Answer

Because:

generate_english_greeting(name=names)

would pass the entire list of names as one argument, producing one task instance.

generate_english_greeting.expand(name=names)

creates one mapped task instance per name, producing multiple parallel tasks, one per item in the list.


Detailed Explanation

Suppose names is:

["Linda", "Martha", "Thomas"]

❌ If you write:

generate_english_greeting(name=names)

Then Airflow treats names as a single list argument, and the function receives:

["Linda", "Martha", "Thomas"]

So inside the task, name would be that entire list.

You would get 1 task instance:

generate_english_greeting

And it returns something like:

Hello ['Linda', 'Martha', 'Thomas']!

Not what you want.


✅ When you write:

generate_english_greeting.expand(name=names)

Airflow dynamically maps the task and produces 3 separate task instances:

generate_english_greeting[0]  → name="Linda"
generate_english_greeting[1]  → name="Martha"
generate_english_greeting[2]  → name="Thomas"

Each one runs separately, often in parallel.

Each task returns a greeting for a single name.
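The difference between the two calls can be reproduced with the undecorated function in plain Python. This sketch leaves Airflow out entirely: the list comprehension stands in for what mapping achieves, namely one call per name.

```python
def generate_english_greeting(name: str) -> str:
    return f"Hello {name}!"


names = ["Linda", "Martha", "Thomas"]

# One call with the whole list: the list is formatted into the string as-is.
single = generate_english_greeting(names)

# One call per name: the plain-Python analogue of mapping over the list.
per_name = [generate_english_greeting(name) for name in names]

print(single)
print(per_name)
```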


🧠 Why dynamic task mapping matters here

Dynamic task mapping is used because:

  • The user can provide any number of names from the UI form

  • You want one greeting per name

  • You want them to run independently, so each greeting produces its own XCom result

  • The downstream print_greetings() expects lists of greetings (one list per language)

If you don’t use expand(), you lose all of that.


🧩 What the DAG expects downstream

Later, Airflow will automatically gather mapped results into lists:

english_greetings → ["Hello Linda!", "Hello Martha!", "Hello Thomas!"]

If you used generate_english_greeting(name=names) without .expand(), then:

  • The function would return a single greeting string for the whole list

  • print_greetings() would receive that one string instead of a list, so its for loop would iterate over the string's characters

  • The task UI would show one task instance, not one per name

  • Branching (lang_select >> ...) would still run or skip the task, but there would be no per-name fan-out behind it
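The effect on the downstream task can be checked with a plain-Python variant of print_greetings. collect_lines is a hypothetical rewrite that returns the lines instead of printing them, and it assumes a skipped language arrives as None, which is what the `or []` guards in the DAG suggest.

```python
def collect_lines(greetings1, greetings2, greetings3):
    """Gather greetings the way print_greetings iterates over its inputs."""
    lines = []
    for group in (greetings1, greetings2, greetings3):
        lines.extend(group or [])  # None (assumed: skipped branch) becomes []
    return lines or ["sad, nobody to greet :-("]


# Mapped upstream tasks: each argument is a list of greetings.
from_lists = collect_lines(["Hello Linda!", "Hello Martha!"], None, ["Bonjour Linda!"])

# Unmapped upstream task: a single string is iterated character by character.
from_string = collect_lines("Hi!", None, None)

print(from_lists)
print(from_string)
```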


⭐ Summary

| Usage | Meaning | Result |
| --- | --- | --- |
| generate_english_greeting(name=names) | Treat the list as a single argument | 1 task instance |
| generate_english_greeting.expand(name=names) | Map the task over the list | Multiple task instances, one per name |

You need .expand() here to produce dynamic fan-out and get one greeting per name.

 

But according to the definition of function generate_english_greeting(name: str), name is a string, not a list.
 
ChatGPT said:

You're absolutely right — generate_english_greeting(name: str) declares name as a string, so it is supposed to receive one name at a time, not a list.

And that is exactly why you must use .expand() instead of calling the function normally.

Let’s break it down very cleanly.


Key Concept

If you call a TaskFlow task normally, you are not calling the Python function. You are calling its Airflow wrapper, which accepts anything as arguments (string, list, dict, XCom, etc.) because Airflow doesn't enforce type hints.

So:

generate_english_greeting(name=names)

would not cause a Python type error.

Airflow would simply store:

name = ["Linda", "Martha", "Thomas"]

and pass this list to the function at runtime.
Then Python would run:

return f"Hello {name}!"

which becomes:

"Hello ['Linda', 'Martha', 'Thomas']!"

Not what we want.


❌ Why the type hint does NOT help you here

The definition:

def generate_english_greeting(name: str)

is a Python type hint.
Airflow does not enforce type hints for tasks.

Conceptually, a task is executed roughly like this (simplified pseudocode, not Airflow's actual source):

def execute(self, context):
    result = python_callable(**mapped_keyword_arguments)

It does not check whether the argument types match the hint.

So the type hint is for human readers and IDEs, not Airflow.
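That annotations are not checked at call time is a property of Python itself and can be seen without Airflow. In the sketch below, checked_call is a hypothetical helper showing what an explicit opt-in check could look like; nothing in the source suggests Airflow does this.

```python
import typing


def generate_english_greeting(name: str) -> str:
    return f"Hello {name}!"


# The hints are stored as metadata on the function...
hints = typing.get_type_hints(generate_english_greeting)

# ...but passing a list where str is annotated raises no error.
result = generate_english_greeting(["Linda", "Martha"])


def checked_call(func, **kwargs):
    """Hypothetical opt-in check: compare simple class hints before calling."""
    func_hints = typing.get_type_hints(func)
    for key, value in kwargs.items():
        expected = func_hints.get(key)
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(f"{key} should be {expected.__name__}")
    return func(**kwargs)


try:
    checked_call(generate_english_greeting, name=["Linda", "Martha"])
    rejected = False
except TypeError:
    rejected = True

print(hints, result, rejected)
```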


⭐ What .expand() does differently

When you use:

generate_english_greeting.expand(name=names)

Airflow records the mapping at DAG parse time and expands it at runtime, once get_names has returned its list:

names = ["Linda", "Martha", "Thomas"]

Task becomes mapped → produces 3 task instances:

generate_english_greeting[0] → name="Linda"
generate_english_greeting[1] → name="Martha"
generate_english_greeting[2] → name="Thomas"

Each matches the actual expected function signature:

name: str

So expand produces values that match the function's declared type
while a normal call would not.


🧩 Put simply

If you don’t use expand:

  • The task receives name=['Linda', 'Martha', 'Thomas']

  • Your type hints don't protect you

  • Your function runs once

  • Greetings are wrong

  • Mapping doesn't happen

If you use expand:

  • Each task receives a proper string name

  • You get per-name greetings

  • You get parallelism

  • You get correct XCom output lists


🔥 Final Summary

| Call | What it does | What name becomes |
| --- | --- | --- |
| generate_english_greeting(name=names) | 1 task instance, wrong behavior | a list |
| generate_english_greeting.expand(name=names) | multiple mapped tasks, correct behavior | a single string per task instance |

 

Posted on 2025-12-10 16:54 by ZhangZhihuiAAA