ZhangZhihui's Blog  

In Airflow, AssetAlias is a way to define a logical alias (group name) for one or more assets, so that producers and consumers can be loosely coupled without hard-coding exact asset URIs.

Think of it as a label or tag for assets that can be resolved dynamically at runtime.


Why AssetAlias exists

Assets normally require an exact identity match:

Asset("s3://bucket/orders")

That’s very strict. AssetAlias lets you say:

“I don’t care which exact asset(s) — just give me whatever assets are known as orders.”

This is especially useful when:

  • Asset URIs change

  • Multiple producers exist

  • You want reusable / generic consumer DAGs

  • You want late binding of assets


What AssetAlias does (conceptually)

AssetAlias resolves to the set of assets that producers have attached to that alias so far.

It is not a new asset itself.
It does not store data.
It does not create copies.

It’s a dynamic reference.
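
The resolution step can be pictured with a toy model in plain Python. This is only a sketch of the idea, not Airflow's implementation: the AliasRegistry class and its attach and resolve methods are hypothetical names invented for illustration.

```python
from collections import defaultdict


class AliasRegistry:
    """Toy model: an alias is only a name that maps to a set of asset URIs."""

    def __init__(self) -> None:
        self._members: dict[str, set[str]] = defaultdict(set)

    def attach(self, alias: str, asset_uri: str) -> None:
        # Record that asset_uri is known under alias; no data is stored or copied.
        self._members[alias].add(asset_uri)

    def resolve(self, alias: str) -> frozenset[str]:
        # Looked up on every call, so the answer reflects the current members.
        return frozenset(self._members.get(alias, ()))


registry = AliasRegistry()
registry.attach("orders", "s3://bucket/orders/us/")
before = registry.resolve("orders")

# A second producer attaches another asset; consumers of "orders" do not change.
registry.attach("orders", "s3://bucket/orders/eu/")
after = registry.resolve("orders")

print(sorted(before))
print(sorted(after))
print(sorted(registry.resolve("customers")))  # an alias nothing is attached to
```

The same alias name gives a different answer before and after the second attach call, which is what "dynamic reference" means here. An alias with nothing attached resolves to an empty set.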


Basic example

Producer side

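from airflow.sdk import Asset, AssetAlias, task

orders_alias = AssetAlias("orders")

# The task declares the alias as its outlet. The concrete asset is attached
# to the alias at runtime, through the outlet_events entry that Airflow
# passes in from the task context.
@task(outlets=[orders_alias])
def produce_orders(*, outlet_events):
    outlet_events[orders_alias].add(Asset("s3://bucket/orders/2025/"))

This means:

  • When produce_orders runs, it emits an update for the asset s3://bucket/orders/2025/

  • That asset becomes known by alias "orders" at that point, not when the DAG file is written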


Consumer side

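from airflow.sdk import AssetAlias, dag, task

@dag(schedule=AssetAlias("orders"))
def consume_orders():
    @task
    def process():
        print("Orders updated!")

    process()  # add the task to the DAG

consume_orders()  # instantiate the DAG so Airflow can discover it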

This DAG runs:

  • When any asset with alias "orders" updates

No URI coupling required.


Multiple assets → one alias

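from airflow.sdk import Asset, AssetAlias, task

# Two producer tasks attach different assets to the same alias at runtime.
@task(outlets=[AssetAlias("orders")])
def orders_us(*, outlet_events):
    outlet_events[AssetAlias("orders")].add(Asset("s3://bucket/orders/us/"))

@task(outlets=[AssetAlias("orders")])
def orders_eu(*, outlet_events):
    outlet_events[AssetAlias("orders")].add(Asset("s3://bucket/orders/eu/"))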

Consumer DAG:

@dag(schedule=AssetAlias("orders"))
def reporting():
    ...

Result:

  • Reporting DAG runs when either US or EU orders update


One asset → multiple aliases

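from airflow.sdk import Asset, AssetAlias, task

# One producer task attaches the same asset to two aliases.
@task(outlets=[AssetAlias("orders"), AssetAlias("finance_inputs")])
def orders(*, outlet_events):
    warehouse_orders = Asset("warehouse.orders")
    outlet_events[AssetAlias("orders")].add(warehouse_orders)
    outlet_events[AssetAlias("finance_inputs")].add(warehouse_orders)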

AssetAlias with AND / OR

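schedule = AssetAlias("orders") & AssetAlias("customers")

Meaning:

Run once at least one asset under the orders alias AND at least one asset under the customers alias have been updated.

Using | instead of & expresses OR: run when an asset under either alias is updated.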
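
To make the AND / OR semantics concrete, here is a small plain-Python model of how such a condition could be evaluated. It is a sketch under assumptions rather than Airflow's own evaluator: the classes Condition, Alias, AllOf and AnyOf, the ALIAS_MEMBERS mapping and the customers URI are all hypothetical, and an alias is assumed to count as satisfied once any one of its assets has been updated.

```python
from dataclasses import dataclass

# Hypothetical alias -> asset mapping, used only for this illustration.
ALIAS_MEMBERS = {
    "orders": {"s3://bucket/orders/us/", "s3://bucket/orders/eu/"},
    "customers": {"s3://bucket/customers/"},
}


class Condition:
    """Base class that lets conditions be combined with & and |."""

    def __and__(self, other: "Condition") -> "Condition":
        return AllOf(self, other)

    def __or__(self, other: "Condition") -> "Condition":
        return AnyOf(self, other)

    def satisfied(self, updated: set[str]) -> bool:
        raise NotImplementedError


@dataclass(frozen=True)
class Alias(Condition):
    name: str

    def satisfied(self, updated: set[str]) -> bool:
        # An alias is satisfied when any asset behind it was updated.
        return bool(ALIAS_MEMBERS.get(self.name, set()) & updated)


@dataclass(frozen=True)
class AllOf(Condition):
    left: Condition
    right: Condition

    def satisfied(self, updated: set[str]) -> bool:
        return self.left.satisfied(updated) and self.right.satisfied(updated)


@dataclass(frozen=True)
class AnyOf(Condition):
    left: Condition
    right: Condition

    def satisfied(self, updated: set[str]) -> bool:
        return self.left.satisfied(updated) or self.right.satisfied(updated)


both = Alias("orders") & Alias("customers")
either = Alias("orders") | Alias("customers")

only_orders = {"s3://bucket/orders/eu/"}
orders_and_customers = {"s3://bucket/orders/eu/", "s3://bucket/customers/"}

print(both.satisfied(only_orders))           # False: no customers asset updated yet
print(both.satisfied(orders_and_customers))  # True: both aliases have an update
print(either.satisfied(only_orders))         # True: one alias is enough for OR
```

In this model a single EU orders update is enough to satisfy the orders side of the condition. The US asset does not need to update as well.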


How it appears in the UI

In the Assets view:

  • Assets show their aliases

  • Aliases show which assets belong to them

  • Lineage graph groups assets via aliases

This gives a semantic layer over raw URIs.


Important constraints & behavior

1️⃣ Aliases are global

  • Same alias name = same logical group

  • No namespace by DAG

2️⃣ Aliases resolve at runtime

  • Adding a new asset with alias "orders" automatically affects consumers

  • No DAG code changes required

3️⃣ Aliases must resolve to at least one asset

  • Until some asset has been attached to the alias, the consumer DAG will not run


Common use cases

✅ Stable interfaces between teams
✅ Schema / domain-based grouping
✅ Environment abstraction (orders_dev, orders_prod)
✅ Multi-region data pipelines
✅ Gradual migrations


When NOT to use AssetAlias

❌ When you need a specific dataset
❌ When asset identity must be precise
❌ For passing data between tasks


Mental model

AssetAlias("orders")
       │
       ├── s3://bucket/orders/us/
       ├── s3://bucket/orders/eu/
       └── warehouse.orders

Consumers depend on the concept, not the location.


One-sentence summary

AssetAlias lets you depend on a semantic group of assets instead of hard-coded asset URIs, enabling loose coupling and flexible, data-driven DAGs.

 

posted on 2025-12-15 17:48  ZhangZhihuiAAA