
In Apache Airflow, an asset is a logical representation of a piece of data that your workflows produce or consume, independent of where that data lives or how it’s computed.

Assets are part of Airflow’s data-aware scheduling model, introduced as Datasets in Airflow 2.4 and renamed to Assets (with expanded capabilities) in Airflow 3.


High-level definition

An Airflow asset represents the existence or freshness of data, not the task that produced it.

An asset answers:

  • “This dataset now exists or has been updated”

  • not “This task finished”


Why assets exist (problem they solve)

Traditional Airflow scheduling is time-based and task-centric:

Task A → Task B → Task C

But real data pipelines care about data availability, not just task order:

  • “Run this DAG when the table is updated”

  • “Trigger downstream jobs when a file arrives”

  • “React to data, not clocks”

Assets enable this.


What an asset looks like

Defining an asset

from airflow import Dataset  # renamed to Asset in newer versions

orders_asset = Dataset("s3://data/orders/")

(or, in newer Airflow versions:)

from airflow.sdk import Asset

orders_asset = Asset("s3://data/orders/")

The URI is just an identifier, not a connection.
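For instance, a quick sketch (the identifiers below are invented): very different kinds of data can all be declared the same way, because the string is never dereferenced.

from airflow.sdk import Asset  # Airflow 3 import; Airflow 2.x uses `from airflow import Dataset`

# None of these are opened, validated, or connected to by Airflow.
# They are only identifiers that producer and consumer code agree on.
raw_orders = Asset("s3://data/orders/")                      # object-store prefix
orders_table = Asset("postgres://warehouse/public/orders")   # warehouse table expressed as a URI
daily_report = Asset("daily_sales_report")                   # purely logical name (newer versions)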


Producing an asset

@task(outlets=[orders_asset])
def produce_orders():
    ...

This means:

“When this task succeeds, the orders_asset is updated.”
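The same declaration also works on classic operators, since inlets and outlets are ordinary operator arguments. A minimal sketch (the bash command is hypothetical; the BashOperator import path differs between Airflow 2.x and 3):

from airflow.operators.bash import BashOperator  # Airflow 2.x path; Airflow 3 ships it in the standard provider

produce_orders = BashOperator(
    task_id="produce_orders",
    bash_command="python export_orders.py",  # hypothetical export script
    outlets=[orders_asset],                  # asset marked updated when the command succeeds
)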


Consuming an asset

@dag(schedule=[orders_asset])
def downstream_dag():
    ...

This DAG:

  • Has no cron schedule

  • Runs when the asset is updated


Key concept: assets ≠ storage

An asset:

  • ❌ Does NOT store data

  • ❌ Does NOT move data

  • ❌ Does NOT validate data

  • ✅ Represents data state

Think of it as a signal, not the data itself.
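To make that concrete, here is a hedged sketch (bucket, key, and connection id are made up; requires the Amazon provider) in which the task itself writes the file via an S3 hook, while the outlet only emits the freshness signal:

from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.sdk import Asset, task

orders_asset = Asset("s3://data/orders/")

@task(outlets=[orders_asset])
def export_orders():
    # The task does the actual data movement...
    S3Hook(aws_conn_id="aws_default").load_string(
        "order_id,amount\n1,9.99\n",
        key="orders/latest.csv",
        bucket_name="data",
        replace=True,
    )
    # ...while outlets=[orders_asset] above only signals “orders_asset was updated”.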


Asset vs Task dependency

Task dependency (old model)

Task A → Task B

  • Tight coupling

  • Same DAG

  • Time-based thinking

Asset dependency (new model)

Asset X → DAG B

  • Loose coupling

  • Cross-DAG

  • Data-based thinking


Example end-to-end

Producer DAG

from airflow.sdk import Asset, dag, task

orders = Asset("warehouse.orders")

@dag(schedule="@daily")
def orders_etl():
    @task(outlets=[orders])
    def load_orders():
        ...

    load_orders()

orders_etl()

Consumer DAG

@dag(schedule=[orders])
def reporting():
    @task
    def build_report():
        ...

    build_report()

reporting()

Result:

  • reporting runs whenever orders updates

  • Even if producer DAG name or structure changes


Assets in the Airflow UI

In the Assets tab you’ll see:

  • Asset name / URI

  • Producing tasks

  • Consuming DAGs

  • Last update time

  • Lineage graph

This gives you data lineage, not just task lineage.


When you should use assets

✅ Decoupling producer and consumer DAGs
✅ Cross-DAG dependencies
✅ Event-driven pipelines
✅ Data freshness-based scheduling
✅ Clear data lineage


When you should NOT use assets

❌ Passing data between tasks
❌ Large or intermediate data exchange
❌ Tight step-by-step workflows within one DAG

(Use normal task dependencies or external storage instead.)
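For contrast, a minimal sketch of the in-DAG alternative (Airflow 3 imports assumed; on 2.x use airflow.decorators): small values flow between tasks through ordinary TaskFlow dependencies, XCom under the hood, with no assets involved.

from airflow.sdk import dag, task

@dag(schedule=None)
def inline_pipeline():
    @task
    def extract() -> dict:
        return {"rows": 42}        # small payload passed via XCom

    @task
    def load(payload: dict):
        print(payload["rows"])

    load(extract())                # plain task dependency within one DAG

inline_pipeline()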


Mental model

Think of assets like “data events”:

  • Task succeeds → asset updated

  • Asset updated → downstream DAGs triggered


One-sentence summary

An Airflow asset is a declarative signal that a specific piece of data has been created or updated, enabling data-driven scheduling and lineage across DAGs.

 

from __future__ import annotations

import pendulum

from airflow.sdk import Asset, asset, dag, task


@asset(uri='s3://bucket/asset1_producer', schedule=None)
def asset1_producer():
    pass


@asset(uri='s3://bucket/object', schedule=None)
def asset2_producer(self, context, asset1_producer):
    print(self)
    print(context['inlet_events'][asset1_producer])


@dag(
    schedule=Asset(uri='s3://bucket/asset1_producer', name='asset1_producer') | 
             Asset(uri='s3://bucket/object', name='asset2_producer'),
    start_date=pendulum.datetime(2021, 1, 1, tz='UTC'),
    catchup=False,
    tags=['consumes', 'asset-scheduled']
)
def consumes_asset_decorator():
    @task(outlets=[Asset(name='process_nothing')])
    def process_nothing():
        pass

    process_nothing()


consumes_asset_decorator()

What does the parameter outlets do?

 

In Airflow, the outlets parameter tells Airflow which assets a task produces (updates) when it successfully finishes.

Let’s break it down clearly using your example.


What outlets means

outlets declares the assets that this task emits (updates) on success.

In plain terms:

  • When process_nothing succeeds

  • Airflow marks the asset process_nothing as updated

  • Any DAGs that depend on this asset become eligible to run


Why it’s called outlets

Airflow uses data-flow terminology:

Term      Meaning
inlets    Assets a task consumes
outlets   Assets a task produces

So:

  • Data flows into tasks via inlets

  • Data flows out of tasks via outlets
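A plain task can declare inlets as well and, in recent Airflow versions, read the recorded events for those assets from the task context. A hedged sketch (the asset and task names are assumptions):

from airflow.sdk import Asset, task

orders = Asset("s3://data/orders/")

@task(inlets=[orders])
def summarize_orders(**context):
    # Events recorded each time a producer updated the declared inlet asset.
    events = context["inlet_events"][orders]
    print(f"orders asset has {len(events)} recorded update(s)")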


Important clarifications

1️⃣ outlets does NOT move data

Your task:

def process_nothing():
    pass

Still:

  • Writes no data

  • Returns nothing

  • Touches no storage

Yet the asset is marked as updated.

📌 Assets are signals, not data.


2️⃣ Assets update only on success

  • If the task fails → asset is not updated

  • If the task is skipped → asset is not updated

  • Success is the trigger


3️⃣ Assets are global, not DAG-scoped

The asset name:

Asset(name="process_nothing")

  • Is globally visible across Airflow

  • Can be consumed by any DAG

  • Creates cross-DAG dependencies


How this is used downstream

Consumer DAG example

@dag(schedule=[Asset(name="process_nothing")])
def consumer():
    @task
    def react():
        print("Asset updated!")

    react()

consumer()

This DAG runs:

  • When process_nothing succeeds

  • Even if it’s in a completely different DAG
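The consumer can also inspect which asset events caused the run via the triggering-events entry in the task context (triggering_asset_events in Airflow 3, triggering_dataset_events in 2.x). A hedged sketch:

from airflow.sdk import Asset, dag, task

@dag(schedule=[Asset(name="process_nothing")])
def consumer_with_events():
    @task
    def react(**context):
        # Maps each triggering asset to the list of events that caused this run.
        for key, events in context["triggering_asset_events"].items():
            print(key, "->", len(events), "event(s)")

    react()

consumer_with_events()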


What shows up in the UI

In the Assets view:

  • Asset: process_nothing

  • Producer: process_nothing task

  • Consumers: any DAG scheduled on it

  • Last updated timestamp


When outlets is useful

✅ Marking data availability
✅ Event-driven pipelines
✅ Decoupling producer/consumer DAGs
✅ Lineage & observability


When it’s misleading

⚠️ If the task doesn’t actually produce data
⚠️ If asset names don’t reflect real data

Best practice:

Only declare an outlet when the task truly updates the represented data.


One-line summary

outlets tells Airflow “this task updates these assets when it succeeds,” enabling data-driven scheduling and lineage.

 

How many DAGs will the above code generate? What are their dag_ids?

 

This code will generate 3 DAGs.

Let’s go through them one by one and explain why, because the @asset decorator behaves differently from @dag.


1️⃣ DAG from @asset: asset1_producer

@asset(uri="s3://bucket/asset1_producer", schedule=None)
def asset1_producer():
    pass

What this does

  • @asset creates an asset-producing DAG

  • Even though it looks like a function, it is actually turned into:

    • a DAG

    • with a single task

    • that updates the asset s3://bucket/asset1_producer

DAG ID

asset1_producer

2️⃣ DAG from @asset: asset2_producer

@asset(uri="s3://bucket/object", schedule=None)
def asset2_producer(self, context, asset1_producer):
    ...

What this does

  • This is another asset-defined DAG

  • It:

    • Produces asset s3://bucket/object

    • Consumes asset asset1_producer (via the function argument)

  • The parameters self, context, and asset1_producer are injected by Airflow’s asset system

DAG ID

asset2_producer

📌 Even though it consumes asset1_producer, it is still its own DAG.


3️⃣ DAG from @dag: consumes_asset_decorator

@dag(
    schedule=Asset(uri="s3://bucket/asset1_producer")
             | Asset(uri="s3://bucket/object"),
    ...
)
def consumes_asset_decorator():
    ...

What this does

  • This is a regular DAG

  • It is asset-scheduled

  • It runs when either:

    • s3://bucket/asset1_producer updates

    • OR s3://bucket/object updates

(The | operator means OR for assets.)

DAG ID

consumes_asset_decorator

Final answer

✅ Number of DAGs generated

3 DAGs

✅ DAG IDs

Source             DAG ID
@asset decorator   asset1_producer
@asset decorator   asset2_producer
@dag decorator     consumes_asset_decorator

Mental model (very important)

  • @asset always creates a DAG

  • Each @asset function = one DAG

  • @dag creates a DAG as usual

  • Assets connect DAGs by data, not by code structure


What this looks like in the UI

You will see:

  • 3 DAGs in the DAG list

  • An asset lineage graph:

    asset1_producer ──▶ asset2_producer
           │                  │
           └────────┬─────────┘
                    ▼
         consumes_asset_decorator

If you want, I can also explain:

  • how the injected context["inlet_events"] works

 
Yes, please.
 

Great, let’s go through this clearly and concretely, using your exact code as the anchor.


context["inlet_events"]: what it is and how it works

What problem it solves

When a DAG or asset runs because an asset was updated, you often want to know:

  • Which asset triggered this run?

  • When was it updated?

  • Which task produced it?

That information is exposed via inlet events.


What inlet_events contains

Inside an asset-decorated function:

context["inlet_events"]

is a dictionary:

{
    Asset(...) → [AssetEvent, AssetEvent, ...]
}

Each AssetEvent represents one update of that asset.


Your example

@asset(uri="s3://bucket/object", schedule=None)
def asset2_producer(self, context, asset1_producer):
    print(context["inlet_events"][asset1_producer])

Here:

  • asset1_producer (the function argument) is the Asset object

  • Airflow injects it because this asset depends on it

  • context["inlet_events"][asset1_producer] returns a list of events

Example content (simplified):

[
    AssetEvent(
        asset=Asset("s3://bucket/asset1_producer"),
        source_task_id="asset1_producer",
        source_dag_id="asset1_producer",
        timestamp=2025-01-01T00:00:00Z,
    )
]

Why it’s a list

  • Multiple upstream asset updates can trigger a run

  • Backfills can create multiple events

  • The list preserves full lineage


When this is useful

✅ Incremental processing
✅ Auditing & observability
✅ Data freshness logic
✅ Debugging asset-triggered runs
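For example, an asset-decorated function could use the most recent event’s timestamp as a simple watermark for incremental work. A hedged sketch mirroring your code (the derived asset’s URI is hypothetical):

from airflow.sdk import asset

@asset(uri="s3://bucket/derived_object", schedule=None)
def watermark_consumer(context, asset1_producer):
    events = context["inlet_events"][asset1_producer]
    if events:
        latest = events[-1]  # most recent update of the upstream asset
        print("only process data newer than", latest.timestamp)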

 

Asset schedule operators

Operator   Meaning                      Typical use
|          Any asset updated (OR)       React as soon as any one source refreshes
&          All assets updated (AND)     Joins, reports, aggregates
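A hedged sketch of both operators (asset names invented; Airflow 3 imports, with Dataset and airflow.decorators as the 2.x equivalents):

from airflow.sdk import Asset, dag, task

orders = Asset("orders")
customers = Asset("customers")

@dag(schedule=orders | customers)        # OR: run when either asset updates
def refresh_cache():
    @task
    def refresh():
        ...

    refresh()

@dag(schedule=orders & customers)        # AND: run only once both have updated
def build_join_report():
    @task
    def join_and_report():
        ...

    join_and_report()

refresh_cache()
build_join_report()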