In Apache Airflow, an asset is a logical representation of a piece of data that your workflows produce or consume, independent of where that data lives or how it’s computed.
Assets are part of Airflow’s data-aware scheduling model (introduced as Datasets in Airflow 2.4 and renamed to Assets in Airflow 3).
High-level definition
An Airflow asset represents the existence or freshness of data, not the task that produced it.
An asset answers:
- “This dataset now exists or has been updated”
- not “This task finished”
Why assets exist (problem they solve)
Traditional Airflow scheduling is time-based and task-centric:
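For contrast, here is a minimal sketch of that classic model (hypothetical DAG and task names, using the same airflow.sdk imports as the code later on this page): a cron schedule plus task-to-task ordering.

```python
import pendulum
from airflow.sdk import dag, task

@dag(schedule='@daily', start_date=pendulum.datetime(2021, 1, 1, tz='UTC'), catchup=False)
def classic_time_based():
    @task
    def extract():
        ...  # pull source data

    @task
    def transform():
        ...  # reshape it

    # transform runs because extract finished, not because any data actually changed
    extract() >> transform()

classic_time_based()
```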
But real data pipelines care about data availability, not just task order:
- “Run this DAG when the table is updated”
- “Trigger downstream jobs when a file arrives”
- “React to data, not clocks”
Assets enable this.
What an asset looks like
Defining an asset
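For example, in Airflow 2.4–2.x the object is called a Dataset (a sketch; the bucket path is illustrative):

```python
from airflow.datasets import Dataset

# The URI identifies "the orders data"; it is not a connection string
orders_asset = Dataset('s3://my-bucket/orders/')
```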
(or, in newer Airflow versions:)
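A sketch using the Airflow 3 SDK (the name and URI are illustrative):

```python
from airflow.sdk import Asset

orders_asset = Asset(name='orders', uri='s3://my-bucket/orders/')
```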
The URI is just an identifier, not a connection.
Producing an asset
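A sketch, reusing the orders_asset defined above; the key piece is the outlets argument on the task:

```python
from airflow.sdk import task

@task(outlets=[orders_asset])  # on success, Airflow records an update event for orders_asset
def update_orders():
    ...  # write/refresh the orders data somewhere; Airflow itself moves nothing
```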
This means:
“When this task succeeds, the `orders_asset` is updated.”
Consuming an asset
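A sketch of a DAG scheduled on the asset instead of a cron expression (DAG and task names are illustrative, again reusing orders_asset):

```python
import pendulum
from airflow.sdk import dag, task

@dag(schedule=[orders_asset], start_date=pendulum.datetime(2021, 1, 1, tz='UTC'), catchup=False)
def orders_consumer():
    @task
    def build_report():
        ...  # read the orders data and build a report

    build_report()

orders_consumer()
```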
This DAG:
- Has no cron schedule
- Runs when the asset is updated
Key concept: assets ≠ storage
An asset:
- ❌ Does NOT store data
- ❌ Does NOT move data
- ❌ Does NOT validate data
- ✅ Represents data state
Think of it as a signal, not the data itself.
Asset vs Task dependency
Task dependency (old model)
- Tight coupling
- Same DAG
- Time-based thinking
Asset dependency (new model)
- Loose coupling
- Cross-DAG
- Data-based thinking
Example end-to-end
Producer DAG
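A sketch of the producer side (the asset name orders matches the result described below; DAG and task names are illustrative):

```python
import pendulum
from airflow.sdk import Asset, dag, task

orders = Asset(name='orders', uri='s3://my-bucket/orders/')

@dag(schedule='@hourly', start_date=pendulum.datetime(2021, 1, 1, tz='UTC'), catchup=False)
def ingest_orders():
    @task(outlets=[orders])  # marks the orders asset as updated on success
    def load_orders():
        ...  # refresh the orders table/files

    load_orders()

ingest_orders()
```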
Consumer DAG
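A sketch of the consumer side, scheduled purely on the asset (the DAG name reporting matches the result described below):

```python
import pendulum
from airflow.sdk import Asset, dag, task

orders = Asset(name='orders', uri='s3://my-bucket/orders/')

@dag(schedule=[orders], start_date=pendulum.datetime(2021, 1, 1, tz='UTC'), catchup=False)
def reporting():
    @task
    def build_report():
        ...  # consume the freshly updated orders data

    build_report()

reporting()
```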
Result:
- `reporting` runs whenever `orders` updates
- Even if the producer DAG’s name or structure changes
Assets in the Airflow UI
In the Assets tab you’ll see:
- Asset name / URI
- Producing tasks
- Consuming DAGs
- Last update time
- Lineage graph
This gives you data lineage, not just task lineage.
When you should use assets
✅ Decoupling producer and consumer DAGs
✅ Cross-DAG dependencies
✅ Event-driven pipelines
✅ Data freshness-based scheduling
✅ Clear data lineage
When you should NOT use assets
❌ Passing data between tasks
❌ Large or intermediate data exchange
❌ Tight step-by-step workflows within one DAG
(Use normal task dependencies or external storage instead.)
Mental model
Think of assets like “data events”:
- Task succeeds → asset updated
- Asset updated → downstream DAGs triggered
One-sentence summary
An Airflow asset is a declarative signal that a specific piece of data has been created or updated, enabling data-driven scheduling and lineage across DAGs.
```python
from __future__ import annotations

import pendulum

from airflow.sdk import Asset, asset, dag, task


@asset(uri='s3://bucket/asset1_producer', schedule=None)
def asset1_producer():
    pass


@asset(uri='s3://bucket/object', schedule=None)
def asset2_producer(self, context, asset1_producer):
    print(self)
    print(context['inlet_events'][asset1_producer])


@dag(
    schedule=(
        Asset(uri='s3://bucket/asset1_producer', name='asset1_producer')
        | Asset(uri='s3://bucket/object', name='asset2_producer')
    ),
    start_date=pendulum.datetime(2021, 1, 1, tz='UTC'),
    catchup=False,
    tags=['consumes', 'asset-scheduled'],
)
def consumes_asset_decorator():
    @task(outlets=[Asset(name='process_nothing')])
    def process_nothing():
        pass

    process_nothing()


consumes_asset_decorator()
```
What does the parameter outlets do?
The `outlets` parameter tells Airflow which assets a task produces (updates) when it finishes successfully.
Let’s break it down clearly using your example.
What `outlets` means
`outlets` declares the assets that this task emits (updates) on success.
In plain terms:
- When `process_nothing` succeeds
- Airflow marks the asset `process_nothing` as updated
- Any DAGs that depend on this asset become eligible to run
Why it’s called outlets
Airflow uses data-flow terminology:
| Term | Meaning |
|---|---|
| inlets | Assets a task consumes |
| outlets | Assets a task produces |
So:
- Data flows into tasks via inlets
- Data flows out of tasks via outlets
Important clarifications
1️⃣ outlets does NOT move data
Your task:
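(the `process_nothing` task from the code above)

```python
@task(outlets=[Asset(name='process_nothing')])
def process_nothing():
    pass
```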
Still:
- Writes no data
- Returns nothing
- Touches no storage
Yet the asset is marked as updated.
📌 Assets are signals, not data.
2️⃣ Assets update only on success
- If the task fails → asset is not updated
- If the task is skipped → asset is not updated
- Success is the trigger
3️⃣ Assets are global, not DAG-scoped
The asset name `process_nothing`:
- Is globally visible across Airflow
- Can be consumed by any DAG
- Creates cross-DAG dependencies
How this is used downstream
Consumer DAG example
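A sketch of such a consumer (DAG and task names are illustrative; the asset name matches the one declared in outlets above):

```python
import pendulum
from airflow.sdk import Asset, dag, task

@dag(
    schedule=[Asset(name='process_nothing')],
    start_date=pendulum.datetime(2021, 1, 1, tz='UTC'),
    catchup=False,
)
def downstream_of_process_nothing():
    @task
    def react():
        print('process_nothing asset was updated')

    react()

downstream_of_process_nothing()
```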
This DAG runs:
- When `process_nothing` succeeds
- Even if it’s in a completely different DAG
What shows up in the UI
In the Assets view:
- Asset: `process_nothing`
- Producer: the `process_nothing` task
- Consumers: any DAG scheduled on it
- Last updated timestamp
When outlets is useful
✅ Marking data availability
✅ Event-driven pipelines
✅ Decoupling producer/consumer DAGs
✅ Lineage & observability
When it’s misleading
⚠️ If the task doesn’t actually produce data
⚠️ If asset names don’t reflect real data
Best practice:
Only declare an outlet when the task truly updates the represented data.
One-line summary
`outlets` tells Airflow “this task updates these assets when it succeeds,” enabling data-driven scheduling and lineage.
How many DAGs will the above code generate? What are their dag_ids?
This code will generate 3 DAGs.
Let’s go through them one by one and explain why, because the @asset decorator behaves differently from @dag.
1️⃣ DAG from @asset → asset1_producer
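The relevant block from the code above:

```python
@asset(uri='s3://bucket/asset1_producer', schedule=None)
def asset1_producer():
    pass
```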
What this does
- `@asset` creates an asset-producing DAG
- Even though it looks like a function, it is actually turned into:
  - a DAG
  - with a single task
  - that updates the asset `s3://bucket/asset1_producer`

DAG ID: `asset1_producer`
2️⃣ DAG from @asset → asset2_producer
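The relevant block from the code above:

```python
@asset(uri='s3://bucket/object', schedule=None)
def asset2_producer(self, context, asset1_producer):
    print(self)
    print(context['inlet_events'][asset1_producer])
```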
What this does
- This is another asset-defined DAG
- It:
  - Produces asset `s3://bucket/object`
  - Consumes asset `asset1_producer` (via the function argument)
- The parameters `self`, `context`, and `asset1_producer` are injected by Airflow’s asset system

DAG ID: `asset2_producer`
📌 Even though it consumes asset1_producer, it is still its own DAG.
3️⃣ DAG from @dag → consumes_asset_decorator
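The relevant block from the code above:

```python
@dag(
    schedule=(
        Asset(uri='s3://bucket/asset1_producer', name='asset1_producer')
        | Asset(uri='s3://bucket/object', name='asset2_producer')
    ),
    start_date=pendulum.datetime(2021, 1, 1, tz='UTC'),
    catchup=False,
    tags=['consumes', 'asset-scheduled'],
)
def consumes_asset_decorator():
    @task(outlets=[Asset(name='process_nothing')])
    def process_nothing():
        pass

    process_nothing()
```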
What this does
- This is a regular DAG
- It is asset-scheduled
- It runs when either:
  - `s3://bucket/asset1_producer` updates
  - OR `s3://bucket/object` updates
- (The `|` operator means OR for assets.)

DAG ID: `consumes_asset_decorator`
Final answer
✅ Number of DAGs generated
3 DAGs
✅ DAG IDs
| Source | DAG ID |
|---|---|
| `@asset` decorator | `asset1_producer` |
| `@asset` decorator | `asset2_producer` |
| `@dag` decorator | `consumes_asset_decorator` |
Mental model (very important)
- `@asset` always creates a DAG
- Each `@asset` function = one DAG
- `@dag` creates a DAG as usual
- Assets connect DAGs by data, not by code structure
What this looks like in the UI
You will see:
- 3 DAGs in the DAG list
- An asset lineage graph connecting the three DAGs through their assets

浙公网安备 33010602011771号