ZhangZhihui's Blog  

🌬️ What Is Jinja Templating in Airflow?

Jinja templating in Airflow lets you write dynamic, runtime-rendered text inside operators, SQL files, scripts, and parameters.

Airflow automatically replaces template variables—like {{ ds }} or {{ ti.xcom_pull() }}—before executing the task.

This allows tasks to adapt to the execution date, runtime environment, upstream results, or macros—no hardcoding required.


🧠 Why Airflow Uses Jinja

Airflow DAGs often operate on schedules:

  • Daily ETL with date parameters

  • Dynamic file paths

  • SQL queries that depend on execution date

  • Passing XCom values into SQL/paths

  • Batch windows

  • Conditional branching based on macros

Jinja templates allow values to change for every run, without modifying the Python code.
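The mechanics are easy to see outside Airflow. Below is a minimal sketch using the Jinja2 library directly (the engine Airflow builds on, so `jinja2` must be installed); the query string and the `ds` value are illustrative:

```python
from jinja2 import Template

# Airflow does something similar under the hood: it builds a context
# dict ({"ds": ..., "ti": ..., "macros": ..., ...}) for each run and
# renders every templated field against it.
template = Template("SELECT * FROM sales WHERE sale_date = '{{ ds }}'")
rendered = template.render(ds="2025-01-10")
print(rendered)  # SELECT * FROM sales WHERE sale_date = '2025-01-10'
```

Change the `ds` value passed to `render()` and the same template yields a different query, which is exactly how one DAG serves every scheduled run.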


🔧 What Can Be Templated?

Airflow lets you template:

✔ Inside Operators (most commonly)

  • bash_command (BashOperator)

  • sql (PostgresOperator, SnowflakeOperator, BigQueryOperator)

  • python_callable arguments via op_kwargs / op_args

  • templates_dict (PythonOperator)

  • File locations (if their extensions match template_ext)

✔ Entire SQL or script files

Airflow can load external .sql, .txt, .json, .py files and render them as Jinja templates.

✔ Custom operators

Any attribute listed in an Operator's template_fields will be templated automatically.


🧬 How Airflow Renders Templates

Rendering occurs just before the task is executed, following this workflow:

  1. Airflow scans the operator’s template_fields.

  2. For each field:

    • If it's a string → render as Jinja template

    • If it looks like a file path and matches template_ext → load file → render contents

  3. Airflow injects:

    • Context variables (execution date, logical date, task instance)

    • Macros (functions such as ds_add, macros.datetime etc.)

    • User-defined macros

  4. The operator receives rendered strings, not templates.
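The per-field decision above can be sketched in plain Python. This is a simplified, hypothetical `render_field` helper (requiring the `jinja2` package), not Airflow's real implementation; it only mirrors the string-vs-file rule:

```python
import os
import tempfile

from jinja2 import Template

TEMPLATE_EXT = (".sql", ".json")  # stands in for an operator's template_ext

def render_field(value, context):
    """Simplified sketch of Airflow's per-field rule: if the value is a
    path ending in a template_ext extension, load the file first; then
    render the resulting string as a Jinja template."""
    if isinstance(value, str) and value.endswith(TEMPLATE_EXT) and os.path.isfile(value):
        with open(value) as f:
            value = f.read()
    return Template(value).render(**context)

# 1) A plain string field is rendered directly.
inline = render_field("day={{ ds }}", {"ds": "2025-01-10"})
print(inline)  # day=2025-01-10

# 2) A field holding a matching file path gets loaded, then rendered.
with tempfile.NamedTemporaryFile("w", suffix=".sql", delete=False) as f:
    f.write("SELECT * FROM t WHERE d = '{{ ds }}'")
    path = f.name
from_file = render_field(path, {"ds": "2025-01-10"})
print(from_file)  # SELECT * FROM t WHERE d = '2025-01-10'
```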


🧩 Built-in Variables (Context)

Airflow injects many variables into each Jinja template.

🕒 Execution date values

| Variable | Meaning |
| --- | --- |
| `ds` | Logical (run) date as `YYYY-MM-DD` |
| `ds_nodash` | Logical date as `YYYYMMDD` |
| `ts` | Logical datetime in ISO 8601 format |
| `logical_date` | Pendulum datetime object |
| `data_interval_start` | Start of the data interval |
| `data_interval_end` | End of the data interval |

🏷️ Task-related

| Variable | Description |
| --- | --- |
| `task` | The Task object |
| `ti` | The TaskInstance (use it to pull XComs) |
| `task_instance_key_str` | Unique key for this task run |

🗃️ DAG-related

| Variable | Description |
| --- | --- |
| `dag` | The DAG object |
| `dag_run` | The DagRun object |
| `macros` | Collection of built-in utility functions and modules |

🪜 Macros

Airflow provides many helpful macros:

Date macros

  • {{ ds }} — the logical (run) date

  • {{ macros.ds_add(ds, 1) }} — add days to a date string

  • {{ macros.ds_format(ds, "%Y-%m-%d", "%d/%m/%Y") }} — reformat a date string

Datetime macros

  • {{ macros.datetime.utcnow() }}

  • {{ macros.timedelta(days=7) }}

XCom macros

  • {{ ti.xcom_pull(task_ids='task1') }}

JSON

  • {{ macros.json.dumps(my_dict) }}
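To try the date macros without a running Airflow instance, here is a sketch with hypothetical reimplementations of `ds_add` and `ds_format` injected under a `macros` namespace, mirroring what Airflow does (requires the `jinja2` package):

```python
from datetime import datetime, timedelta
from types import SimpleNamespace

from jinja2 import Template

# Hypothetical stand-ins for Airflow's macros.ds_add / macros.ds_format,
# reimplemented here so the rendering runs without an Airflow install.
def ds_add(ds, days):
    return (datetime.strptime(ds, "%Y-%m-%d") + timedelta(days=days)).strftime("%Y-%m-%d")

def ds_format(ds, input_format, output_format):
    return datetime.strptime(ds, input_format).strftime(output_format)

macros = SimpleNamespace(ds_add=ds_add, ds_format=ds_format)  # mimic the injected namespace

tpl = Template("{{ macros.ds_add(ds, 1) }} | {{ macros.ds_format(ds, '%Y-%m-%d', '%d/%m/%Y') }}")
result = tpl.render(ds="2025-01-10", macros=macros)
print(result)  # 2025-01-11 | 10/01/2025
```

The same pattern is how `user_defined_macros` on a DAG works: any callable you register becomes available inside templates.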


🧪 Examples of Jinja Templating

1️⃣ SQL Templating Example

sample.sql

SELECT * FROM sales
WHERE sale_date BETWEEN '{{ ds }}' AND '{{ macros.ds_add(ds, 1) }}'

Rendered at runtime:

SELECT * FROM sales
WHERE sale_date BETWEEN '2025-01-10' AND '2025-01-11'

2️⃣ Templating in BashOperator

BashOperator(
    task_id="echo_dates",
    bash_command="""
        echo "Today is {{ ds }}";
        echo "Interval start: {{ data_interval_start }}";
        echo "Interval end:   {{ data_interval_end }}";
    """
)

3️⃣ Templating in PythonOperator (via templates_dict)

PythonOperator(
    task_id='do_sql',
    python_callable=run_sql,
    templates_dict={'sql': 'queries/my_query.sql'},
    templates_exts=['.sql'],
)

Inside run_sql:

def run_sql(**context):
    query = context['templates_dict']['sql']  # This is already rendered!
    print(query)

4️⃣ Templating Dynamic File Paths

BashOperator(
    task_id="load_file",
    bash_command="cat /data/input/file_{{ ds_nodash }}.csv"
)

5️⃣ Using XCom inside templates

INSERT INTO metrics (value)
VALUES ({{ ti.xcom_pull(task_ids='calculate') }});

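Outside a live DAG run, this can be simulated with a hypothetical stand-in for the TaskInstance; `FakeTaskInstance` and its stored value below are illustrative, not Airflow APIs (requires the `jinja2` package):

```python
from jinja2 import Template

class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's TaskInstance, with just enough
    of an xcom_pull() to render the template outside a real scheduler."""
    def __init__(self, xcoms):
        self._xcoms = xcoms

    def xcom_pull(self, task_ids):
        return self._xcoms[task_ids]

sql = Template(
    "INSERT INTO metrics (value) VALUES ({{ ti.xcom_pull(task_ids='calculate') }});"
)
rendered = sql.render(ti=FakeTaskInstance({"calculate": 42}))
print(rendered)  # INSERT INTO metrics (value) VALUES (42);
```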
📎 Template Fields and Template Exts

template_fields

Operators declare which fields should be templated.

For example, BashOperator:

template_fields = ("bash_command",)

PythonOperator:

template_fields = ("op_args", "op_kwargs", "templates_dict")

Custom Operators can define their own.
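A sketch of the mechanism, using a hypothetical `MiniBaseOperator` instead of the real `airflow.models.BaseOperator` so it runs standalone (requires the `jinja2` package):

```python
from jinja2 import Template

class MiniBaseOperator:
    """Hypothetical miniature of Airflow's BaseOperator, kept only to show
    how template_fields drives automatic rendering."""
    template_fields = ()

    def render_template_fields(self, context):
        for name in self.template_fields:
            value = getattr(self, name)
            if isinstance(value, str):
                setattr(self, name, Template(value).render(**context))

class MyS3UploadOperator(MiniBaseOperator):
    # Listing an attribute here is all it takes to have it templated.
    template_fields = ("s3_key",)

    def __init__(self, s3_key):
        self.s3_key = s3_key

op = MyS3UploadOperator(s3_key="exports/{{ ds_nodash }}/events.csv")
op.render_template_fields({"ds_nodash": "20250110"})
print(op.s3_key)  # exports/20250110/events.csv
```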


template_ext (for file templating)

Extensions Airflow treats as template files.

Common:

template_ext = ['.sql', '.json', '.yml']

In PythonOperator, you can override it with:

templates_exts=['.sql']

🛠️ Where Templates Are Stored

Common project structure:

dags/
  my_dag.py
  sql/
    extract_data.sql
    load_data.sql
  templates/
    email_template.html

You can reference them by relative path in templated fields; Airflow resolves paths relative to the DAG file's folder and any extra directories listed in the DAG's template_searchpath.
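Airflow's template_searchpath behaves much like a Jinja `FileSystemLoader`. A rough, standalone analogue (directory and file names are illustrative; requires the `jinja2` package):

```python
import os
import tempfile

from jinja2 import Environment, FileSystemLoader

# Rough analogue of template_searchpath: Airflow builds a Jinja Environment
# whose loader searches the DAG folder plus any extra directories you list.
sql_dir = tempfile.mkdtemp()
with open(os.path.join(sql_dir, "extract_data.sql"), "w") as f:
    f.write("SELECT * FROM events WHERE event_date = '{{ ds }}'")

env = Environment(loader=FileSystemLoader(sql_dir))
rendered = env.get_template("extract_data.sql").render(ds="2025-01-10")
print(rendered)  # SELECT * FROM events WHERE event_date = '2025-01-10'
```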


🌟 Best Practices for Airflow Templating

✔ DO store SQL queries in separate files

Keeps the DAG file clean and readable.

✔ DO use macros for dates

Avoid manual date arithmetic.

✔ DO keep templates side-effect-free

Templates should produce values, not execute logic.

✔ DO log rendered templates for debugging

Airflow records rendered fields automatically; inspect them in the task's "Rendered Template" tab in the UI, or with the `airflow tasks render` CLI command.

❌ DON’T mix heavy business logic inside Jinja

Use Python code for real logic; templates should be simple.

❌ DON’T hardcode dates

Always parameterize.


🎯 Summary

Airflow’s Jinja templating provides:

  • Dynamic SQL generation

  • Date-based partitioning

  • Runtime-dependent file paths

  • Access to task instance, DAG run, macros, XCom

  • Support for templating entire files

  • Templating inside operator parameters

It is one of the main reasons Airflow can orchestrate complex, date-driven ETLs cleanly and elegantly.


posted on 2025-12-12 10:40  ZhangZhihuiAAA