rank@ZZHUBT:~$ pip install airflowctl
rank@ZZHUBT:~$ airflowctl init my_airflow_project --build-start ...... ebserver | [2025-01-18 20:46:08 +0800] [19140] [INFO] Starting gunicorn 23.0.0 webserver | [2025-01-18 20:46:08 +0800] [19140] [INFO] Listening at: http://0.0.0.0:8080 (19140) webserver | [2025-01-18 20:46:08 +0800] [19140] [INFO] Using worker: sync webserver | [2025-01-18 20:46:08 +0800] [19152] [INFO] Booting worker with pid: 19152 webserver | [2025-01-18 20:46:08 +0800] [19153] [INFO] Booting worker with pid: 19153 standalone | Airflow is ready standalone | Login with username: admin password: aszEnNbvzC8dPmV4 ......





import json from airflow.decorators import ( dag, task, ) from pendulum import datetime # When using the DAG decorator, The "dag_id" value defaults to the name of the function # it is decorating if not explicitly set. In this example, the "dag_id" value would be "example_dag_basic". @dag( # This defines how often your DAG will run, or the schedule by which your DAG runs. In this case, this DAG # will run daily schedule="@daily", # This DAG is set to run for the first time on January 1, 2023. Best practice is to use a static # start_date. Subsequent DAG runs are instantiated based on the schedule start_date=datetime(2023, 1, 1), # When catchup=False, your DAG will only run the latest run that would have been scheduled. In this case, this means # that tasks will not be run between January 1, 2023 and 30 mins ago. When turned on, this DAG's first # run will be for the next 30 mins, per the its schedule catchup=False, default_args={ "retries": 2, # If a task fails, it will retry 2 times. }, tags=["example"], ) # If set, this tag is shown in the DAG view of the Airflow UI def example_dag_basic(): """ ### Basic ETL Dag This is a simple ETL data pipeline example that demonstrates the use of the TaskFlow API using three simple tasks for extract, transform, and load. For more information on Airflow's TaskFlow API, reference documentation here: https://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html """ @task() def extract(): """ #### Extract task A simple "extract" task to get data ready for the rest of the pipeline. In this case, getting data is simulated by reading from a hardcoded JSON string. """ data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}' order_data_dict = json.loads(data_string) return order_data_dict @task(multiple_outputs=True) # multiple_outputs=True unrolls dictionaries into separate XCom values def transform(order_data_dict: dict): """ #### Transform task A simple "transform" task which takes in the collection of order data and computes the total order value. """ total_order_value = 0 for value in order_data_dict.values(): total_order_value += value return {"total_order_value": total_order_value} @task() def load(total_order_value: float): """ #### Load task A simple "load" task that takes in the result of the "transform" task and prints it out, instead of saving it to end user review """ print(f"Total order value is: {total_order_value:.2f}") order_data = extract() order_summary = transform(order_data) load(order_summary["total_order_value"]) example_dag_basic()
Let’s declare the DAG and identify some of the fields that are required, such as schedule:





















import json from airflow.decorators import ( dag, task, task_group ) from airflow.operators.empty import EmptyOperator from pendulum import datetime @dag( schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False, default_args={ "retries": 2, }, tags=["zzh"], ) def zzh_dag(): @task() def extract(): data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}' order_data_dict = json.loads(data_string) return order_data_dict @task(multiple_outputs=True) # multiple_outputs=True unrolls dictionaries into separate XCom values def transform(order_data_dict: dict): total_order_value = 0 for value in order_data_dict.values(): total_order_value += value return {"total_order_value": total_order_value} @task() def load(total_order_value: float): print(f"Total order value is: {total_order_value:.2f}") @task_group(group_id='extraction_task_group') def tg1(): t1 = extract() t2 = EmptyOperator(task_id='task_2') t1 >> t2 return t1 order_data = tg1() order_summary = transform(order_data) order_data >> order_summary >> load(order_summary["total_order_value"]) zzh_dag()



 
                     
                    
                 
                    
                
 
         
                
            
        
 浙公网安备 33010602011771号
浙公网安备 33010602011771号