  • 17 Apr, 2026

Introduction to Apache Airflow DAGs

1. Introduction to Apache Airflow DAGs

What is a DAG?

A Directed Acyclic Graph (DAG) in Apache Airflow represents a workflow or pipeline where tasks are executed in a defined order based on dependencies. Each DAG consists of multiple tasks that can run in parallel or sequentially.

  • Directed: Tasks are executed in a specific order.
  • Acyclic: The graph cannot contain cycles — no task can depend, directly or indirectly, on itself.
  • Graph: Represents a network of interconnected tasks.
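These three properties can be illustrated without Airflow at all. The sketch below models a hypothetical extract → transform → load pipeline as a plain Python dict and uses the standard library's graphlib to compute a valid execution order — the same kind of topological ordering Airflow derives from task dependencies (task names here are illustrative, not Airflow API):

```python
from graphlib import TopologicalSorter, CycleError

# Map each task to the set of tasks it depends on (hypothetical task names)
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# A valid execution order respects every dependency edge
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['extract', 'transform', 'load']

# Introducing a cycle makes the graph invalid — Airflow rejects such DAGs
dependencies["extract"] = {"load"}
try:
    list(TopologicalSorter(dependencies).static_order())
except CycleError:
    print("cycle detected: not a DAG")
```

The "acyclic" rule is what guarantees a topological order exists at all: once any task depends on itself through a chain of edges, no valid schedule can be produced.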

2. Apache Airflow DAG Architecture

Components of a DAG

  1. DAG Definition: Written in Python and defines the workflow.
  2. Operators: Represent different types of tasks (e.g., BashOperator, PythonOperator, DummyOperator).
  3. Tasks: Individual units of work within a DAG.
  4. Task Dependencies: Define execution order and relationships between tasks.

Airflow Architecture Overview

Apache Airflow consists of the following components:

  • Scheduler: Determines when tasks should run.
  • Executor: Executes tasks (LocalExecutor, CeleryExecutor, KubernetesExecutor, etc.).
  • Worker Nodes: Execute tasks in a distributed system (for Celery/Kubernetes executors).
  • Metadata Database: Stores DAGs, task statuses, logs, and execution metadata.
  • Web UI: Provides a graphical interface for monitoring DAGs and tasks.

3. Structure of an Apache Airflow DAG

Basic Components of a DAG File

A DAG file is a Python script defining workflow structure. It includes:

  • Imports: Required Airflow modules and libraries.
  • DAG Object: Defines the DAG’s properties (e.g., start date, schedule interval).
  • Tasks: Defined using operators (e.g., PythonOperator, BashOperator).
  • Task Dependencies: Define execution order, e.g. t1 >> t2 (t1 runs before t2) or [t1, t2] >> t3 (t1 and t2 run in parallel, then t3).
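The >> syntax works because Airflow operators override Python's right-shift operator to record upstream/downstream links. The toy class below — a simplified stand-in, not Airflow's actual implementation — shows the pattern:

```python
class Task:
    """Toy stand-in for an Airflow operator (illustration only)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that must run after this one

    def __rshift__(self, other):
        # Supports task >> task and task >> [task, ...]
        for t in (other if isinstance(other, list) else [other]):
            self.downstream.append(t)
        return other

    def __rrshift__(self, other):
        # Supports [task, ...] >> task, since list has no >> of its own
        for t in (other if isinstance(other, list) else [other]):
            t.downstream.append(self)
        return self

t1, t2, t3 = Task("t1"), Task("t2"), Task("t3")
[t1, t2] >> t3  # t3 runs only after both t1 and t2 finish

print([t.task_id for t in t1.downstream])  # ['t3']
print([t.task_id for t in t2.downstream])  # ['t3']
```

Returning the right-hand operand from each hook is what lets chains like t1 >> t2 >> t3 read left to right.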

4. Example Apache Airflow DAG

Creating a Simple DAG in Apache Airflow

Create a new DAG file inside the dags/ directory in your Airflow project:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

# Define default arguments
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2024, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# Define the DAG
dag = DAG(
    'example_dag',
    default_args=default_args,
    description='A simple example DAG',
    schedule_interval=timedelta(days=1),
)

# Define tasks
task1 = BashOperator(
    task_id='print_hello',
    bash_command='echo "Hello, Airflow!"',
    dag=dag,
)

task2 = BashOperator(
    task_id='print_goodbye',
    bash_command='echo "Goodbye, Airflow!"',
    dag=dag,
)

# Define dependencies
task1 >> task2  # task1 runs before task2

Explanation of Code:

  • The DAG starts on January 1, 2024.
  • The schedule_interval runs the DAG daily.
  • task1 prints "Hello, Airflow!".
  • task2 prints "Goodbye, Airflow!".
  • Dependency: task1 >> task2 ensures task1 runs before task2.

5. How to Deploy and Run a DAG

  1. Save the DAG file: Place it in the dags/ directory inside Airflow.
  2. Start Apache Airflow:

    airflow standalone
  3. Check the Web UI: Open http://localhost:8080 and navigate to the DAGs page.
  4. Trigger the DAG: Click the "Trigger DAG" button.
  5. Monitor Execution: Check logs in the Web UI, or run a single task standalone and print its output with:

    airflow tasks test example_dag print_hello 2024-01-01

6. Conclusion

Apache Airflow DAGs are powerful for orchestrating workflows efficiently. This guide covered:

  • What a DAG is
  • DAG structure and architecture
  • A step-by-step example DAG
  • How to deploy and run a DAG

With this foundation, you can start building complex workflows in Apache Airflow! 🚀

 

John Smith