
Workflow:Astronomer Cosmos dbt docs generation

From Leeroopedia


Knowledge Sources
Domains Data Engineering, dbt, Airflow, Documentation
Last Updated 2026-02-07 17:00 GMT

Overview

End-to-end process for generating dbt documentation using Cosmos operators and uploading the artifacts to cloud storage (S3, GCS, Azure Blob) for hosting through the Airflow UI dbt docs plugin.

Description

This workflow covers the generation and hosting of dbt documentation using Cosmos's specialized documentation operators. Cosmos provides DbtDocsS3Operator, DbtDocsGCSOperator, and DbtDocsAzureStorageOperator that combine dbt docs generation with cloud storage upload in a single operator. These operators run dbt docs generate to produce the documentation artifacts (manifest.json, catalog.json, index.html), then upload them to the configured cloud storage bucket.

The uploaded documentation can be served through Cosmos's Airflow plugin, which adds a Cosmos dbt Docs menu item to the Airflow UI. The plugin renders the dbt documentation in an iframe, supporting both Airflow 2 (Flask-based) and Airflow 3 (FastAPI-based) interfaces. Storage type is auto-detected from the connection, and the plugin retrieves artifacts from the configured cloud bucket.

Usage

Execute this workflow when you need to generate and publish dbt project documentation for your team. This is typically run on a schedule (e.g., daily or after each dbt run) to keep documentation in sync with the current state of the dbt project. The resulting docs are accessible to all Airflow users through the integrated Airflow UI plugin, providing a centralized location for data model documentation, lineage visualization, and test coverage reporting.

Execution Steps

Step 1: Configure profile for documentation

Create a ProfileConfig with a profile mapping that connects to the target database. The dbt docs generate command needs database access to compile the documentation catalog, which includes column-level metadata, row counts, and table descriptions extracted from the database's information schema.

Key considerations:

  • The profile must have read access to the database information schema
  • Use the same profile configuration as your dbt run tasks for consistency
  • The connection must be valid at runtime when the docs generation runs

Step 2: Set up cloud storage connection

Configure an Airflow connection for the target cloud storage provider (AWS S3, Google Cloud Storage, or Azure Blob Storage). The connection provides authentication credentials for uploading documentation artifacts. The connection ID is passed to the docs operator.

Key considerations:

  • For S3: use an aws_default or custom AWS connection with S3 write permissions
  • For GCS: use a google_cloud_default or custom GCP connection
  • For Azure: use a wasb_default or custom Azure connection
  • The target bucket must exist and the credentials must have write access
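Connections can be created in the Airflow UI or, as sketched below, supplied through environment variables; the connection ID aws_docs and all credential values here are placeholder assumptions:

```python
import os

# Airflow resolves a connection named <conn_id> from an environment
# variable AIRFLOW_CONN_<CONN_ID> holding a connection URI.
# "aws_docs" is a hypothetical connection ID; the key and secret are
# placeholders and would normally come from a secrets backend.
os.environ["AIRFLOW_CONN_AWS_DOCS"] = (
    "aws://AKIAEXAMPLE:example-secret@/?region_name=us-east-1"
)
```

This connection ID is what gets passed as connection_id to the docs operator in the next step.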

Step 3: Instantiate the docs operator

Create the appropriate docs operator (DbtDocsS3Operator, DbtDocsGCSOperator, or DbtDocsAzureStorageOperator) within a DAG. Configure the project_dir (path to the dbt project), profile_config, connection_id (cloud storage connection), bucket_name (target bucket), and install_deps flag.

Key considerations:

  • Each cloud provider has a dedicated operator class
  • project_dir must point to the dbt project accessible on the worker
  • bucket_name is the target cloud storage bucket for documentation artifacts
  • install_deps=True ensures dbt packages are installed before generating docs

Step 4: Generate documentation artifacts

At runtime, the operator executes dbt docs generate which produces three key artifacts: manifest.json (the dbt project graph metadata), catalog.json (database schema information including column details), and index.html (the static documentation site). These files are written to the dbt project's target/ directory.

Key considerations:

  • Documentation generation queries the database for catalog information
  • The process may be slow for large databases with many tables
  • manifest.json contains the full dbt project graph
  • catalog.json contains database schema introspection results
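How the artifacts relate can be sketched with toy versions of manifest.json and catalog.json; the keys shown are a tiny illustrative subset of the real schemas:

```python
# Toy stand-ins for the artifacts dbt docs generate writes to target/.
# Real files contain many more keys; "model.jaffle_shop.orders" is an
# illustrative node ID.
manifest = {  # project graph: nodes and their dependencies
    "nodes": {"model.jaffle_shop.orders": {"depends_on": {"nodes": []}}}
}
catalog = {  # database introspection: columns and types per node
    "nodes": {
        "model.jaffle_shop.orders": {
            "columns": {"order_id": {"type": "integer"}}
        }
    }
}

# index.html joins the two files at render time by node ID, which is why
# both JSON artifacts must come from the same docs generation run.
shared = sorted(set(manifest["nodes"]) & set(catalog["nodes"]))
print(shared)  # → ['model.jaffle_shop.orders']
```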

Step 5: Upload to cloud storage

After generation, the operator automatically uploads the documentation artifacts to the configured cloud storage bucket, at a known path structure that the Airflow plugin expects. Because the upload happens within the same operator execution, generation and publishing succeed or fail together as a single task.

Key considerations:

  • All three artifacts (manifest.json, catalog.json, index.html) are uploaded together
  • The upload path must match what the Airflow plugin is configured to read
  • Existing files in the bucket are overwritten with the latest versions
  • Network connectivity to the cloud provider is required at runtime
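The resulting bucket layout can be sketched as follows; the dbt_docs/ prefix is an assumption (the operator may also be configured to write to the bucket root):

```python
# The three artifacts are uploaded together under one prefix so that
# index.html can find its sibling JSON files, and so the Airflow plugin
# can resolve all three from a single configured location.
# "dbt_docs" is an illustrative prefix, not a fixed default.
artifacts = ["manifest.json", "catalog.json", "index.html"]
keys = [f"dbt_docs/{name}" for name in artifacts]
print(keys)  # → ['dbt_docs/manifest.json', 'dbt_docs/catalog.json', 'dbt_docs/index.html']
```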

Step 6: Configure Airflow plugin for hosting

The Cosmos Airflow plugin automatically adds a dbt Docs section to the Airflow UI. Configure the plugin through Airflow config options in the [cosmos] section: dbt_docs_dir (a local path or a cloud storage URL such as s3://...), dbt_docs_conn_id (the connection used to read the artifacts), and optionally dbt_docs_index_file_path (the name of the index file, defaulting to index.html). The plugin serves the uploaded documentation through an iframe in the Airflow web interface.

Key considerations:

  • The plugin is version-aware and supports both Airflow 2.x and 3.x
  • For Airflow 2: uses Flask blueprints and AppBuilder views
  • For Airflow 3: uses FastAPI routes
  • The dbt_docs_conn_id must match the connection used for uploading
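Assuming the artifacts were uploaded to an S3 bucket, the plugin configuration might look like the following airflow.cfg fragment; the bucket path and connection ID are assumptions that must match the upload step:

```ini
[cosmos]
# Cloud storage location holding the uploaded artifacts (assumed bucket/prefix)
dbt_docs_dir = s3://my-dbt-docs/dbt_docs
# Connection used to read the artifacts; must match the upload connection
dbt_docs_conn_id = aws_docs
# Name of the index file within dbt_docs_dir (default: index.html)
dbt_docs_index_file_path = index.html
```

The same options can be supplied as environment variables in the AIRFLOW__COSMOS__* form instead of editing airflow.cfg.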

Execution Diagram

GitHub URL

Workflow Repository