Workflow: Astronomer Cosmos dbt docs generation
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, dbt, Airflow, Documentation |
| Last Updated | 2026-02-07 17:00 GMT |
Overview
End-to-end process for generating dbt documentation using Cosmos operators and uploading the artifacts to cloud storage (S3, GCS, Azure Blob) for hosting through the Airflow UI dbt docs plugin.
Description
This workflow covers the generation and hosting of dbt documentation using Cosmos's specialized documentation operators. Cosmos provides DbtDocsS3Operator, DbtDocsGCSOperator, and DbtDocsAzureStorageOperator that combine dbt docs generation with cloud storage upload in a single operator. These operators run dbt docs generate to produce the documentation artifacts (manifest.json, catalog.json, index.html), then upload them to the configured cloud storage bucket.
The uploaded documentation can be served through Cosmos's Airflow plugin, which adds a Cosmos dbt Docs menu item to the Airflow UI. The plugin renders the dbt documentation in an iframe, supporting both Airflow 2 (Flask-based) and Airflow 3 (FastAPI-based) interfaces. Storage type is auto-detected from the connection, and the plugin retrieves artifacts from the configured cloud bucket.
Usage
Execute this workflow when you need to generate and publish dbt project documentation for your team. This is typically run on a schedule (e.g., daily or after each dbt run) to keep documentation in sync with the current state of the dbt project. The resulting docs are accessible to all Airflow users through the integrated Airflow UI plugin, providing a centralized location for data model documentation, lineage visualization, and test coverage reporting.
Execution Steps
Step 1: Configure profile for documentation
Create a ProfileConfig with a profile mapping that connects to the target database. The dbt docs generate command needs database access to compile the documentation catalog, which includes column-level metadata, row counts, and table descriptions extracted from the database's information schema.
Key considerations:
- The profile must have read access to the database information schema
- Use the same profile configuration as your dbt run tasks for consistency
- The connection must be valid at runtime when the docs generation runs
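A minimal sketch of such a ProfileConfig, assuming astronomer-cosmos is installed and a Postgres warehouse; the profile name, connection ID, and schema below are illustrative placeholders, and other warehouses use their own profile mapping classes:

```python
# Sketch only: ProfileConfig for docs generation. The connection ID and
# schema are assumptions; substitute the mapping for your warehouse.
from cosmos import ProfileConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="jaffle_shop",   # should match the profile in dbt_project.yml
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="postgres_default",  # needs read access to the information schema
        profile_args={"schema": "public"},
    ),
)
```

Reusing this same object for both your dbt run tasks and the docs operator keeps the two in sync.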
Step 2: Set up cloud storage connection
Configure an Airflow connection for the target cloud storage provider (AWS S3, Google Cloud Storage, or Azure Blob Storage). The connection provides authentication credentials for uploading documentation artifacts. The connection ID is passed to the docs operator.
Key considerations:
- For S3: use an aws_default or custom AWS connection with S3 write permissions
- For GCS: use a google_cloud_default or custom GCP connection
- For Azure: use a wasb_default or custom Azure connection
- The target bucket must exist and the credentials must have write access
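One way to define the storage connection is through an environment variable using Airflow's connection URI format; the connection ID and credential values here are placeholders:

```shell
# Illustrative only: an AWS connection named aws_docs, with placeholder
# credentials URL-encoded in the URI.
export AIRFLOW_CONN_AWS_DOCS='aws://AKIAEXAMPLE:secret%2Fkey@'
```

The same connection can equally be created through the Airflow UI or the airflow connections add CLI command.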
Step 3: Instantiate the docs operator
Create the appropriate docs operator (DbtDocsS3Operator, DbtDocsGCSOperator, or DbtDocsAzureStorageOperator) within a DAG. Configure the project_dir (path to the dbt project), profile_config, connection_id (cloud storage connection), bucket_name (target bucket), and install_deps flag.
Key considerations:
- Each cloud provider has a dedicated operator class
- project_dir must point to the dbt project accessible on the worker
- bucket_name is the target cloud storage bucket for documentation artifacts
- install_deps=True ensures dbt packages are installed before generating docs
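Putting the pieces together, a sketch of a docs-publishing DAG using the S3 variant; the paths, DAG ID, connection ID, and bucket name are assumptions, and older Cosmos releases may name the connection parameter differently:

```python
# Sketch of a docs-publishing DAG, assuming astronomer-cosmos is installed.
# Paths, connection IDs, and the bucket name are placeholders.
from datetime import datetime

from airflow import DAG
from cosmos import ProfileConfig
from cosmos.operators import DbtDocsS3Operator

profile_config = ProfileConfig(
    profile_name="jaffle_shop",
    target_name="dev",
    profiles_yml_filepath="/usr/local/airflow/dbt/profiles.yml",
)

with DAG(
    dag_id="dbt_docs_publish",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    generate_and_upload = DbtDocsS3Operator(
        task_id="generate_dbt_docs",
        project_dir="/usr/local/airflow/dbt/jaffle_shop",  # dbt project on the worker
        profile_config=profile_config,
        connection_id="aws_docs",        # cloud storage connection from Step 2
        bucket_name="my-dbt-docs-bucket",
        install_deps=True,               # install dbt packages before generating
    )
```

Swapping DbtDocsS3Operator for DbtDocsGCSOperator or DbtDocsAzureStorageOperator changes only the operator class and the connection type.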
Step 4: Generate documentation artifacts
At runtime, the operator executes dbt docs generate which produces three key artifacts: manifest.json (the dbt project graph metadata), catalog.json (database schema information including column details), and index.html (the static documentation site). These files are written to the dbt project's target/ directory.
Key considerations:
- Documentation generation queries the database for catalog information
- The process may be slow for large databases with many tables
- manifest.json contains the full dbt project graph
- catalog.json contains database schema introspection results
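As a rough illustration of what lands in target/, this snippet counts model nodes in a manifest-shaped dictionary; it is a trimmed-down stand-in for a real manifest.json, whose actual schema is far larger:

```python
# Trimmed-down stand-in for target/manifest.json; a real manifest carries
# many more keys (sources, macros, parent/child maps, metadata, etc.).
manifest = {
    "nodes": {
        "model.jaffle_shop.customers": {"resource_type": "model"},
        "model.jaffle_shop.orders": {"resource_type": "model"},
        "test.jaffle_shop.not_null_orders_id": {"resource_type": "test"},
    }
}

# Models and tests share the nodes map; filter on resource_type.
models = [
    node_id
    for node_id, node in manifest["nodes"].items()
    if node["resource_type"] == "model"
]
print(len(models))
```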
Step 5: Upload to cloud storage
After generation, the operator automatically uploads the documentation artifacts to the configured cloud storage bucket. The files are placed at a known path structure that the Airflow plugin expects. Because generation and upload run inside the same operator execution, the two steps succeed or fail together as a single task.
Key considerations:
- All three artifacts (manifest.json, catalog.json, index.html) are uploaded together
- The upload path must match what the Airflow plugin is configured to read
- Existing files in the bucket are overwritten with the latest versions
- Network connectivity to the cloud provider is required at runtime
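The three artifacts map onto bucket keys along these lines; the dbt-docs/ prefix below is an arbitrary example for illustration, not a path Cosmos mandates:

```python
# Illustrative key layout only; the prefix is an example, not a Cosmos
# requirement. All three artifacts travel together under one prefix.
artifacts = ["index.html", "manifest.json", "catalog.json"]
prefix = "dbt-docs"
keys = [f"{prefix}/{name}" for name in artifacts]
print(keys)
```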
Step 6: Configure Airflow plugin for hosting
The Cosmos Airflow plugin automatically adds a dbt Docs section to the Airflow UI. Configure the plugin by setting Airflow config values for dbt_docs_dir (local path) or dbt_docs_conn_id and dbt_docs_index_file_path (cloud storage path). The plugin serves the uploaded documentation through an iframe in the Airflow web interface.
Key considerations:
- The plugin is version-aware and supports both Airflow 2.x and 3.x
- For Airflow 2: uses Flask blueprints and AppBuilder views
- For Airflow 3: uses FastAPI routes
- The dbt_docs_conn_id must match the connection used for uploading
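A plugin configuration sketch for airflow.cfg; the connection ID and index path are examples and must point at wherever Step 5 actually uploaded the files:

```ini
# Example only: point the Cosmos plugin at the uploaded index.html.
[cosmos]
dbt_docs_conn_id = aws_docs
dbt_docs_index_file_path = s3://my-dbt-docs-bucket/dbt-docs/index.html
```

The same values can instead be supplied as environment variables of the form AIRFLOW__COSMOS__DBT_DOCS_CONN_ID, which is often more convenient on managed deployments.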