Principle:Astronomer Astronomer cosmos Graph Entity Model
| Knowledge Sources | |
|---|---|
| Domains | Data_Models, Graph |
| Last Updated | 2026-02-07 17:00 GMT |
Overview
A directed acyclic graph of typed entities that represents the logical structure of a data transformation project independently of any orchestration framework.
Description
The Graph Entity Model provides a uniform abstraction for describing the artefacts found inside a dbt project -- models, tests, seeds, snapshots, and sources -- as a graph of interconnected entities. Every entity derives from a common base class, CosmosEntity, which carries a unique identifier, a human-readable name, and a set of upstream dependencies expressed as edges in the graph.
Two concrete specialisations refine this base:
- Group -- a container entity that aggregates child entities. Groups map naturally to dbt directories, tags, or selector categories and translate into Airflow TaskGroups during rendering.
- Task -- a leaf entity that represents a single unit of work. Each Task stores an operator_class reference and an arguments dictionary, capturing everything needed to instantiate the corresponding Airflow operator at render time.
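The hierarchy described above can be sketched with plain dataclasses. This is a minimal illustration rather than the library's actual source: the field names `id`, `name`, `upstream_entity_ids`, and `entities`, the helper methods, and the default operator path are all assumptions chosen to mirror the prose.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class CosmosEntity:
    """Base entity: identifier, display name, and upstream edges (assumed field names)."""

    id: str
    name: str = ""
    upstream_entity_ids: List[str] = field(default_factory=list)

    def add_upstream(self, entity: "CosmosEntity") -> None:
        # Record the dependency as an edge, skipping duplicates.
        if entity.id not in self.upstream_entity_ids:
            self.upstream_entity_ids.append(entity.id)


@dataclass
class Group(CosmosEntity):
    """Container entity that aggregates child entities."""

    entities: List[CosmosEntity] = field(default_factory=list)

    def add_entity(self, entity: CosmosEntity) -> None:
        self.entities.append(entity)


@dataclass
class Task(CosmosEntity):
    """Leaf entity describing a single unit of work."""

    # Placeholder dotted path; nothing is imported until render time.
    operator_class: str = "airflow.operators.empty.EmptyOperator"
    arguments: Dict[str, Any] = field(default_factory=dict)


# Two tasks joined by one edge, collected into a group.
seed = Task(id="seed.raw_orders", name="raw_orders")
model = Task(id="model.orders", name="orders", arguments={"select": "orders"})
model.add_upstream(seed)

staging = Group(id="staging", name="staging")
staging.add_entity(seed)
staging.add_entity(model)
```

Storing edges as identifiers rather than object references keeps each entity cheap to copy and compare, at the cost of a lookup when the graph is walked.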
The graph is constructed during the parsing phase, where a dbt manifest or project directory is walked and each discovered artefact is converted into the appropriate entity type. Edges between entities mirror the ref() and source() relationships declared inside dbt SQL files, preserving the original dependency semantics.
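As a rough sketch of the parsing phase, the snippet below walks a heavily simplified manifest-like dictionary and turns each node into an entity with upstream edges. The `Entity` class and `build_entities` function are hypothetical names, and the input keeps only the `nodes`, `name`, `resource_type`, and `depends_on.nodes` keys of a dbt manifest; a real manifest carries far more.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """Hypothetical stand-in for a parsed graph entity."""

    id: str
    name: str
    resource_type: str
    upstream_entity_ids: List[str] = field(default_factory=list)


def build_entities(manifest: dict) -> Dict[str, Entity]:
    """Convert manifest nodes into entities, then wire up their edges."""
    nodes = manifest["nodes"]

    # First pass: one entity per discovered artefact.
    entities = {
        unique_id: Entity(unique_id, node["name"], node["resource_type"])
        for unique_id, node in nodes.items()
    }

    # Second pass: mirror the declared dependencies as edges,
    # ignoring references to nodes that are not in the graph.
    for unique_id, node in nodes.items():
        for upstream_id in node.get("depends_on", {}).get("nodes", []):
            if upstream_id in entities:
                entities[unique_id].upstream_entity_ids.append(upstream_id)
    return entities


manifest = {
    "nodes": {
        "seed.shop.raw_orders": {
            "name": "raw_orders",
            "resource_type": "seed",
            "depends_on": {"nodes": []},
        },
        "model.shop.stg_orders": {
            "name": "stg_orders",
            "resource_type": "model",
            "depends_on": {"nodes": ["seed.shop.raw_orders"]},
        },
        "model.shop.orders": {
            "name": "orders",
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
    }
}

graph = build_entities(manifest)
```

Splitting the work into two passes means an edge can point at a node that appears later in the manifest without special handling.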
The bridge between the two layers is get_airflow_task, which takes a single Task entity and produces a fully configured Airflow operator instance. It reads the operator_class stored on the entity, resolves it to the actual Python class, and passes the arguments dictionary through to the constructor. Because this translation is deferred until rendering, the entity graph itself remains runtime-agnostic and can be inspected, filtered, or rewritten before any Airflow objects are created.
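The deferred translation can be illustrated with a small sketch that resolves a dotted path to a class only when the task is rendered. It assumes operator_class is stored as a dotted import path, which the text does not state outright. The names `render_task` and `resolve_class` are hypothetical and are not the signature of get_airflow_task. To keep the example runnable without Airflow installed, `types.SimpleNamespace` stands in for an operator class.

```python
import importlib
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Task:
    """Hypothetical leaf entity holding a dotted class path and constructor arguments."""

    id: str
    operator_class: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def resolve_class(dotted_path: str) -> type:
    """Import the module part of the path and return the named class."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def render_task(task: Task) -> Any:
    """Instantiate the class named on the entity, passing its arguments through."""
    operator_cls = resolve_class(task.operator_class)
    return operator_cls(task_id=task.id, **task.arguments)


# SimpleNamespace is only a stand-in that accepts arbitrary keyword arguments.
task = Task(
    id="orders_run",
    operator_class="types.SimpleNamespace",
    arguments={"select": "orders"},
)
operator = render_task(task)
```

Until `render_task` is called, the entity holds nothing but strings and a dictionary, which is what lets the graph be built and edited in an environment where the operator's module is not importable.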
Usage
Apply this principle whenever a dbt project must be mapped into an Airflow DAG or TaskGroup. Consumers first build the entity graph from a manifest or project directory, optionally prune or regroup nodes using selector logic, and then walk the graph to emit Airflow tasks via get_airflow_task. The same entity graph can also be serialised for caching or compared across runs to detect structural drift.
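The pruning and drift-detection steps might look like the sketch below. Everything in it is illustrative: the `Entity` class, the tag-based `prune` helper, and the `fingerprint` function are assumptions, and real selector logic is considerably richer than a single tag match.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """Hypothetical entity with tags and upstream edges."""

    id: str
    tags: List[str] = field(default_factory=list)
    upstream_entity_ids: List[str] = field(default_factory=list)


def prune(graph: Dict[str, Entity], tag: str) -> Dict[str, Entity]:
    """Keep entities carrying the tag and drop edges to removed entities."""
    kept_ids = {eid for eid, entity in graph.items() if tag in entity.tags}
    return {
        eid: Entity(
            id=eid,
            tags=list(graph[eid].tags),
            upstream_entity_ids=[
                up for up in graph[eid].upstream_entity_ids if up in kept_ids
            ],
        )
        for eid in graph
        if eid in kept_ids
    }


def fingerprint(graph: Dict[str, Entity]) -> str:
    """Serialise the graph deterministically and hash the result."""
    payload = json.dumps(
        {eid: asdict(entity) for eid, entity in graph.items()}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


graph = {
    "a": Entity("a", tags=["daily"]),
    "b": Entity("b", tags=["daily"], upstream_entity_ids=["a"]),
    "c": Entity("c", tags=["weekly"], upstream_entity_ids=["b"]),
    "d": Entity("d", tags=["daily"], upstream_entity_ids=["c", "a"]),
}

daily = prune(graph, "daily")
drifted = fingerprint(daily) != fingerprint(graph)
```

Comparing fingerprints from two runs gives a cheap structural-drift check: the hash changes whenever an entity, tag, or edge is added or removed. Note that this `prune` simply discards edges that pass through a removed entity; whether to reconnect the surviving neighbours is a design decision the sketch leaves open.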
Theoretical Basis
The model draws on the Composite pattern from object-oriented design, where Group and Task share a common interface yet differ in their role as containers versus leaves. Because dependencies are expressed as graph edges rather than nested containment, any topological ordering of the DAG visits each entity after all of its upstream dependencies, so the rendered tasks preserve the intended execution sequence.
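The ordering argument can be demonstrated with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which accepts a mapping from each node to its predecessors. The entity identifiers below are made up for illustration, and nothing here implies the library itself uses `graphlib`.

```python
from graphlib import TopologicalSorter

# Each key is an entity id; each value lists its upstream entity ids.
upstream = {
    "seed.raw_orders": [],
    "model.stg_orders": ["seed.raw_orders"],
    "model.orders": ["model.stg_orders"],
    "test.orders_not_null": ["model.orders"],
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(upstream).static_order())

# Position lookup, handy for checking that an edge is respected.
position = {entity_id: index for index, entity_id in enumerate(order)}
```

If the edges ever formed a cycle, `static_order()` would raise `graphlib.CycleError`, which is a convenient way to validate that a parsed graph really is acyclic before rendering.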
The separation between the entity graph and its Airflow materialisation follows the Builder pattern: the graph captures what needs to run and in which order, while get_airflow_task decides how each unit of work is realised. This decoupling means that a change in the target orchestration layer -- for example, swapping one operator implementation for another -- requires no change to the structure of the graph itself.