Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:TobikoData Sqlmesh Audit Execution

From Leeroopedia
Revision as of 17:11, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/TobikoData_Sqlmesh_Audit_Execution.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Data_Engineering, Data_Quality, Model_Development
Last Updated 2026-02-07 00:00 GMT

Overview

A systematic approach to executing data quality checks against model outputs to detect anomalies, constraint violations, and business rule failures.

Description

Data quality cannot be assumed—it must be verified. Even well-designed transformation logic can produce incorrect results due to upstream data issues, edge cases not covered in testing, or subtle bugs in complex SQL. Production data pipelines need automated quality checks that run continuously, catching problems before they propagate downstream or affect business decisions.

Audits provide this verification layer through executable data quality rules. Unlike unit tests that validate logic with fixed inputs, audits run against real production data within specified time ranges. They can check for null values in critical columns, verify row counts fall within expected ranges, ensure referential integrity across tables, validate business metrics against thresholds, and detect anomalies using statistical methods. Audits are defined alongside models, making quality requirements explicit and version-controlled.

The audit execution framework handles orchestration complexity. It determines which audits apply to which models, resolves dependencies to ensure audited data is available, executes audit queries against the database, collects and aggregates results, and reports failures with sufficient context for debugging. Audits can be run on-demand during development, automatically during deployment, or on a schedule for continuous monitoring. This flexibility enables both preventive quality checks (before deployment) and detective monitoring (after deployment).

Usage

Use audit execution during development to validate that changes maintain data quality, in deployment pipelines to prevent bad data from reaching production, and on production schedules to continuously monitor data quality. Define audits for critical business metrics, data integrity constraints, and known failure modes from past incidents.

Theoretical Basis

The audit execution algorithm orchestrates quality checks across models:

FUNCTION execute_audits(start, end, models, execution_time):
    # Resolve which models to audit
    IF models IS NULL THEN
        target_snapshots = all_snapshots_in_environment()
    ELSE
        target_snapshots = [get_snapshot(m) FOR m IN models]
    END IF

    # Collect all audits to execute
    audit_plan = []
    FOR EACH snapshot IN target_snapshots:
        FOR EACH audit IN snapshot.audits:
            audit_plan.append({
                snapshot: snapshot,
                audit: audit,
                start: start,
                end: end
            })
        END FOR
    END FOR

    # Execute each audit
    results = []
    FOR EACH plan_item IN audit_plan:
        # Resolve audit query with model context
        audit_query = render_audit_query(
            plan_item.audit,
            plan_item.snapshot,
            plan_item.start,
            plan_item.end,
            execution_time
        )

        # Check if audit should be skipped
        IF should_skip_audit(plan_item, snapshot_intervals) THEN
            results.append({
                audit: plan_item.audit,
                model: plan_item.snapshot.name,
                status: SKIPPED
            })
            CONTINUE
        END IF

        # Execute audit query
        TRY:
            audit_result = execute_query(audit_query)

            # Evaluate audit predicate
            IF audit_predicate_passes(audit_result, plan_item.audit) THEN
                results.append({
                    audit: plan_item.audit,
                    model: plan_item.snapshot.name,
                    status: PASSED
                })
            ELSE
                results.append({
                    audit: plan_item.audit,
                    model: plan_item.snapshot.name,
                    status: FAILED,
                    details: audit_result
                })
            END IF
        CATCH error:
            results.append({
                audit: plan_item.audit,
                model: plan_item.snapshot.name,
                status: ERROR,
                error: error
            })
        END TRY
    END FOR

    # Report results
    failures = [r FOR r IN results IF r.status = FAILED OR r.status = ERROR]
    log_audit_summary(results)

    RETURN failures.is_empty()
END FUNCTION

The key insight is treating audits as first-class queries that execute in the same context as models, with access to time ranges, execution time, and snapshot references, but with pass/fail evaluation semantics rather than data output.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment