Principle:Treeverse LakeFS Hook Triggered Commit

From Leeroopedia


Knowledge Sources
Domains Data_Quality, Automation
Last Updated 2026-02-08 00:00 GMT

Overview

Commit operations in lakeFS can trigger pre-commit validation hooks that act as automated quality gates, rejecting commits that fail to meet defined criteria.

Description

When action hooks are configured on a branch, the commit operation becomes a gated process. Before the commit is finalized, lakeFS evaluates all pre-commit hooks defined in the _lakefs_actions/ path on that branch. Each hook executes and returns a pass/fail result:

  • Webhook hooks -- lakeFS sends an HTTP POST to the configured URL. A 2xx response indicates success; any other outcome (a 4xx or 5xx status, or a timeout) indicates failure.
  • Lua hooks -- The Lua script executes server-side. Returning without error indicates success; raising an error indicates failure.
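The webhook contract described above can be sketched with the Python standard library. This is a minimal illustration, not the lakeFS reference implementation: the payload field names (event_type, commit_message), the policy in validate_event, the port, and the choice of 400 as the failure status are all assumptions made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def validate_event(event: dict) -> tuple[bool, str]:
    """Hypothetical policy: pre-commit events need a non-empty commit message.

    The keys "event_type" and "commit_message" are assumed payload fields.
    """
    if event.get("event_type") != "pre-commit":
        return True, "not a pre-commit event; nothing to check"
    if not (event.get("commit_message") or "").strip():
        return False, "commit message must not be empty"
    return True, "ok"


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = json.loads(self.rfile.read(length) or b"{}")
            ok, reason = validate_event(event)
        except (ValueError, AttributeError):
            # Malformed JSON, or JSON that is not an object.
            ok, reason = False, "request body is not a JSON object"
        body = reason.encode()
        # A 2xx status lets the commit proceed; anything else fails the hook.
        self.send_response(200 if ok else 400)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the hook endpoint until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), HookHandler).serve_forever()
```

Calling serve() starts the endpoint. Keeping the policy in validate_event, separate from the HTTP handling, lets the rule be unit-tested without opening a socket.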

If any pre-commit hook fails, the entire commit is rejected with an HTTP 412 Precondition Failed response. The staged changes remain intact but are not committed. This ensures that data quality violations are caught before they enter the commit history.
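From the client side, a rejected commit can be told apart from other errors by its status code. The sketch below assumes the lakeFS REST endpoint POST /api/v1/repositories/{repository}/branches/{branch}/commits and leaves authentication to the caller through a headers argument; CommitRejected, the function signature, and the injectable opener parameter are hypothetical names introduced for illustration.

```python
import json
import urllib.error
import urllib.request


class CommitRejected(Exception):
    """Raised when the server answers 412, i.e. a pre-commit hook failed."""


def commit(base_url, repository, branch, message, metadata=None,
           headers=None, opener=urllib.request.urlopen):
    """Attempt a commit; raise CommitRejected if a pre-commit hook blocks it."""
    url = f"{base_url}/api/v1/repositories/{repository}/branches/{branch}/commits"
    payload = json.dumps({"message": message, "metadata": metadata or {}}).encode()
    request = urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json", **(headers or {})},
    )
    try:
        with opener(request) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as err:
        if err.code == 412:
            # Staged changes are still on the branch; fix the data and retry.
            raise CommitRejected(err.read().decode(errors="replace")) from err
        raise  # any other HTTP error is not a hook rejection
```

Because the staged changes survive a rejection, a caller can catch CommitRejected, correct the offending objects, and call commit again without re-uploading everything.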

After a successful commit (all pre-commit hooks pass), post-commit hooks fire asynchronously. Post-commit hooks are used for notifications, logging, and triggering downstream pipelines. Their success or failure does not affect the commit itself.
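The ordering of the two hook phases can be modelled in a few lines. This is a toy in-memory model, not lakeFS code: gated_commit, PreconditionFailed, and the list-based history are hypothetical, and post-commit hooks run inline here for simplicity, whereas the text above describes them as asynchronous.

```python
from typing import Callable, List

# A hook signals failure by raising an exception (an assumption of this model).
Hook = Callable[[dict], None]


class PreconditionFailed(Exception):
    """Stands in for the 412 rejection of a gated commit."""


def gated_commit(staged: dict, history: List[dict],
                 pre_hooks: List[Hook], post_hooks: List[Hook]) -> dict:
    # Phase 1: every pre-commit hook must pass before anything is recorded.
    for hook in pre_hooks:
        try:
            hook(staged)
        except Exception as err:
            raise PreconditionFailed(str(err)) from err

    # Phase 2: record the commit.
    commit = dict(staged)
    history.append(commit)

    # Phase 3: post-commit hooks run, but their failures never undo the commit.
    for hook in post_hooks:
        try:
            hook(commit)
        except Exception:
            pass
    return commit
```

In this model a failing pre-commit hook leaves history untouched, while a failing post-commit hook is swallowed after the commit has already been appended.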

Usage

Use hook-triggered commits when you need to:

  • Enforce data schemas -- Reject commits containing data that does not conform to expected schemas
  • Validate data quality -- Check for null values, referential integrity, row count thresholds, or data freshness
  • Enforce metadata policies -- Require specific commit metadata (e.g., ticket numbers, team identifiers)
  • Audit all changes -- Log every commit attempt for compliance reporting
  • Trigger ETL pipelines -- Kick off downstream processing after successful data commits
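As an illustration of the data quality checks listed above, the following sketch validates CSV content for required columns, a minimum row count, and empty values. The function name check_rows, its parameters, and the message formats are hypothetical; a real hook would first need to read the staged object from lakeFS, which is out of scope here.

```python
import csv
import io


def check_rows(csv_text, required_columns, min_rows=1):
    """Return a list of violations; an empty list means the data passes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    violations = []

    # Schema check: every required column must appear in the header.
    header = reader.fieldnames or []
    missing = [col for col in required_columns if col not in header]
    if missing:
        violations.append(f"missing columns: {', '.join(missing)}")

    # Row count threshold.
    if len(rows) < min_rows:
        violations.append(f"expected at least {min_rows} rows, found {len(rows)}")

    # Null check: required columns that are present must not be empty.
    for row_no, row in enumerate(rows, start=1):
        for col in required_columns:
            if col in row and not (row[col] or "").strip():
                violations.append(f"row {row_no}: empty value in '{col}'")
    return violations
```

A pre-commit hook built on this would fail the commit whenever the returned list is non-empty and could include the violations in its response body for the committer to read.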

Theoretical Basis

The hook-triggered commit model implements a transaction validation pattern:

Pre-condition checking: Just as database triggers can enforce constraints before a transaction commits, pre-commit hooks enforce data quality constraints before a lakeFS commit is finalized. This resembles deferred constraint checking -- the user stages changes freely, and the system validates them only at commit time.

All-or-nothing semantics: If any hook fails, the entire commit is rejected. There is no partial commit state. This mirrors the atomicity property from ACID transactions -- either all validations pass and the commit succeeds, or nothing changes.

Separation of concerns: The commit operation itself is decoupled from the validation logic. Data engineers write and test data; quality engineers define and maintain hooks. The lakeFS commit operation serves as the integration point where both concerns meet.

Fail-fast principle: Pre-commit hooks execute before the commit is recorded. The staged objects already exist on the branch, but a failed hook keeps them out of the commit history, which is far cheaper than detecting and remediating bad data after the fact.

Related Pages

Implemented By
