Workflow:Treeverse LakeFS Write Audit Publish With Hooks
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Quality, Data_Governance |
| Last Updated | 2026-02-08 10:00 GMT |
Overview
End-to-end process for implementing data quality gates using lakeFS branches and action hooks to ensure only validated data reaches production.
Description
This workflow implements the Write-Audit-Publish pattern for data lakes. Data is written to an isolated branch, automated quality checks run via pre-commit or pre-merge hooks (using Lua scripts or webhooks), and only data that passes validation is published (merged) to the production branch. lakeFS action hooks can enforce schema validation, file format checks, PII detection, and custom business rules. Failed hooks block the operation, preventing bad data from reaching downstream consumers.
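The gating logic described above can be illustrated without a lakeFS server. The following is a toy in-memory model, not lakeFS code: `Branch`, `merge`, `HookFailed`, and the two example hooks are hypothetical names chosen for this sketch, and a "hook" here is just a Python function that returns an error message or `None`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A hook inspects objects (path -> bytes) and returns an error message,
# or None when validation passes. Hypothetical type, not a lakeFS API.
Hook = Callable[[Dict[str, bytes]], Optional[str]]


class HookFailed(Exception):
    """Raised when a hook rejects the data, which blocks the operation."""


@dataclass
class Branch:
    name: str
    committed: Dict[str, bytes] = field(default_factory=dict)
    staged: Dict[str, bytes] = field(default_factory=dict)

    def write(self, path: str, data: bytes) -> None:
        # Write: changes accumulate on the branch, uncommitted.
        self.staged[path] = data

    def commit(self, pre_commit: List[Hook]) -> None:
        # Audit: every hook must pass before staged data is committed.
        for hook in pre_commit:
            error = hook(self.staged)
            if error is not None:
                raise HookFailed(error)
        self.committed.update(self.staged)
        self.staged.clear()


def merge(source: Branch, target: Branch, pre_merge: List[Hook]) -> None:
    # Publish: pre-merge hooks check the source's committed data first.
    for hook in pre_merge:
        error = hook(source.committed)
        if error is not None:
            raise HookFailed(error)
    target.committed.update(source.committed)


def only_parquet(objects: Dict[str, bytes]) -> Optional[str]:
    bad = sorted(p for p in objects if not p.endswith(".parquet"))
    return f"non-Parquet files: {bad}" if bad else None


def no_empty_files(objects: Dict[str, bytes]) -> Optional[str]:
    empty = sorted(p for p, data in objects.items() if not data)
    return f"empty files: {empty}" if empty else None
```

In this model, writing `events/a.csv` to a staging branch and committing with `[only_parquet]` raises `HookFailed`, and the production branch is left untouched.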
Usage
Execute this workflow when you need to enforce data quality and governance policies before data reaches production consumers. Common triggers include: building a data pipeline that must pass quality gates, complying with data governance policies requiring validation before publication, preventing schema drift in production data, or implementing automated PII detection and removal before data is shared.
Execution Steps
Step 1: Define Action Hooks
Create action configuration files that define when hooks should fire and what validation they should perform. Actions are defined as YAML files stored in the repository under the _lakefs_actions/ prefix. Each action specifies an event trigger (pre-commit, pre-merge, etc.), the branches it applies to, and one or more hooks (Lua scripts or webhook endpoints).
Key considerations:
- Actions are defined as YAML configuration files
- Hooks can be Lua scripts (executed server-side) or webhooks (external HTTP endpoints)
- Supported events include pre-commit, post-commit, pre-merge, post-merge, pre-create-branch, and pre-create-tag, along with the matching post- events and the pre-/post- events for deleting branches and tags
- Branch patterns control which branches trigger the hooks
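To make the structure of an action file concrete, here is a minimal sketch that renders one as YAML text. The keys (`name`, `on`, `branches`, `hooks`, `id`, `type`, `properties`, `script_path`, `url`) follow the lakeFS actions file format as best understood and should be checked against the documentation for your lakeFS version. The action name, hook ids, script path, and URL are all hypothetical.

```python
import json
from typing import Dict, List, Tuple

# (hook id, hook type, properties). All values used below are hypothetical.
HookSpec = Tuple[str, str, Dict[str, str]]


def render_action(name: str, event: str, branches: List[str],
                  hooks: List[HookSpec]) -> str:
    """Render an action definition as YAML text.

    json.dumps is used for scalar values because a JSON string is also a
    valid double-quoted YAML scalar, which avoids quoting surprises.
    """
    lines = [f"name: {json.dumps(name)}", "on:", f"  {event}:", "    branches:"]
    lines += [f"      - {json.dumps(pattern)}" for pattern in branches]
    lines.append("hooks:")
    for hook_id, hook_type, properties in hooks:
        lines += [f"  - id: {hook_id}", f"    type: {hook_type}", "    properties:"]
        lines += [f"      {key}: {json.dumps(value)}"
                  for key, value in properties.items()]
    return "\n".join(lines) + "\n"


ACTION_YAML = render_action(
    name="pre-merge quality gate",
    event="pre-merge",
    branches=["main"],
    hooks=[
        ("check_schema", "lua", {"script_path": "scripts/check_schema.lua"}),
        ("pii_scan", "webhook", {"url": "https://hooks.example.com/pii"}),
    ],
)
```

The resulting text could then be stored at a path such as `_lakefs_actions/pre_merge_quality_gate.yaml` (the file name is an arbitrary choice).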
Step 2: Upload Action Scripts
Upload the action YAML configuration and any associated Lua scripts to the repository. Lua hooks execute server-side and can validate file formats, check schemas, enforce naming conventions, or run custom business logic. Webhook hooks call external HTTP endpoints for more complex validation.
Key considerations:
- Action files must be placed under _lakefs_actions/ in the repository
- Lua scripts have access to lakeFS APIs for reading objects and metadata
- Webhook hooks receive event context (repository, branch, commit info) as JSON payloads
- Environment variables can be passed to hooks for configuration
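A webhook hook can be as small as an HTTP endpoint that reads the JSON event and answers with a success or failure status. The sketch below uses only the Python standard library. It assumes the payload carries a `commit_metadata` object and that lakeFS treats a non-2xx response as a failed hook; both should be confirmed against the lakeFS webhook documentation. The rule itself (requiring an `owner` metadata key) and the port are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

REQUIRED_METADATA_KEY = "owner"  # hypothetical business rule


def evaluate_event(payload: dict) -> Tuple[int, str]:
    """Return (HTTP status, message) for one hook event payload."""
    metadata = payload.get("commit_metadata") or {}
    if REQUIRED_METADATA_KEY not in metadata:
        return 400, f"missing commit metadata key '{REQUIRED_METADATA_KEY}'"
    return 200, "ok"


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if isinstance(payload, dict):
            status, message = evaluate_event(payload)
        else:
            status, message = 400, "request body is not a JSON object"
        body = message.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    # Blocks until interrupted; call this from your own entry point.
    HTTPServer(("0.0.0.0", port), HookHandler).serve_forever()
```

Keeping the decision in `evaluate_event` separates the validation rule from the HTTP plumbing, so the rule can be unit-tested without starting a server.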
Step 3: Write Data to Branch
Upload new or modified data objects to an isolated branch. This branch serves as a staging area where data changes accumulate before validation. Any number of files can be uploaded before committing.
Key considerations:
- Use a dedicated branch (not the production branch) for staging data
- Changes stay uncommitted until explicitly committed, and isolated from production until merged
- Multiple data producers can work on separate branches simultaneously
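Writing to the staging branch can be sketched as a plain HTTP upload. The function below only builds the request and does not send it. The endpoint path (`/api/v1/repositories/{repository}/branches/{branch}/objects?path=...`) and Basic authentication with an access key pair reflect the lakeFS REST API as best understood; verify them against your server's API reference. The host, repository, branch, and object path in the usage note are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def basic_auth_header(access_key_id: str, secret_access_key: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    raw = f"{access_key_id}:{secret_access_key}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_upload_request(base_url: str, repository: str, branch: str,
                         path: str, data: bytes,
                         auth: str) -> urllib.request.Request:
    """Build (but do not send) a request that uploads one object."""
    query = urllib.parse.urlencode({"path": path})
    url = (
        f"{base_url.rstrip('/')}/api/v1/repositories/"
        f"{urllib.parse.quote(repository, safe='')}/branches/"
        f"{urllib.parse.quote(branch, safe='')}/objects?{query}"
    )
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Authorization": auth,
            "Content-Type": "application/octet-stream",
        },
    )
```

A caller would pass the result to `urllib.request.urlopen`, for example with `build_upload_request("http://localhost:8000", "sales", "staging", "events/a.parquet", payload, auth)`. The uploaded object stays uncommitted on the branch until the commit in the next step.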
Step 4: Commit With Pre-Commit Validation
Attempt to commit the staged changes. If pre-commit hooks are configured, they execute automatically before the commit is finalized. The hooks validate the staged data according to the defined rules. If any hook fails, the commit is blocked and the data remains uncommitted.
Key considerations:
- Pre-commit hooks run synchronously: the commit waits for hook completion
- A failed hook blocks the entire commit operation
- Hook execution results (pass/fail, logs) are recorded as action runs
- Multiple hooks can be chained, and all must pass for the commit to succeed
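Because the hooks run synchronously, the caller learns the outcome from the commit response itself. The sketch below builds a commit request and interprets the status code. It assumes the commit endpoint is `POST /api/v1/repositories/{repository}/branches/{branch}/commits`, that success is reported as 201, and that a rejecting pre-commit hook surfaces as 412 (Precondition Failed); treat these as assumptions to confirm against the lakeFS API reference.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict


def build_commit_request(base_url: str, repository: str, branch: str,
                         message: str, metadata: Dict[str, str],
                         auth: str) -> urllib.request.Request:
    """Build (but do not send) a commit request for a branch."""
    url = (
        f"{base_url.rstrip('/')}/api/v1/repositories/"
        f"{urllib.parse.quote(repository, safe='')}/branches/"
        f"{urllib.parse.quote(branch, safe='')}/commits"
    )
    body = json.dumps({"message": message, "metadata": metadata})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Authorization": auth, "Content-Type": "application/json"},
    )


def describe_commit_status(status: int) -> str:
    """Map an HTTP status code to a human-readable commit outcome."""
    if status == 201:
        return "committed"
    if status == 412:
        # Assumed mapping: a pre-commit hook rejected the staged data.
        return "blocked by a pre-commit hook"
    return f"unexpected status {status}"
```

When sending the request with `urllib.request.urlopen`, a 412 arrives as an `urllib.error.HTTPError`, so the status would be read from the exception's `code` attribute.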
Step 5: Review Hook Results
Inspect the action run results to understand which hooks passed or failed. The actions API provides detailed execution logs for each hook, including any error messages or validation failures. This information guides data corrections.
Key considerations:
- Action runs are queryable via the lakeFS API
- Each hook run includes status (completed, failed, skipped), duration, and logs
- Failed hooks include error details for debugging
- Post-commit hooks, if defined, run after a successful commit and cannot block it; use them for notifications or downstream triggers
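Once the hook runs for an action run have been fetched from the API, a small helper can turn them into a readable report. The sketch below assumes each hook run is a JSON object with `action`, `hook_id`, `status`, `start_time`, and `end_time` fields; the exact schema should be taken from the lakeFS API reference. Fractional seconds are dropped before parsing so that the code also works on Python versions whose `datetime.fromisoformat` is strict about them.

```python
import re
from datetime import datetime
from typing import Dict, List


def parse_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, ignoring fractional seconds."""
    value = re.sub(r"\.\d+", "", value)
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_hook_runs(hook_runs: List[Dict[str, str]]) -> List[str]:
    """Return one 'action/hook: status (duration)' line per hook run."""
    lines = []
    for run in hook_runs:
        line = f"{run['action']}/{run['hook_id']}: {run['status']}"
        if run.get("start_time") and run.get("end_time"):
            elapsed = parse_time(run["end_time"]) - parse_time(run["start_time"])
            line += f" ({elapsed.total_seconds():.1f}s)"
        lines.append(line)
    return lines


def failed_hooks(hook_runs: List[Dict[str, str]]) -> List[str]:
    """Return the ids of hooks whose status is 'failed'."""
    return [run["hook_id"] for run in hook_runs if run["status"] == "failed"]
```

The ids returned by `failed_hooks` indicate which hook logs to read first when correcting the data.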
Step 6: Merge to Production
After data passes all validation hooks, merge the staging branch into the production branch. Pre-merge hooks provide a second layer of validation at the merge boundary. Only data that passes both pre-commit and pre-merge checks reaches production.
Key considerations:
- Pre-merge hooks can enforce additional cross-branch validation
- The merge creates a commit on the production branch with full audit trail
- Post-merge hooks can trigger downstream notifications or pipeline runs
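- Branch protection rules can block direct writes and commits to production branches, so changes arrive only through merges; who may perform a merge is governed by access-control policies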
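The publish step can be sketched in the same request-building style. The code assumes the merge endpoint is `POST /api/v1/repositories/{repository}/refs/{sourceRef}/merge/{destinationBranch}`, that 200 means the merge succeeded, 409 signals a conflict, and 412 signals a rejecting pre-merge hook. These details are assumptions drawn from the lakeFS REST API as best understood and should be verified before use.

```python
import json
import urllib.parse
import urllib.request


def build_merge_request(base_url: str, repository: str, source_ref: str,
                        destination_branch: str, message: str,
                        auth: str) -> urllib.request.Request:
    """Build (but do not send) a request merging source into destination."""
    url = (
        f"{base_url.rstrip('/')}/api/v1/repositories/"
        f"{urllib.parse.quote(repository, safe='')}/refs/"
        f"{urllib.parse.quote(source_ref, safe='')}/merge/"
        f"{urllib.parse.quote(destination_branch, safe='')}"
    )
    body = json.dumps({"message": message})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Authorization": auth, "Content-Type": "application/json"},
    )


def describe_merge_status(status: int) -> str:
    """Map an HTTP status code to a human-readable merge outcome."""
    outcomes = {
        200: "published to the destination branch",
        409: "merge conflict",  # assumed mapping
        412: "blocked by a pre-merge hook",  # assumed mapping
    }
    return outcomes.get(status, f"unexpected status {status}")
```

A pipeline would typically call this only after the staging commit succeeded, and stop rather than retry when the outcome is a blocked merge, since the data needs correcting first.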