Principle:Treeverse LakeFS Merge
| Knowledge Sources | |
|---|---|
| Domains | Data_Version_Control, Data_Engineering |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Merge in data version control combines changes from an isolated branch into a destination branch, enabling controlled promotion of data changes to production.
Description
The merge operation in lakeFS integrates changes from a source reference (typically a feature or experiment branch) into a destination branch. This is analogous to git merge in software version control. Merging is the primary mechanism for promoting validated data changes from isolated workspaces into shared branches.
When a merge is performed, lakeFS computes the common ancestor (merge base) of the source and destination, identifies the differences introduced by each side, and combines them into a new commit on the destination branch. If both sides have modified the same object in incompatible ways, a conflict arises that must be resolved.
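The merge-base step can be illustrated with a small Python sketch. It models commit history as a hypothetical dictionary mapping each commit ID to its parent IDs and uses the git-style definition of a "best" common ancestor (a common ancestor that is not itself an ancestor of another common ancestor). This is an assumption for illustration, not lakeFS's actual implementation, which works against its own commit store.

```python
from collections import deque

# Hypothetical commit graph: commit ID -> list of parent commit IDs.
CommitGraph = dict[str, list[str]]


def ancestors(graph: CommitGraph, commit: str) -> set[str]:
    """Return `commit` plus every commit reachable through parent links (BFS)."""
    seen = {commit}
    queue = deque([commit])
    while queue:
        for parent in graph.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(graph: CommitGraph, source: str, dest: str) -> set[str]:
    """Best common ancestors: common ancestors not reachable from another common ancestor."""
    common = ancestors(graph, source) & ancestors(graph, dest)
    return {
        c
        for c in common
        if not any(c != other and c in ancestors(graph, other) for other in common)
    }
```

For a history where `s1` and `d1` both branch off `c1`, and `s2` follows `s1`, `merge_bases(graph, "s2", "d1")` returns `{"c1"}`: `c0` is also a common ancestor, but it is older than `c1` and is therefore excluded.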
Key aspects of the merge operation:
- Three-way merge: lakeFS uses the merge base (common ancestor) to determine what changed on each side, reducing false conflicts.
- Atomic operation: A merge either succeeds completely or fails entirely; there is no partial merge state.
- Conflict resolution strategies: lakeFS supports automatic conflict resolution via the source-wins (source changes take precedence) and dest-wins (destination changes take precedence) strategies.
- Squash merge: Optionally, all source branch commits can be squashed into a single merge commit on the destination, simplifying the commit history.
- Pre-merge hooks: Custom validation logic can be executed before the merge is finalized, enabling data quality gates at the promotion boundary.
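To see why a merge base reduces false conflicts, compare a naive two-way comparison with a three-way check. In this minimal sketch, snapshots are hypothetical dictionaries from object path to content checksum (a missing key means the object is absent); the function names are illustrative only.

```python
# Hypothetical snapshot model: object path -> content checksum.
Snapshot = dict[str, str]


def two_way_conflicts(source: Snapshot, dest: Snapshot) -> set[str]:
    """Without a merge base, every path whose value differs looks like a conflict."""
    return {p for p in source.keys() | dest.keys() if source.get(p) != dest.get(p)}


def three_way_conflicts(base: Snapshot, source: Snapshot, dest: Snapshot) -> set[str]:
    """With a merge base, only paths changed differently on both sides conflict."""
    conflicts = set()
    for p in base.keys() | source.keys() | dest.keys():
        b, s, d = base.get(p), source.get(p), dest.get(p)
        if s != b and d != b and s != d:
            conflicts.add(p)
    return conflicts
```

If the source modifies only `a.parquet` and the destination modifies only `b.parquet`, the two-way check reports both paths as conflicts, while the three-way check reports none, because each path changed on one side only.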
Usage
Merge operations are used in the following data workflow scenarios:
- Production promotion: After validating data on a feature branch, merge it into the main branch to make it available to production consumers.
- Pipeline integration: Merge outputs from multiple pipeline stages into a consolidated branch.
- A/B test resolution: After evaluating the results of parallel experiments, merge the winning branch into production.
- Hotfix application: Apply urgent data corrections from a hotfix branch into both production and development branches.
- Scheduled promotion: In batch-oriented workflows, periodically merge staging branches into production after automated quality checks pass.
Theoretical Basis
Three-way merge algorithm:
The merge process follows these steps:
- Compute merge base: Find the most recent common ancestor commit M of the source S and destination D.
- Compute source diff: Identify objects changed between M and S (additions, deletions, modifications).
- Compute destination diff: Identify objects changed between M and D.
- Combine changes:
  - Objects changed only in S: Apply the source change to the result.
  - Objects changed only in D: Retain the destination change in the result.
  - Objects changed identically in both S and D: Accept the common change.
  - Objects changed differently in both S and D: Conflict; requires resolution.
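The combine rules above can be expressed compactly. The following Python sketch assumes each of M, S and D is a hypothetical dictionary from object path to content checksum, where a missing key means the object is absent (so a deletion is simply a key that disappears). It mirrors the four rules listed above and is not lakeFS's actual merge code.

```python
# Hypothetical snapshot model: object path -> content checksum; missing key = absent.
Snapshot = dict[str, str]


def three_way_merge(base: Snapshot, source: Snapshot, dest: Snapshot) -> tuple[Snapshot, set[str]]:
    """Combine source and dest changes relative to base.

    Returns the merged snapshot (conflicting paths omitted) and the set of conflicting paths.
    """
    merged: Snapshot = {}
    conflicts: set[str] = set()
    for path in base.keys() | source.keys() | dest.keys():
        b, s, d = base.get(path), source.get(path), dest.get(path)
        if s == d:
            result = s  # unchanged on both sides, or changed identically
        elif s == b:
            result = d  # changed only in D: retain the destination version
        elif d == b:
            result = s  # changed only in S: apply the source version
        else:
            conflicts.add(path)  # changed differently in S and D
            continue
        if result is not None:  # None means the object was deleted or never existed
            merged[path] = result
    return merged, conflicts
```

Because additions, modifications and deletions are all just value changes relative to M, the same four comparisons cover every case; for example, a path deleted in S and untouched in D falls under "changed only in S" and is dropped from the result.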
Conflict resolution:
When conflicts arise, lakeFS provides two automatic resolution strategies:
- source-wins: The source branch's version of the conflicting object is used. This is appropriate when the source branch represents the authoritative or more recent data.
- dest-wins: The destination branch's version is retained. This is appropriate when the destination branch should not be overwritten by incoming changes.
If no strategy is specified and conflicts exist, the merge operation fails with a 409 Conflict status, requiring manual resolution.
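The two strategies can be sketched as a post-processing step over a three-way merge result. In this illustration, `merged` holds the non-conflicting paths and `conflicts` the paths changed differently on both sides; the snapshot model, function name and `MergeConflictError` class are hypothetical, and the `status` attribute only mirrors the 409 response described above.

```python
from typing import Optional

# Hypothetical snapshot model: object path -> content checksum; missing key = absent.
Snapshot = dict[str, str]


class MergeConflictError(Exception):
    """Hypothetical error mirroring the 409 Conflict response."""

    status = 409

    def __init__(self, paths: set[str]):
        super().__init__(f"conflicting paths: {sorted(paths)}")
        self.paths = paths


def resolve_conflicts(merged: Snapshot, conflicts: set[str], source: Snapshot,
                      dest: Snapshot, strategy: Optional[str] = None) -> Snapshot:
    """Fill in conflicting paths using 'source-wins' or 'dest-wins'; fail without a strategy."""
    if not conflicts:
        return dict(merged)
    if strategy is None:
        raise MergeConflictError(conflicts)
    if strategy not in ("source-wins", "dest-wins"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    winner = source if strategy == "source-wins" else dest
    result = dict(merged)
    for path in conflicts:
        if path in winner:
            result[path] = winner[path]
        else:
            result.pop(path, None)  # the winning side deleted the object
    return result
```

Note that the winning side's version is taken as a whole, including a deletion: under source-wins, an object deleted on the source and modified on the destination ends up deleted.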
Squash merge:
A squash merge collapses all commits on the source branch into a single merge commit on the destination. This produces a cleaner commit history on the destination branch, at the cost of losing the granular commit history from the source branch. Squash merge is useful when the individual commits on the source branch represent intermediate, work-in-progress states that are not meaningful in the destination's history.
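The effect on history can be modeled with a tiny commit graph. This sketch assumes, as the lakeFS API reference describes its squash option, that a squash merge commit records only the destination head as its parent, while a regular merge commit records both heads. The `Commit` class and helper functions are hypothetical, and the sketch models only the parent links, not the commit contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    id: str
    parents: tuple["Commit", ...] = ()


def merge_commit(commit_id: str, dest_head: Commit, source_head: Commit, squash: bool) -> Commit:
    """Regular merge: two parents. Squash merge: the destination head is the only parent."""
    parents = (dest_head,) if squash else (dest_head, source_head)
    return Commit(commit_id, parents)


def history(head: Commit) -> list[str]:
    """IDs of every commit reachable from head (depth-first, each listed once)."""
    seen: list[str] = []
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit.id not in seen:
            seen.append(commit.id)
            stack.extend(commit.parents)
    return seen
```

With a source branch of two work-in-progress commits (`s1`, `s2`), the regular merge's history includes both, whereas the squash merge's history contains only the destination commits and the merge commit itself.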
Pre-merge hooks:
lakeFS supports pre-merge hooks that execute before the merge commit is finalized. These hooks can:
- Validate data quality constraints
- Check schema compatibility
- Enforce naming conventions or partition structure
- Run automated tests against the merged data state
If a pre-merge hook fails, the merge is rejected with a 412 (Precondition Failed) status.
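The gate-then-commit pattern, together with the atomicity guarantee listed earlier, can be sketched in plain Python. In lakeFS itself, hooks are declared in action files and typically run as external services such as webhooks; here they are replaced by ordinary functions, and the hook signature, `finalize_merge` and `PreconditionFailedError` are hypothetical names chosen for illustration.

```python
from typing import Callable

# Hypothetical models: object path -> checksum, and a hook returning (ok, message).
Snapshot = dict[str, str]
Hook = Callable[[Snapshot], tuple[bool, str]]


class PreconditionFailedError(Exception):
    """Hypothetical error mirroring the 412 Precondition Failed response."""

    status = 412


def finalize_merge(branches: dict[str, Snapshot], dest: str, candidate: Snapshot,
                   hooks: list[Hook]) -> None:
    """Run every hook against the candidate merged state; update dest only if all pass."""
    for hook in hooks:
        ok, message = hook(candidate)
        if not ok:
            # The destination is left untouched, so there is no partial merge state.
            raise PreconditionFailedError(f"{hook.__name__}: {message}")
    branches[dest] = dict(candidate)  # a single assignment stands in for the atomic commit


def require_parquet(snapshot: Snapshot) -> tuple[bool, str]:
    """Hypothetical naming-convention hook: every object must be a .parquet file."""
    bad = sorted(p for p in snapshot if not p.endswith(".parquet"))
    return (not bad, f"non-parquet objects: {bad}" if bad else "ok")
```

Running `finalize_merge` with a candidate that contains `notes.txt` raises the error and leaves the destination unchanged; a candidate with only Parquet files replaces the destination's state in one step.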