Principle:Treeverse LakeFS Merge

From Leeroopedia


Knowledge Sources
Domains: Data_Version_Control, Data_Engineering
Last Updated: 2026-02-08 00:00 GMT

Overview

Merge in data version control combines changes from an isolated branch into a destination branch, enabling controlled promotion of data changes to production.

Description

The merge operation in lakeFS integrates changes from a source reference (typically a feature or experiment branch) into a destination branch. This is analogous to git merge in software version control. Merging is the primary mechanism for promoting validated data changes from isolated workspaces into shared branches.

When a merge is performed, lakeFS computes the common ancestor (merge base) of the source and destination, identifies the differences introduced by each side, and combines them into a new commit on the destination branch. If both sides have modified the same object in incompatible ways, a conflict arises that must be resolved.

Key aspects of the merge operation:

  • Three-way merge: lakeFS compares each side against the merge base (common ancestor) to determine what changed, so an object modified on only one side is never reported as a conflict.
  • Atomic operation: A merge either succeeds completely or fails entirely; there is no partial merge state.
  • Conflict resolution strategies: lakeFS supports automatic conflict resolution via source-wins (source changes take precedence) and dest-wins (destination changes take precedence) strategies.
  • Squash merge: Optionally, all source branch commits can be squashed into a single merge commit on the destination, simplifying the commit history.
  • Pre-merge hooks: Custom validation logic can be executed before the merge is finalized, enabling data quality gates at the promotion boundary.
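As a rough illustration of how these options might be expressed in a merge request, the sketch below only builds the request and sends nothing. The helper name build_merge_request is hypothetical. The endpoint path and the body fields (message, strategy, squash_merge) are assumptions based on the lakeFS OpenAPI specification and should be checked against the version you run.

```python
import json
from typing import Dict, Optional
from urllib.parse import quote

# Strategy names as listed in the key aspects above.
VALID_STRATEGIES = (None, "source-wins", "dest-wins")


def build_merge_request(
    base_url: str,
    repository: str,
    source_ref: str,
    destination_branch: str,
    message: str = "",
    strategy: Optional[str] = None,
    squash: bool = False,
) -> Dict[str, str]:
    """Build (but do not send) a merge request description.

    ASSUMPTION: the path layout and JSON field names mirror the lakeFS
    OpenAPI spec as the editor recalls it; verify before relying on them.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")

    url = (
        f"{base_url.rstrip('/')}/api/v1"
        f"/repositories/{quote(repository, safe='')}"
        f"/refs/{quote(source_ref, safe='')}"
        f"/merge/{quote(destination_branch, safe='')}"
    )
    body = {"message": message}
    if strategy is not None:
        body["strategy"] = strategy  # automatic conflict resolution
    if squash:
        body["squash_merge"] = True  # single commit on the destination
    return {"method": "POST", "url": url, "body": json.dumps(body)}


if __name__ == "__main__":
    request = build_merge_request(
        "http://localhost:8000", "example-repo", "experiment-1", "main",
        message="Promote validated data", strategy="source-wins", squash=True,
    )
    print(request["method"], request["url"])
    print(request["body"])
```

Keeping request construction separate from transport makes the chosen options easy to inspect or log before anything touches a shared branch.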

Usage

Merge operations are used in the following data workflow scenarios:

  • Production promotion: After validating data on a feature branch, merge it into the main branch to make it available to production consumers.
  • Pipeline integration: Merge outputs from multiple pipeline stages into a consolidated branch.
  • A/B test resolution: After evaluating the results of parallel experiments, merge the winning branch into production.
  • Hotfix application: Apply urgent data corrections from a hotfix branch into both production and development branches.
  • Scheduled promotion: In batch-oriented workflows, periodically merge staging branches into production after automated quality checks pass.

Theoretical Basis

Three-way merge algorithm:

The merge process follows these steps:

  1. Compute merge base: Find the most recent common ancestor commit M of the source S and destination D.
  2. Compute source diff: Identify objects changed between M and S (additions, deletions, modifications).
  3. Compute destination diff: Identify objects changed between M and D.
  4. Combine changes:
    • Objects changed only in S: Apply source changes to the result.
    • Objects changed only in D: Retain destination changes in the result.
    • Objects changed in both S and D identically: Accept the common change.
    • Objects changed in both S and D differently: Conflict; resolution is required.
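The steps above can be sketched over a simplified model in which each commit is a mapping from object path to content checksum. This is a minimal illustration of the three-way logic, not lakeFS's actual implementation; the Snapshot type and both function names are assumptions made for the example.

```python
from typing import Dict, Set, Tuple

# ASSUMPTION: a commit is modeled as {object path: content checksum}.
Snapshot = Dict[str, str]


def changed_paths(base: Snapshot, side: Snapshot) -> Set[str]:
    """Paths added, deleted, or modified on `side` relative to `base`."""
    return {p for p in base.keys() | side.keys() if base.get(p) != side.get(p)}


def three_way_merge(
    base: Snapshot, source: Snapshot, dest: Snapshot
) -> Tuple[Snapshot, Set[str]]:
    """Return (merged snapshot, conflicting paths).

    Conflicting paths keep the destination's value in the returned
    snapshot until a resolution is applied.
    """
    source_changes = changed_paths(base, source)  # diff M -> S
    dest_changes = changed_paths(base, dest)      # diff M -> D

    merged = dict(dest)  # destination-only changes are retained as-is
    conflicts: Set[str] = set()

    for path in source_changes:
        if path in dest_changes and source.get(path) != dest.get(path):
            conflicts.add(path)  # changed differently on both sides
            continue
        # Changed only in the source, or changed identically on both sides.
        if path in source:
            merged[path] = source[path]
        else:
            merged.pop(path, None)  # the source deleted this object
    return merged, conflicts


if __name__ == "__main__":
    base = {"a": "1", "b": "1", "c": "1"}
    source = {"a": "2", "b": "1", "c": "3", "d": "1"}  # edits a and c, adds d
    dest = {"a": "1", "b": "9", "c": "4"}              # edits b and c
    merged, conflicts = three_way_merge(base, source, dest)
    print(merged)
    print(conflicts)
```

In the usage example, only c is changed differently on both sides, so it is the single conflicting path; a, b, and d merge cleanly.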

Conflict resolution:

When conflicts arise, lakeFS provides two automatic resolution strategies:

  • source-wins: The source branch's version of the conflicting object is used. This is appropriate when the source branch represents the authoritative or more recent data.
  • dest-wins: The destination branch's version is retained. This is appropriate when the destination branch should not be overwritten by incoming changes.

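If no strategy is specified and conflicts exist, the merge operation fails with a 409 Conflict status and, because the merge is atomic, the destination branch is left unchanged. The conflicts must then be resolved manually before retrying.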
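The following sketch shows one way the two strategies and the no-strategy failure could be modeled, again treating a commit as a mapping from object path to checksum. MergeConflictError and resolve_conflicts are hypothetical names for this illustration; the exception's status attribute merely mirrors the 409 described above.

```python
from typing import Dict, Iterable, Optional

Snapshot = Dict[str, str]  # ASSUMPTION: {object path: content checksum}


class MergeConflictError(Exception):
    """Hypothetical stand-in for a merge rejected with 409 Conflict."""

    status = 409

    def __init__(self, paths: Iterable[str]):
        self.paths = set(paths)
        super().__init__(f"conflicting objects: {sorted(self.paths)}")


def resolve_conflicts(
    merged: Snapshot,
    source: Snapshot,
    dest: Snapshot,
    conflicts: Iterable[str],
    strategy: Optional[str] = None,
) -> Snapshot:
    """Apply a resolution strategy to the conflicting paths.

    Returns a new snapshot; the inputs are never mutated, so a failure
    leaves no partial state behind.
    """
    if strategy not in (None, "source-wins", "dest-wins"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    conflicts = set(conflicts)
    if not conflicts:
        return dict(merged)
    if strategy is None:
        raise MergeConflictError(conflicts)  # manual resolution needed

    winner = source if strategy == "source-wins" else dest
    result = dict(merged)
    for path in conflicts:
        if path in winner:
            result[path] = winner[path]
        else:
            result.pop(path, None)  # the winning side deleted the object
    return result


if __name__ == "__main__":
    source, dest = {"c": "3"}, {"c": "4"}
    print(resolve_conflicts(dest, source, dest, {"c"}, "source-wins"))
    print(resolve_conflicts(dest, source, dest, {"c"}, "dest-wins"))
```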

Squash merge:

A squash merge collapses all commits on the source branch into a single merge commit on the destination. This produces a cleaner commit history on the destination branch, at the cost of omitting the source branch's granular commit history from the destination's log (the source branch itself keeps its commits). Squash merge is useful when the individual commits on the source branch represent intermediate, work-in-progress states that are not meaningful in the destination's history.
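One way to picture the effect on history is a toy commit graph. The sketch assumes that a regular merge commit records both the destination head and the source head as parents, while a squash merge commit records only the destination head; the Commit class and helper functions are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Commit:
    id: str
    parents: Tuple[str, ...]
    message: str


def make_merge_commit(
    commit_id: str, dest_head: str, source_head: str,
    message: str, squash: bool = False,
) -> Commit:
    # ASSUMPTION: squashing is modeled by dropping the source parent, so
    # source commits are no longer ancestors of the merge commit.
    parents = (dest_head,) if squash else (dest_head, source_head)
    return Commit(commit_id, parents, message)


def reachable(commits: Dict[str, Commit], head: str) -> Set[str]:
    """All commit ids visible in the history that ends at `head`."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        commit_id = stack.pop()
        if commit_id in seen:
            continue
        seen.add(commit_id)
        stack.extend(commits[commit_id].parents)
    return seen


if __name__ == "__main__":
    commits = {
        "m": Commit("m", (), "common ancestor"),
        "d1": Commit("d1", ("m",), "destination work"),
        "s1": Commit("s1", ("m",), "wip: first attempt"),
        "s2": Commit("s2", ("s1",), "wip: fix"),
    }
    for squash in (False, True):
        graph = dict(commits)
        graph["merge"] = make_merge_commit("merge", "d1", "s2", "promote", squash)
        print("squash" if squash else "regular", sorted(reachable(graph, "merge")))
```

Because the squashed commit no longer points at the source head, recording the source reference in the merge message or metadata can help preserve traceability.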

Pre-merge hooks:

lakeFS supports pre-merge hooks that execute before the merge commit is finalized. These hooks can:

  • Validate data quality constraints
  • Check schema compatibility
  • Enforce naming conventions or partition structure
  • Run automated tests against the merged data state

If a pre-merge hook fails, the merge is rejected with a 412 (Precondition Failed) status.
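The gating behavior can be sketched as a list of validation callables that must all pass before the merged state is accepted. This models the concept only; it is not how lakeFS configures or runs hooks. PreconditionFailed, finalize_merge, and the two example checks are hypothetical names, and the specific naming and partition rules are invented for the example.

```python
from typing import Callable, Dict, List

Snapshot = Dict[str, str]            # ASSUMPTION: {object path: checksum}
Hook = Callable[[Snapshot], None]    # a hook raises an exception to fail


class PreconditionFailed(Exception):
    """Hypothetical stand-in for a merge rejected with 412."""

    status = 412


def require_parquet_suffix(snapshot: Snapshot) -> None:
    # Example naming convention: every object must be a .parquet file.
    bad = sorted(p for p in snapshot if not p.endswith(".parquet"))
    if bad:
        raise ValueError(f"non-parquet objects: {bad}")


def require_date_partition(snapshot: Snapshot) -> None:
    # Example partition rule: every path must contain a 'date=' segment.
    bad = sorted(p for p in snapshot if "date=" not in p)
    if bad:
        raise ValueError(f"objects outside a date partition: {bad}")


def finalize_merge(merged_state: Snapshot, hooks: List[Hook]) -> Snapshot:
    """Return the state to commit, or raise if any hook rejects it."""
    for hook in hooks:
        try:
            hook(merged_state)
        except Exception as exc:
            raise PreconditionFailed(
                f"pre-merge hook {hook.__name__} failed: {exc}"
            ) from exc
    return dict(merged_state)  # reached only when every hook passed


if __name__ == "__main__":
    hooks = [require_parquet_suffix, require_date_partition]
    good = {"events/date=2024-01-01/part-0.parquet": "abc"}
    print(finalize_merge(good, hooks))
    try:
        finalize_merge({"events/tmp.csv": "def"}, hooks)
    except PreconditionFailed as exc:
        print(exc.status, exc)
```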
