Implementation:Treeverse LakeFS DiffRefs

Knowledge Sources
Domains Data_Version_Control, REST_API
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the lakeFS REST API, for comparing the state of data between two references in a lakeFS repository.

Description

The diffRefs endpoint computes the differences between two references (branches, commits, or tags) within a repository. It returns a paginated list of objects that have been added, removed, changed, or are in conflict between the left and right references. The endpoint supports both two-dot (direct comparison) and three-dot (merge-base comparison) diff types, with the three-dot diff being the default.
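The difference between the two diff types can be illustrated with a toy model. The sketch below is not lakeFS code: it represents each reference as a plain dict of path to checksum, and it assumes that the three-dot mode reports what changed on the right reference since the common ancestor, flagging a path as a conflict when the left reference also changed it to a different state. The helper names (`diff_trees`, `two_dot`, `three_dot`) are hypothetical.

```python
# Toy model: a "tree" maps object paths to content checksums.
# This is NOT lakeFS code; it only illustrates the two diff modes.

def diff_trees(base, target):
    """Return {path: change_type} describing how target differs from base."""
    changes = {}
    for path in sorted(set(base) | set(target)):
        if path not in base:
            changes[path] = "added"
        elif path not in target:
            changes[path] = "removed"
        elif base[path] != target[path]:
            changes[path] = "changed"
    return changes


def two_dot(left, right):
    """Direct comparison: how right differs from left."""
    return diff_trees(left, right)


def three_dot(merge_base, left, right):
    """Merge-base comparison: changes made on right since the common ancestor.

    Simplified conflict rule (an assumption): a path also changed on left
    since the ancestor, to a different state, is reported as a conflict.
    """
    left_changes = diff_trees(merge_base, left)
    result = diff_trees(merge_base, right)
    for path in result:
        if path in left_changes and left.get(path) != right.get(path):
            result[path] = "conflict"
    return result


merge_base = {"a.csv": "v1", "b.csv": "v1"}
main = {"a.csv": "v2", "b.csv": "v1"}                    # main edited a.csv
feature = {"a.csv": "v3", "b.csv": "v1", "c.csv": "v1"}  # feature edited a.csv, added c.csv

print(two_dot(main, feature))                # {'a.csv': 'changed', 'c.csv': 'added'}
print(three_dot(merge_base, main, feature))  # {'a.csv': 'conflict', 'c.csv': 'added'}
```

In this model the two-dot result only says that `a.csv` differs between the two references, while the three-dot result shows that both sides modified it independently.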

Usage

Use this API when:

  • Reviewing changes on a feature branch before merging into production.
  • Auditing data differences between two pipeline runs (identified by commit IDs).
  • Detecting conflicts before performing a merge operation.
  • Building automated data quality checks that inspect diff results for unexpected changes.
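As a rough illustration of the last two use cases, the sketch below inspects a list of diff entries and collects anything that should block a merge. It assumes each entry is a dict with `type` and `path` keys, mirroring the diff entries described in the I/O contract; the function name `check_diff` and the `protected_prefixes` policy are hypothetical and not part of lakeFS.

```python
from collections import Counter


def check_diff(entries, protected_prefixes=()):
    """Count entries by type and list violations that should block a merge.

    entries: iterable of dicts with 'type' and 'path' keys.
    protected_prefixes: paths under these prefixes must not be removed
    (an illustrative policy, not a lakeFS feature).
    """
    counts = Counter()
    violations = []
    protected = tuple(protected_prefixes)
    for entry in entries:
        counts[entry["type"]] += 1
        if entry["type"] == "conflict":
            violations.append(f"conflict: {entry['path']}")
        elif entry["type"] == "removed" and entry["path"].startswith(protected):
            violations.append(f"removed protected object: {entry['path']}")
    return dict(counts), violations


entries = [
    {"type": "added", "path": "data/customers/2024.parquet"},
    {"type": "removed", "path": "data/reference/countries.csv"},
    {"type": "conflict", "path": "data/orders/part-0.parquet"},
]
counts, violations = check_diff(entries, protected_prefixes=["data/reference/"])
print(counts)
for violation in violations:
    print(violation)
```

A pipeline step could fail the run whenever `violations` is non-empty and log `counts` for auditing.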

Code Reference

Source Location

  • Repository: lakeFS
  • File: api/swagger.yml (lines 4928-4949)

Signature

/repositories/{repository}/refs/{leftRef}/diff/{rightRef}:
  get:
    operationId: diffRefs
    summary: diff references
    parameters:
      - in: path
        name: repository
        required: true
        schema:
          type: string
      - in: path
        name: leftRef
        required: true
        schema:
          type: string
      - in: path
        name: rightRef
        required: true
        schema:
          type: string
      - in: query
        name: type
        schema:
          type: string
          enum: [two_dot, three_dot]
          default: three_dot
      - in: query
        name: after
        schema:
          type: string
      - in: query
        name: amount
        schema:
          type: integer
          default: 100
      - in: query
        name: prefix
        schema:
          type: string
      - in: query
        name: delimiter
        schema:
          type: string
      - in: query
        name: include_right_stats
        schema:
          type: boolean
        description: (experimental) include size statistics for right-side objects
    responses:
      200:
        description: diff list
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/DiffList"
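To show how the path and query parameters in the signature combine into a request URL, here is a minimal sketch using only the Python standard library. The helper `build_diff_url` is hypothetical, and the `/api/v1` prefix is taken from the curl examples on this page rather than from the specification excerpt.

```python
from urllib.parse import quote, urlencode

API_PREFIX = "/api/v1"  # assumption: matches the curl examples on this page


def build_diff_url(host, repository, left_ref, right_ref, **query):
    """Build a diffRefs URL; query parameters set to None are omitted."""
    path = "{}/repositories/{}/refs/{}/diff/{}".format(
        API_PREFIX,
        quote(repository, safe=""),
        quote(left_ref, safe=""),
        quote(right_ref, safe=""),
    )
    params = {}
    for name, value in query.items():
        if value is None:
            continue
        # Booleans are serialised as lowercase true/false
        params[name] = str(value).lower() if isinstance(value, bool) else value
    query_string = urlencode(params)
    return host.rstrip("/") + path + ("?" + query_string if query_string else "")


url = build_diff_url(
    "http://localhost:8000", "my-data-repo", "main", "experiment-v2",
    type="two_dot", prefix="data/customers/", amount=50,
)
print(url)
```

Omitting every query parameter produces the bare path, in which case the server-side defaults from the signature apply (`type=three_dot`, `amount=100`).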

Import

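import lakefs

# Authenticate against the lakeFS server (placeholder credentials)
client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key_id",
    password="secret_access_key",
)
repo = lakefs.Repository("my-repo", client=client)
main_branch = repo.branch("main")
feature_branch = repo.branch("experiment-v2")
# The calling ref is sent as leftRef and other_ref as rightRef;
# diff() returns a generator of changes
diffs = main_branch.diff(other_ref=feature_branch)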

I/O Contract

Inputs

Name | Type | Required | Description
repository (path param) | string | Yes | Repository name.
leftRef (path param) | string | Yes | Left reference (branch name, commit ID, or tag) for comparison.
rightRef (path param) | string | Yes | Right reference (branch name, commit ID, or tag) for comparison.
type (query param) | string (enum) | No | Diff type: two_dot or three_dot. Defaults to three_dot.
after (query param) | string | No | Pagination cursor: return results after this path.
amount (query param) | integer | No | Number of results per page. Defaults to 100.
prefix (query param) | string | No | Filter results to paths starting with this prefix.
delimiter (query param) | string | No | Delimiter for grouping paths (e.g., / for directory-level grouping).
include_right_stats (query param) | boolean | No | (Experimental) Include size statistics for right-side objects.

Outputs

Name | Type | Description
pagination | object | Pagination info: has_more (boolean), next_offset (string), results (integer), max_per_page (integer).
results | list[Diff] | List of diff entries, each containing: type (added/removed/changed/conflict), path (string), path_type (string), size_bytes (integer).
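The pagination fields above suggest a simple loop for collecting every diff entry. The sketch below assumes that `next_offset` from one response is passed back as the `after` cursor of the next request. `fetch_page` is a hypothetical caller-supplied function that performs the HTTP call and returns the decoded JSON body; `fake_fetch` stands in for it here so the example runs without a server.

```python
def iter_diff(fetch_page, amount=100):
    """Yield every diff entry, following the pagination cursor.

    fetch_page(after, amount) is a caller-supplied (hypothetical) function
    returning one decoded diffRefs response as a dict.
    """
    after = ""
    while True:
        page = fetch_page(after, amount)
        yield from page["results"]
        pagination = page["pagination"]
        if not pagination["has_more"]:
            break
        # Assumption: next_offset is the cursor for the next request
        after = pagination["next_offset"]


def fake_fetch(after, amount):
    """In-memory stand-in for the endpoint, serving five fixed paths."""
    paths = ["a", "b", "c", "d", "e"]
    start = paths.index(after) + 1 if after else 0
    chunk = paths[start:start + amount]
    has_more = start + amount < len(paths)
    return {
        "pagination": {
            "has_more": has_more,
            "next_offset": chunk[-1] if has_more else "",
            "results": len(chunk),
            "max_per_page": 1000,
        },
        "results": [
            {"type": "added", "path": p, "path_type": "object", "size_bytes": 1}
            for p in chunk
        ],
    }


all_paths = [entry["path"] for entry in iter_diff(fake_fetch, amount=2)]
print(all_paths)  # ['a', 'b', 'c', 'd', 'e']
```

Keeping the HTTP call behind `fetch_page` lets the loop be tested in isolation and reused with any HTTP client.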

Usage Examples

Diff Two Branches Using the Python SDK

import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
)

repo = lakefs.Repository("my-data-repo", client=client)

# Show changes on experiment-v2 relative to main (three-dot diff by default).
# The calling ref is sent as leftRef and other_ref as rightRef.
for diff in repo.ref("main").diff(other_ref="experiment-v2"):
    print(f"{diff.type}: {diff.path} ({diff.size_bytes} bytes)")

Diff Using curl

# Three-dot diff (default): show changes introduced on experiment-v2 since diverging from main
curl -X GET "http://localhost:8000/api/v1/repositories/my-data-repo/refs/main/diff/experiment-v2" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

Filtered Diff With Pagination

# Two-dot diff restricted to a path prefix, returning at most 50 results per page
curl -X GET "http://localhost:8000/api/v1/repositories/my-data-repo/refs/main/diff/experiment-v2?type=two_dot&prefix=data/customers/&amount=50" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

Related Pages

Implements Principle


Requires Environment
