Implementation:Treeverse LakeFS DiffRefs
| Knowledge Sources | |
|---|---|
| Domains | Data_Version_Control, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the lakeFS REST API, for comparing the state of data between two references in a lakeFS repository.
Description
The diffRefs endpoint computes the differences between two references (branches, commits, or tags) within a repository. It returns a paginated list of objects that have been added, removed, changed, or are in conflict between the left and right references. The endpoint supports both two-dot (direct comparison) and three-dot (merge-base comparison) diff types, with the three-dot diff being the default.
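The difference between the two diff types can be sketched with a simplified in-memory model. The sketch below is not lakeFS code: it assumes each reference is a plain mapping from path to checksum, that a two-dot diff compares the left and right references directly, and that a three-dot diff compares their merge base with the right reference. The names `Snapshot`, `diff_snapshots`, `two_dot`, and `three_dot` are hypothetical, and conflict detection is left out.

```python
from typing import Dict

# Hypothetical model: a reference is a mapping of object path -> checksum.
Snapshot = Dict[str, str]


def diff_snapshots(base: Snapshot, target: Snapshot) -> Dict[str, str]:
    """Return {path: change_type} describing how target differs from base."""
    changes: Dict[str, str] = {}
    for path in sorted(set(base) | set(target)):
        if path not in base:
            changes[path] = "added"
        elif path not in target:
            changes[path] = "removed"
        elif base[path] != target[path]:
            changes[path] = "changed"
    return changes


def two_dot(left: Snapshot, right: Snapshot) -> Dict[str, str]:
    """Direct comparison of the two references (assumed semantics)."""
    return diff_snapshots(left, right)


def three_dot(merge_base: Snapshot, right: Snapshot) -> Dict[str, str]:
    """Comparison of the merge base with the right reference (assumed semantics)."""
    return diff_snapshots(merge_base, right)


# State at the point where the two branches diverged.
merge_base = {"a.csv": "v1", "b.csv": "v1"}
# Left reference: a.csv was modified after the branches diverged.
left = {"a.csv": "v2", "b.csv": "v1"}
# Right reference: c.csv was added after the branches diverged.
right = {"a.csv": "v1", "b.csv": "v1", "c.csv": "v1"}

two_dot_result = two_dot(left, right)
three_dot_result = three_dot(merge_base, right)
print(two_dot_result)
print(three_dot_result)
```

In this model the two-dot result also reports `a.csv`, because the left reference moved on after the split, while the three-dot result contains only the change made on the right side.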
Usage
Use this API when:
- Reviewing changes on a feature branch before merging into production.
- Auditing data differences between two pipeline runs (identified by commit IDs).
- Detecting conflicts before performing a merge operation.
- Building automated data quality checks that inspect diff results for unexpected changes.
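As a rough illustration of the last two use cases, the following sketch inspects diff entries that have already been fetched and reports anything unexpected. It assumes each entry is a dictionary with the `type` and `path` fields described in the I/O Contract; the function `find_unexpected_changes`, its parameters, and the sample data are hypothetical and not part of any lakeFS SDK.

```python
from typing import Iterable, List, Mapping, Sequence


def find_unexpected_changes(
    entries: Iterable[Mapping[str, str]],
    allowed_prefixes: Sequence[str],
    forbidden_types: Sequence[str] = ("conflict", "removed"),
) -> List[str]:
    """Return a human-readable problem list; an empty list means the diff passes."""
    problems: List[str] = []
    for entry in entries:
        path, change_type = entry["path"], entry["type"]
        if change_type in forbidden_types:
            # Conflicts and deletions are treated as blocking in this sketch.
            problems.append(f"{change_type}: {path}")
        elif not any(path.startswith(prefix) for prefix in allowed_prefixes):
            # The change is of an allowed type but touches an unexpected location.
            problems.append(f"unexpected path: {path}")
    return problems


# Hypothetical diff entries, shaped like the results described in the I/O Contract.
sample_entries = [
    {"type": "added", "path": "data/customers/2024.parquet"},
    {"type": "changed", "path": "config/schema.json"},
    {"type": "conflict", "path": "data/customers/2023.parquet"},
]

problems = find_unexpected_changes(sample_entries, allowed_prefixes=["data/customers/"])
for problem in problems:
    print(problem)
```

A pipeline could fail the run whenever the returned list is non-empty, which keeps the policy (allowed prefixes, forbidden change types) separate from the code that fetches the diff.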
Code Reference
Source Location
- Repository: lakeFS
- File: api/swagger.yml (lines 4928-4949)
Signature
/repositories/{repository}/refs/{leftRef}/diff/{rightRef}:
  get:
    operationId: diffRefs
    summary: diff references
    parameters:
      - in: path
        name: repository
        required: true
        schema:
          type: string
      - in: path
        name: leftRef
        required: true
        schema:
          type: string
      - in: path
        name: rightRef
        required: true
        schema:
          type: string
      - in: query
        name: type
        schema:
          type: string
          enum: [two_dot, three_dot]
          default: three_dot
      - in: query
        name: after
        schema:
          type: string
      - in: query
        name: amount
        schema:
          type: integer
          default: 100
      - in: query
        name: prefix
        schema:
          type: string
      - in: query
        name: delimiter
        schema:
          type: string
      - in: query
        name: include_right_stats
        schema:
          type: boolean
        description: (experimental) include size statistics for right-side objects
    responses:
      200:
        description: diff list
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/DiffList"
Import
import lakefs

# Connect to the lakeFS server; replace the placeholder credentials with real ones.
client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key_id",
    password="secret_access_key",
)

repo = lakefs.Repository("my-repo", client=client)
main_branch = repo.branch("main")
feature_branch = repo.branch("experiment-v2")

# diff() yields the changes lazily as they are iterated.
diffs = feature_branch.diff(other_ref="main")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repository (path param) | string | Yes | Repository name. |
| leftRef (path param) | string | Yes | Left reference (branch name, commit ID, or tag) for comparison. |
| rightRef (path param) | string | Yes | Right reference (branch name, commit ID, or tag) for comparison. |
| type (query param) | string (enum) | No | Diff type: two_dot or three_dot. Defaults to three_dot. |
| after (query param) | string | No | Pagination cursor: return results after this path. |
| amount (query param) | integer | No | Number of results per page. Defaults to 100. |
| prefix (query param) | string | No | Filter results to paths starting with this prefix. |
| delimiter (query param) | string | No | Delimiter for grouping paths (e.g., / for directory-level grouping). |
| include_right_stats (query param) | boolean | No | (Experimental) Include size statistics for right-side objects. |
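The way these inputs map onto a request URL can be sketched with the standard library alone. The helper below is a hypothetical illustration (`build_diff_url` is not part of any lakeFS SDK); it assumes the `/api/v1` base path used in the curl examples on this page and simply omits any query parameter that is not set, so that the server-side defaults apply.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_diff_url(
    base_url: str,
    repository: str,
    left_ref: str,
    right_ref: str,
    *,
    diff_type: Optional[str] = None,
    after: Optional[str] = None,
    amount: Optional[int] = None,
    prefix: Optional[str] = None,
    delimiter: Optional[str] = None,
) -> str:
    """Build the diffRefs URL; unset query parameters are left out."""
    if diff_type is not None and diff_type not in ("two_dot", "three_dot"):
        raise ValueError(f"unsupported diff type: {diff_type!r}")
    # Percent-encode each path segment so unusual reference names stay intact.
    path = "/repositories/{}/refs/{}/diff/{}".format(
        quote(repository, safe=""),
        quote(left_ref, safe=""),
        quote(right_ref, safe=""),
    )
    params = {
        "type": diff_type,
        "after": after,
        "amount": amount,
        "prefix": prefix,
        "delimiter": delimiter,
    }
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return base_url.rstrip("/") + path + ("?" + query if query else "")


url = build_diff_url(
    "http://localhost:8000/api/v1",
    "my-data-repo",
    "main",
    "experiment-v2",
    diff_type="two_dot",
    prefix="data/customers/",
    amount=50,
)
print(url)
```

Leaving unset parameters out of the query string, rather than sending empty values, keeps the request equivalent to the documented defaults (three_dot, 100 results per page).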
Outputs
| Name | Type | Description |
|---|---|---|
| pagination | object | Pagination info: has_more (boolean), next_offset (string), results (integer), max_per_page (integer). |
| results | list[Diff] | List of diff entries, each containing: type (added/removed/changed/conflict), path (string), path_type (string), size_bytes (integer). |
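A client that needs the full diff has to follow the pagination fields across pages. The sketch below shows one way to do that, assuming that the `next_offset` of one page is passed as the `after` parameter of the next request. `iter_diff_results` and `fake_fetch` are hypothetical names; the fake fetcher stands in for a real HTTP call so the example runs without a server.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_diff_results(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every diff entry, requesting pages until has_more is false."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        yield from page["results"]
        pagination = page["pagination"]
        if not pagination["has_more"]:
            break
        # Assumption: next_offset is the cursor to send as `after` next time.
        after = pagination["next_offset"]


# Stand-in for the server: two canned pages keyed by the `after` cursor.
_PAGES: Dict[Optional[str], Page] = {
    None: {
        "pagination": {"has_more": True, "next_offset": "data/b.csv", "results": 2, "max_per_page": 2},
        "results": [
            {"type": "added", "path": "data/a.csv", "path_type": "object", "size_bytes": 10},
            {"type": "changed", "path": "data/b.csv", "path_type": "object", "size_bytes": 20},
        ],
    },
    "data/b.csv": {
        "pagination": {"has_more": False, "next_offset": "", "results": 1, "max_per_page": 2},
        "results": [
            {"type": "removed", "path": "data/c.csv", "path_type": "object", "size_bytes": 30},
        ],
    },
}


def fake_fetch(after: Optional[str]) -> Page:
    """Return the canned page for the given cursor (hypothetical test double)."""
    return _PAGES[after]


paths = [entry["path"] for entry in iter_diff_results(fake_fetch)]
print(paths)
```

Passing the fetcher in as a callable keeps the paging logic independent of the HTTP library, so the same loop can be exercised in tests and reused with a real request function.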
Usage Examples
Diff Two Branches Using the Python SDK
import lakefs

# Placeholder credentials; replace with a real access key pair.
client = lakefs.Client(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)

repo = lakefs.Repository("my-data-repo", client=client)

# Compare experiment branch against main (three-dot diff by default)
for diff in repo.ref("experiment-v2").diff(other_ref="main"):
    print(f"{diff.type}: {diff.path} ({diff.size_bytes} bytes)")
Diff Using curl
# Three-dot diff (default): show changes introduced on experiment-v2 since diverging from main
curl -X GET "http://localhost:8000/api/v1/repositories/my-data-repo/refs/main/diff/experiment-v2" \
-u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
Filtered Diff With Pagination
# Two-dot diff filtered to a specific prefix with pagination
curl -X GET "http://localhost:8000/api/v1/repositories/my-data-repo/refs/main/diff/experiment-v2?type=two_dot&prefix=data/customers/&amount=50" \
-u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
Related Pages
Implements Principle
Requires Environment