Principle:Lance format Lance Version Restoration
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Version_Control |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Version restoration is the operation that promotes a previously committed version of a Lance dataset to become the new latest version, effectively rolling back all changes made after that version.
Description
Unlike checkout, which returns a read-only view of a past version, restoration is a write operation. It creates a new version whose content is identical to the restored version. This means the version history is append-only: restoring version 5 when the latest is version 10 produces version 11 whose data matches version 5. Versions 6 through 10 remain in history and can still be inspected or restored themselves.
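The append-only behaviour described above can be illustrated with a toy in-memory model. This is a minimal sketch and not the Lance API: `VersionedStore`, its methods, and the tuple-of-rows snapshot are hypothetical names chosen for illustration only.

```python
class VersionedStore:
    """Toy append-only version history (hypothetical; not the Lance API)."""

    def __init__(self):
        # Maps version number -> immutable snapshot of the rows.
        self._snapshots = {}
        self.latest = 0

    def commit(self, rows):
        """Append a new version holding `rows` and return its number."""
        self.latest += 1
        self._snapshots[self.latest] = tuple(rows)
        return self.latest

    def checkout(self, version):
        """Read-only view of a past version; history is left unchanged."""
        return self._snapshots[version]

    def restore(self, version):
        """Write operation: re-commit an old snapshot as a new version."""
        return self.commit(self._snapshots[version])


store = VersionedStore()
for n in range(1, 11):
    store.commit(range(n))        # versions 1..10; version n holds n rows

new_version = store.restore(5)    # latest was 10, so this creates version 11
```

In this model, `store.checkout(11)` returns the same rows as `store.checkout(5)`, while versions 6 through 10 stay readable.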
The restoration workflow is a two-step process from the user's perspective:
- Check out the version to restore using checkout_version().
- Restore by calling restore() on the checked-out dataset.
Internally, restore() creates a Transaction with Operation::Restore { version }, where version is the version the dataset is currently checked out at. The transaction's read_version is set to the latest version, ensuring the restore participates in the normal conflict-resolution protocol.
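The relationship between the two version numbers in the transaction is easy to mix up, so a small sketch may help. The dataclasses below only mirror the field names mentioned above (read_version, Restore { version }); `build_restore_transaction` and its parameters are hypothetical and are not Lance's actual types or functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restore:
    version: int        # the version whose contents should become current


@dataclass(frozen=True)
class Transaction:
    read_version: int   # the version the writer observed as latest
    operation: Restore


def build_restore_transaction(checked_out_version: int, latest_version: int) -> Transaction:
    # The restore target comes from the checkout; read_version comes from the
    # latest manifest, not from the checked-out version.
    return Transaction(
        read_version=latest_version,
        operation=Restore(version=checked_out_version),
    )


txn = build_restore_transaction(checked_out_version=5, latest_version=10)
```

Here the transaction targets version 5 but declares that it read version 10, which is what lets it be checked against any commits made after version 10.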
Usage
Use restoration when:
- Rolling back a bad write -- a data pipeline ingested corrupt data and you need to revert to the last known good state.
- Undoing schema changes -- a schema migration went wrong and you need to return to the previous schema.
- Recovering from accidental deletes -- rows were deleted by mistake and you need to bring them back.
- Resetting to a baseline -- you want the "current" state of the dataset to match a specific historical version for a new round of experimentation.
Theoretical Basis
Restore as a Forward Operation
Restoration in Lance is not a destructive rewind. It follows the same commit protocol as any other write:
- The old manifest for the target version is read from storage.
- The manifest's timestamp is updated to the current time.
- The max_fragment_id is adjusted to be at least as large as the current manifest's max_fragment_id, preventing fragment ID collisions in future writes.
- The modified manifest is committed as a new version (latest + 1) through the standard CommitHandler::commit() path.
This approach preserves the invariant that version numbers are strictly increasing and that no existing version is ever mutated or deleted by a restore operation.
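To see why the max_fragment_id adjustment matters, consider a simplified manifest model. This is a sketch under stated assumptions: the `Manifest` dataclass, `restore_manifest`, and `next_fragment_id` are hypothetical, and the rule that a new fragment is numbered max_fragment_id + 1 is assumed here for illustration rather than taken from the Lance source.

```python
from dataclasses import dataclass, replace
import time


@dataclass(frozen=True)
class Manifest:
    version: int
    timestamp: float
    max_fragment_id: int
    fragment_ids: tuple


def restore_manifest(old: Manifest, latest: Manifest) -> Manifest:
    """Build the manifest for version latest + 1 from an old manifest."""
    return replace(
        old,
        version=latest.version + 1,
        timestamp=time.time(),
        # Keep the ID high-water mark from moving backwards.
        max_fragment_id=max(old.max_fragment_id, latest.max_fragment_id),
    )


def next_fragment_id(manifest: Manifest) -> int:
    # Assumption: new fragments are numbered after the high-water mark.
    return manifest.max_fragment_id + 1


v5 = Manifest(version=5, timestamp=0.0, max_fragment_id=4,
              fragment_ids=(0, 1, 2, 3, 4))
v10 = Manifest(version=10, timestamp=0.0, max_fragment_id=9,
               fragment_ids=tuple(range(10)))

v11 = restore_manifest(v5, v10)
```

In this model `v11` lists the same fragments as `v5`, but the next fragment written after the restore gets ID 10. Without the `max(...)` step it would get ID 5, which already names a different fragment in version 10. The frozen dataclass also reflects that the old manifest `v5` is copied, not mutated.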
Conflict Handling
Because restore uses the standard commit transaction path, it is subject to the same optimistic concurrency control as other writes. If another writer commits between the time the restore reads the latest version and the time it attempts to commit, the transaction is retried. The Operation::Restore variant is handled specially during rebase: it is generally compatible with other operations because it replaces the entire dataset state.
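The retry behaviour can be sketched with a toy commit log. Everything here is hypothetical (`ToyLog`, `restore_with_retry`, the `before_commit` hook, and the retry budget); it models only the idea that a commit must land at latest + 1 and is re-attempted against the new latest if it loses the race. It does not reproduce Lance's rebase logic.

```python
class ToyLog:
    """Hypothetical commit log: a commit succeeds only at latest + 1."""

    def __init__(self, latest):
        self.latest = latest

    def try_commit(self, new_version):
        if new_version != self.latest + 1:
            return False              # another writer committed first
        self.latest = new_version
        return True


def restore_with_retry(log, target_version, max_retries=3, before_commit=None):
    """Retry a restore until it lands on top of the current latest version."""
    for attempt in range(max_retries + 1):
        read_version = log.latest     # re-read the latest on every attempt
        if before_commit is not None:
            before_commit(attempt)    # hook used to simulate a concurrent writer
        if log.try_commit(read_version + 1):
            return {"restored": target_version, "committed_as": read_version + 1}
    raise RuntimeError("restore did not commit within the retry budget")


log = ToyLog(latest=10)


def concurrent_writer(attempt):
    # Another writer commits version 11 during the first attempt only.
    if attempt == 0:
        log.try_commit(log.latest + 1)


result = restore_with_retry(log, target_version=5, before_commit=concurrent_writer)
```

In this run the first attempt loses the race for version 11, so the restore of version 5 is committed as version 12 on the second attempt.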
Pseudocode
function restore(dataset):
    // dataset is checked out at the version to restore
    latest_manifest = fetch_latest_manifest()
    latest_version = latest_manifest.version
    transaction = Transaction {
        read_version: latest_version,
        operation: Restore { version: dataset.version },
    }
    // Start from a copy of the old manifest; the old version itself is not modified
    new_manifest = restore_old_manifest(dataset.version)
    new_manifest.version = latest_version + 1
    new_manifest.timestamp = now()
    // Never let the fragment ID high-water mark move backwards
    new_manifest.max_fragment_id = max(
        new_manifest.max_fragment_id,
        latest_manifest.max_fragment_id
    )
    commit(new_manifest)