Implementation:Lance format Lance LanceCrateRoot

From Leeroopedia


Knowledge Sources
Domains Core, Infrastructure
Last Updated 2026-02-08 19:33 GMT

Overview

Description

The CrateRoot module (lib.rs) is the top-level entry point for the lance crate, the main Lance library. It provides:

  • Crate documentation with examples for creating and scanning Lance datasets
  • Module declarations exposing the public API surface: arrow, blob, datafusion, dataset, index, io, session, table, utils
  • Key re-exports:
    • lance_core::datatypes, lance_core::Error, lance_core::Result
    • blob_field and BlobArrayBuilder from the blob module
    • Dataset from the dataset module
  • Convenience function open_dataset for loading datasets from a URI
  • DIST_FIELD -- A lazily initialized Arrow field for distance column metadata
  • deps module -- Re-exports of arrow_array, arrow_schema, and datafusion to pin dependency versions for downstream users
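The re-export pattern listed above can be sketched with the standard library alone. In this minimal illustration, `upstream_schema`, its `VERSION` constant, and the simplified `Field` and `Dataset` structs are hypothetical stand-ins (not the real arrow_schema or lance types); only the shape of the `pub use` statements mirrors what lib.rs is described as doing.

```rust
// Hypothetical stand-in for an upstream crate such as arrow_schema.
// Declared `pub` so that it can be re-exported publicly below.
pub mod upstream_schema {
    pub const VERSION: &str = "1.2.3";

    #[derive(Debug, PartialEq)]
    pub struct Field {
        pub name: String,
    }
}

// Hypothetical stand-in for an internal module such as `dataset`.
mod dataset {
    #[derive(Debug)]
    pub struct Dataset {
        pub uri: String,
    }
}

// Root-level re-export: callers name `Dataset` without the module path.
pub use dataset::Dataset;

// Dependency re-exports: callers reach the upstream crate through `deps`,
// so they build against the same version this crate was compiled with.
pub mod deps {
    pub use super::upstream_schema;
}

fn main() {
    let ds = Dataset { uri: "memory://demo".to_string() };
    let field = deps::upstream_schema::Field { name: "id".to_string() };
    println!("{} {}", ds.uri, field.name);
    println!("{}", deps::upstream_schema::VERSION);
}
```

A downstream crate that imports types through a `deps`-style module does not need to declare its own, possibly mismatched, version of the same dependency.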

The crate's documentation describes Lance as offering 100x faster random access than Parquet, automatic versioning, compatibility with Apache Arrow and DuckDB, and optimizations for computer vision, bioinformatics, spatial, and ML data.

Usage

This is the primary entry point for Rust consumers of the Lance format. Users typically start by calling Dataset::open or open_dataset and then use the scanner, index, or write APIs.

Code Reference

Source Location

rust/lance/src/lib.rs

Signature

pub use lance_core::datatypes;
pub use lance_core::{Error, Result};
pub use blob::{blob_field, BlobArrayBuilder};
pub use dataset::Dataset;

pub async fn open_dataset<T: AsRef<str>>(table_uri: T) -> Result<Dataset>;

pub static DIST_FIELD: LazyLock<arrow_schema::Field>;

pub mod deps {
    pub use arrow_array;
    pub use arrow_schema;
    pub use datafusion;
}

Import

use lance::{Dataset, open_dataset, Error, Result};
use lance::blob::{blob_field, BlobArrayBuilder};
use lance::deps::{arrow_array, arrow_schema};

I/O Contract

Inputs

Parameter | Type | Description
table_uri | T: AsRef<str> | URI or file path to a Lance dataset (supports local, S3, GCS, Azure)
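The `T: AsRef<str>` bound means callers can pass a string literal, an owned `String`, or a reference to either. The sketch below shows that with a hypothetical helper, `uri_scheme`, which shares the bound but performs no I/O. The fallback to `"file"` for paths without a scheme is an assumption made for the illustration, not a statement about how Lance resolves URIs.

```rust
// Hypothetical helper with the same generic bound as `open_dataset`.
// It only inspects the string; it does not open anything.
fn uri_scheme<T: AsRef<str>>(table_uri: T) -> String {
    let uri: &str = table_uri.as_ref();
    match uri.split_once("://") {
        // Everything before "://" is treated as the scheme.
        Some((scheme, _rest)) => scheme.to_string(),
        // Assumption for this sketch: a bare path means the local file system.
        None => "file".to_string(),
    }
}

fn main() {
    // A string literal (&str).
    println!("{}", uri_scheme("/tmp/my_dataset.lance"));
    // An owned String, moved into the call.
    println!("{}", uri_scheme(String::from("s3://bucket/my_dataset.lance")));
    // A borrowed String, still usable afterwards.
    let owned = String::from("gs://bucket/my_dataset.lance");
    println!("{}", uri_scheme(&owned));
    println!("{}", owned.len());
}
```

Accepting `AsRef<str>` rather than `&str` spares callers an explicit `.as_str()` or `&` at each call site, at the cost of one generic instantiation per argument type.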

Outputs

Type | Description
Result<Dataset> | An opened Lance Dataset ready for scanning, indexing, or writing
DIST_FIELD | A static Field::new("_distance", Float32, true) for distance column metadata
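Lazy initialization of a static such as `DIST_FIELD` can be sketched with `std::sync::LazyLock` (stable since Rust 1.80). The `Field` struct here is a simplified, hypothetical stand-in for `arrow_schema::Field`, and `INIT_COUNT` exists only to make the run-once behavior observable; neither is part of the lance crate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Hypothetical, simplified stand-in for arrow_schema::Field.
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    data_type: &'static str,
    nullable: bool,
}

// Counts how many times the initializer closure runs (illustration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

// The closure runs on first access, not at program start, and only once.
static DIST_FIELD: LazyLock<Field> = LazyLock::new(|| {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    Field {
        name: "_distance".to_string(),
        data_type: "Float32",
        nullable: true,
    }
});

fn main() {
    // Nothing has touched DIST_FIELD yet, so the initializer has not run.
    println!("{}", INIT_COUNT.load(Ordering::SeqCst));
    // The first dereference triggers initialization.
    println!("{} {} {}", DIST_FIELD.name, DIST_FIELD.data_type, DIST_FIELD.nullable);
    // Further accesses reuse the same value.
    println!("{}", DIST_FIELD.name.len());
    println!("{}", INIT_COUNT.load(Ordering::SeqCst));
}
```

A plain `static` cannot hold a value whose constructor allocates, as a `Field` with a `String` name does, which is one reason a lazily initialized wrapper is a natural fit here.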

Usage Examples

use std::sync::Arc;
use arrow_array::{RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};
use lance::{dataset::WriteParams, Dataset};

// Assumes a Tokio runtime is available to drive the async calls.
#[tokio::main]
async fn main() -> lance::Result<()> {
    // Create a dataset from one empty batch with a single non-nullable Int64 column.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
    ]));
    let batches = vec![RecordBatch::new_empty(schema.clone())];
    let reader = RecordBatchIterator::new(batches.into_iter().map(Ok), schema);
    Dataset::write(reader, "/tmp/my_dataset.lance", Some(WriteParams::default())).await?;

    // Open the dataset and create a scanner over it.
    let dataset = lance::open_dataset("/tmp/my_dataset.lance").await?;
    let _scanner = dataset.scan();
    Ok(())
}
