Implementation:Lance format Lance LanceTableProvider
| Knowledge Sources | |
|---|---|
| Domains | DataFusion, Infrastructure |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
Description
The LanceTableProvider module provides two DataFusion TableProvider implementations for Lance datasets:
1. Direct impl TableProvider for Dataset (in logical_plan.rs):
A straightforward implementation that exposes a Lance Dataset directly as a DataFusion table. It supports projection pushdown and limit pushdown but does not support filter pushdown. This is useful for simple SQL queries over Lance datasets.
2. LanceTableProvider struct (in dataframe.rs):
A more full-featured table provider that additionally supports:
- Filter pushdown -- All filters are reported as exactly applicable, so DataFusion does not re-apply them above the scan
- System columns -- Optional inclusion of _rowid and _rowaddr columns in the schema
- Ordered/unordered scans -- Configurable scan ordering
- Limit pushdown -- Passed through to the Lance scanner
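The system-column option can be pictured with a small std-only model. This is a sketch, not the Lance implementation: ToySchema, ExtendedSchema, and extend_schema are hypothetical names, and placing the system columns after the data columns is an assumption made only for illustration. The two optional indices mirror the row_id_idx and row_addr_idx fields of LanceTableProvider.

```rust
/// Toy stand-in for a table schema: an ordered list of column names.
/// (Hypothetical type, not part of Lance or DataFusion.)
#[derive(Debug, Clone, PartialEq)]
pub struct ToySchema {
    pub columns: Vec<String>,
}

/// A schema extended with optional system columns, plus their positions.
#[derive(Debug, PartialEq)]
pub struct ExtendedSchema {
    pub full_schema: ToySchema,
    pub row_id_idx: Option<usize>,
    pub row_addr_idx: Option<usize>,
}

/// Append `_rowid` and/or `_rowaddr` after the data columns and record
/// where each one landed. Appending at the end is an assumption.
pub fn extend_schema(base: &ToySchema, with_row_id: bool, with_row_addr: bool) -> ExtendedSchema {
    let mut columns = base.columns.clone();
    let mut row_id_idx = None;
    let mut row_addr_idx = None;
    if with_row_id {
        row_id_idx = Some(columns.len());
        columns.push("_rowid".to_string());
    }
    if with_row_addr {
        row_addr_idx = Some(columns.len());
        columns.push("_rowaddr".to_string());
    }
    ExtendedSchema {
        full_schema: ToySchema { columns },
        row_id_idx,
        row_addr_idx,
    }
}
```

In this model, a projection index equal to row_id_idx or row_addr_idx identifies a system column rather than a stored data column.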
The module also provides the SessionContextExt trait that adds convenience methods to DataFusion's SessionContext:
- read_lance -- Creates a DataFrame for an ordered Lance dataset scan
- read_lance_unordered -- Creates a DataFrame for an unordered scan
- read_one_shot -- Creates a DataFrame from a SendableRecordBatchStream
OneShotPartitionStream is a helper that wraps a SendableRecordBatchStream as a DataFusion PartitionStream that can only be consumed once.
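The "consumed only once" behaviour can be sketched with the standard library alone. The OneShot type below is hypothetical and is not the OneShotPartitionStream source; T stands in for a SendableRecordBatchStream. The sketch assumes the wrapper is only reachable through a shared reference, which is why it uses interior mutability.

```rust
use std::sync::Mutex;

/// Hypothetical consume-once holder. `T` stands in for a record batch stream.
pub struct OneShot<T> {
    inner: Mutex<Option<T>>,
}

impl<T> OneShot<T> {
    /// Wrap a value so that it can be handed out exactly once.
    pub fn new(value: T) -> Self {
        Self {
            inner: Mutex::new(Some(value)),
        }
    }

    /// Returns the wrapped value on the first call and `None` afterwards.
    /// Takes `&self`, so the state change goes through the mutex.
    pub fn take(&self) -> Option<T> {
        self.inner.lock().expect("lock poisoned").take()
    }
}
```

A second call to take yields None in this sketch; how the real helper reacts to a second execution (error, panic, or empty stream) is not stated here.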
Usage
These providers enable using SQL queries and DataFusion's query planning engine over Lance datasets. They are registered with a DataFusion SessionContext to make Lance tables accessible via SQL.
Code Reference
Source Location
- rust/lance/src/datafusion/logical_plan.rs -- Direct Dataset as TableProvider
- rust/lance/src/datafusion/dataframe.rs -- LanceTableProvider struct and SessionContextExt
Signature
// logical_plan.rs
#[async_trait]
impl TableProvider for Dataset { /* ... */ }

// dataframe.rs
#[derive(Debug)]
pub struct LanceTableProvider {
    dataset: Arc<Dataset>,
    full_schema: Arc<Schema>,
    row_id_idx: Option<usize>,
    row_addr_idx: Option<usize>,
    ordered: bool,
}

impl LanceTableProvider {
    pub fn new(dataset: Arc<Dataset>, with_row_id: bool, with_row_addr: bool) -> Self;
    pub fn new_with_ordering(
        dataset: Arc<Dataset>, with_row_id: bool, with_row_addr: bool, ordered: bool,
    ) -> Self;
    pub fn dataset(&self) -> Arc<Dataset>;
}

pub trait SessionContextExt {
    fn read_lance(&self, dataset: Arc<Dataset>, with_row_id: bool, with_row_addr: bool)
        -> datafusion::common::Result<DataFrame>;
    fn read_lance_unordered(&self, dataset: Arc<Dataset>, with_row_id: bool, with_row_addr: bool)
        -> datafusion::common::Result<DataFrame>;
    fn read_one_shot(&self, data: SendableRecordBatchStream)
        -> datafusion::common::Result<DataFrame>;
}
Import
use lance::datafusion::{LanceTableProvider, SessionContextExt};
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| dataset | Arc<Dataset> | The Lance dataset to expose as a DataFusion table |
| with_row_id | bool | Whether to include the _rowid system column |
| with_row_addr | bool | Whether to include the _rowaddr system column |
| ordered | bool | Whether to return results in deterministic order (default: true) |
| projection | Option<&Vec<usize>> | Column indices to project (from DataFusion) |
| filters | &[Expr] | DataFusion filter expressions for pushdown |
| limit | Option<usize> | Maximum number of rows to return |
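The meaning of the projection and limit inputs can be illustrated with a toy in-memory scan. This is a sketch under stated assumptions, not the Lance scanner: Row and toy_scan are hypothetical names, rows are plain vectors of integers, and filter handling is left out entirely. A projection of None is taken to mean "all columns" and a limit of None to mean "all rows".

```rust
/// Toy in-memory row: one integer cell per column.
/// (Hypothetical type, used only for this illustration.)
pub type Row = Vec<i64>;

/// Apply an optional column projection and an optional row limit,
/// mirroring the `projection` and `limit` inputs of a table scan.
pub fn toy_scan(rows: &[Row], projection: Option<&Vec<usize>>, limit: Option<usize>) -> Vec<Row> {
    // No limit means every row is returned.
    let take = limit.unwrap_or(rows.len());
    rows.iter()
        .take(take)
        .map(|row| match projection {
            // Keep only the requested columns, in the requested order.
            Some(indices) => indices.iter().map(|&i| row[i]).collect(),
            // No projection means every column is kept.
            None => row.clone(),
        })
        .collect()
}
```

Pushing both parameters into the scan means that columns and rows the query does not need are never materialized, instead of being read and discarded by a later operator.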
Outputs
| Type | Description |
|---|---|
| Arc<dyn ExecutionPlan> | A DataFusion execution plan that scans the Lance dataset |
| SchemaRef | The schema of the table (including any requested system columns) |
| DataFrame | A DataFusion DataFrame for further query composition (via SessionContextExt) |
Usage Examples
use lance::datafusion::{LanceTableProvider, SessionContextExt};
use lance::Dataset;
use datafusion::prelude::SessionContext;
use std::sync::Arc;
// Register a Lance dataset as a DataFusion table
let dataset = Dataset::open("/path/to/data.lance").await?;
let ctx = SessionContext::new();
ctx.register_table(
    "my_table",
    Arc::new(LanceTableProvider::new(Arc::new(dataset), true, false)),
)?;
// Query with SQL
let df = ctx.sql("SELECT * FROM my_table WHERE id > 100 LIMIT 10").await?;
let results = df.collect().await?;
// Or use the convenience extension
let dataset = Dataset::open("/path/to/data.lance").await?;
let df = ctx.read_lance(Arc::new(dataset), false, false)?;
Related Pages
- Lance_format_Lance_CrateRoot -- Main crate re-exporting the DataFusion module
- Lance_format_Lance_LanceDataFrame -- Extended DataFrame with additional context
- Lance_format_Lance_LqCli -- CLI tool that also reads Lance datasets