Implementation:Lance format Lance UdfRegistration
| Knowledge Sources | |
|---|---|
| Domains | DataFusion_Integration, Query_Execution |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
The UdfRegistration module registers all Lance-specific user-defined functions (UDFs) -- including text search, JSON operations, and geospatial functions -- with a DataFusion SessionContext.
Description
This module serves as the central registration point for all custom UDFs that Lance adds to DataFusion's query engine. Key components include:
- register_functions -- The main public function that registers all Lance UDFs with a given SessionContext. It registers:
  - contains_tokens -- A text search UDF that checks whether a string contains all specified tokens (separated by punctuation and whitespace). Functionally equivalent to a full-text search MatchQuery with the simple tokenizer and AND operator.
  - All JSON UDFs from the json submodule: json_extract, json_extract_with_type, json_exists, json_get, json_get_string, json_get_int, json_get_float, json_get_bool, json_array_contains, json_array_length.
  - Geospatial functions via lance_geo::register_functions.
- CONTAINS_TOKENS_UDF -- A lazily initialized static ScalarUDF for the contains_tokens function.
- contains_tokens -- Internally implemented as a scalar function that:
  - Tokenizes the search query by splitting on non-alphanumeric characters.
  - Tokenizes each row's text value the same way.
  - Returns true only if all query tokens appear in the text.
- collect_tokens -- A helper function that splits text on punctuation and whitespace boundaries, returning a vector of alphanumeric token strings.
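The matching rule described for contains_tokens and collect_tokens can be sketched in plain Rust over string slices. This is a simplified illustration, not the code in udf.rs: the names collect_tokens_sketch and contains_tokens_sketch are hypothetical, the real UDF operates on Arrow arrays rather than on &str, and case-sensitive comparison is an assumption here.

```rust
use std::collections::HashSet;

/// Split `text` into alphanumeric tokens, treating every other
/// character (punctuation, whitespace) as a separator.
/// Hypothetical helper name; mirrors the behavior described above.
fn collect_tokens_sketch(text: &str) -> Vec<&str> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty())
        .collect()
}

/// Return true only if every token of `query` also appears as a
/// token of `text`. Comparison is case-sensitive (an assumption).
fn contains_tokens_sketch(text: &str, query: &str) -> bool {
    let text_tokens: HashSet<&str> = collect_tokens_sketch(text).into_iter().collect();
    collect_tokens_sketch(query)
        .iter()
        .all(|token| text_tokens.contains(token))
}
```

In this sketch, token order and adjacency do not matter, only membership, which is what the AND semantics imply. A query with no tokens matches every row here because `all` over an empty iterator is true.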
Usage
register_functions is called automatically when a Lance session context is created via get_session_context or new_session_context. You can also call it directly to add the Lance UDFs to a custom SessionContext.
Code Reference
Source Location
rust/lance-datafusion/src/udf.rs
Signature
pub fn register_functions(ctx: &SessionContext)
pub static CONTAINS_TOKENS_UDF: LazyLock<ScalarUDF>;
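The LazyLock wrapper in the second signature means the value is built on first access and then reused. A minimal std-only sketch of that pattern follows, assuming Rust 1.80 or later. The names INIT_COUNT and UDF_NAME are hypothetical, and a String stands in for the ScalarUDF, so this shows the initialization behavior only, not Lance's actual initializer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Counts how many times the initializer closure runs (illustration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

// Stand-in for a lazily built UDF: the closure runs on first
// dereference, and later accesses reuse the stored value.
static UDF_NAME: LazyLock<String> = LazyLock::new(|| {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    String::from("contains_tokens")
});
```

Dereferencing UDF_NAME from several call sites, or from several threads, runs the closure at most once, so a static like this can be shared without rebuilding the function object per session.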
Import
use lance_datafusion::udf::register_functions;
I/O Contract
| Input | Type | Description |
|---|---|---|
| ctx | &SessionContext | The DataFusion session context to register UDFs with |
| Output | Type | Description |
|---|---|---|
| (side effect) | -- | All Lance UDFs are registered in the session context and available for use in SQL queries |
Usage Examples
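use datafusion::error::Result;
use datafusion::prelude::SessionContext;
use lance_datafusion::udf::register_functions;

// `.await?` needs an async function that returns a Result.
async fn query_with_lance_udfs() -> Result<()> {
    let ctx = SessionContext::new();
    register_functions(&ctx);
    // Lance UDFs are now usable in SQL; my_table must already be registered with ctx.
    let df = ctx.sql("SELECT * FROM my_table WHERE contains_tokens(text_col, 'fox jumps dog')").await?;
    df.show().await?;
    Ok(())
}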
Related Pages
- Lance_format_Lance_JsonUdf -- JSON UDF implementations registered by this module
- Lance_format_Lance_ExecPlans -- Session context creation that calls register_functions
- Lance_format_Lance_FilterPlanner -- Planner that uses the registered UDFs for expression parsing