Implementation:Lance format Lance UdfRegistration

From Leeroopedia
Revision as of 15:29, 16 February 2026 by Admin (Auto-imported from implementations/Lance_format_Lance_UdfRegistration.md)


Knowledge Sources
Domains: DataFusion_Integration, Query_Execution
Last Updated: 2026-02-08 19:33 GMT

Overview

The UdfRegistration module registers all Lance-specific user-defined functions (UDFs) -- including text search, JSON operations, and geospatial functions -- with a DataFusion SessionContext.

Description

This module serves as the central registration point for all custom UDFs that Lance adds to DataFusion's query engine. Key components include:

  • register_functions -- The main public function that registers all Lance UDFs with a given SessionContext. It registers:
    • contains_tokens -- A text search UDF that checks whether a string contains all specified tokens (separated by punctuation and whitespace). Functionally equivalent to a full-text search MatchQuery with the simple tokenizer and AND operator.
    • All JSON UDFs from the json submodule: json_extract, json_extract_with_type, json_exists, json_get, json_get_string, json_get_int, json_get_float, json_get_bool, json_array_contains, json_array_length.
    • Geospatial functions via lance_geo::register_functions.
  • CONTAINS_TOKENS_UDF -- A lazily initialized static ScalarUDF for the contains_tokens function.
  • contains_tokens -- Internally implemented as a scalar function that:
    • Tokenizes the search query by splitting on non-alphanumeric characters.
    • Tokenizes each row's text value the same way.
    • Returns true only if all query tokens appear in the text.
  • collect_tokens -- A helper function that splits text on punctuation and whitespace boundaries, returning a vector of alphanumeric token strings.
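
The matching rule described above can be illustrated with a minimal, std-only sketch. It is not the Lance implementation: the names collect_tokens_sketch and contains_tokens_sketch are hypothetical, the real UDF operates on Arrow arrays rather than single strings, and case-sensitive comparison is an assumption the description does not confirm.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `collect_tokens`: split on every
/// non-alphanumeric character and drop the empty pieces.
fn collect_tokens_sketch(text: &str) -> Vec<&str> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty())
        .collect()
}

/// Hypothetical stand-in for the per-row check: true only if every
/// query token appears as a whole token in the text (AND semantics).
/// Assumption: tokens are compared case-sensitively.
fn contains_tokens_sketch(text: &str, query: &str) -> bool {
    let text_tokens: HashSet<&str> = collect_tokens_sketch(text).into_iter().collect();
    collect_tokens_sketch(query)
        .iter()
        .all(|token| text_tokens.contains(token))
}
```

In this sketch, contains_tokens_sketch("The quick brown fox jumps over the lazy dog.", "fox jumps dog") is true, while the query "fox cat" fails because "cat" is missing. Matching is per whole token, so "fox" does not match the text "foxes". A query with no tokens returns true in the sketch; whether the real UDF behaves the same way is not stated in the source.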

Usage

register_functions is called automatically when a Lance session context is created via get_session_context or new_session_context. It can also be called directly to add the Lance UDFs to a custom SessionContext.

Code Reference

Source Location

rust/lance-datafusion/src/udf.rs

Signature

pub fn register_functions(ctx: &SessionContext)

pub static CONTAINS_TOKENS_UDF: LazyLock<ScalarUDF>;
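
The LazyLock wrapper in the signature means the static is built on first access and then shared by every later access. A std-only sketch of that pattern follows. FakeUdf, INIT_COUNT, FAKE_CONTAINS_TOKENS_UDF, and udf_name are hypothetical names used only for illustration; the real static holds a DataFusion ScalarUDF.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Hypothetical placeholder for DataFusion's `ScalarUDF`.
struct FakeUdf {
    name: &'static str,
}

/// Counts how many times the initializer closure runs.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Built on first dereference, then reused for every later access.
static FAKE_CONTAINS_TOKENS_UDF: LazyLock<FakeUdf> = LazyLock::new(|| {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    FakeUdf { name: "contains_tokens" }
});

/// Dereferencing the static triggers initialization if it has not run yet.
fn udf_name() -> &'static str {
    FAKE_CONTAINS_TOKENS_UDF.name
}
```

Calling udf_name() repeatedly runs the initializer only once, which is why a static like this can be shared across session contexts without rebuilding the function each time. std::sync::LazyLock requires Rust 1.80 or later.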

Import

use lance_datafusion::udf::register_functions;

I/O Contract

Input | Type | Description
ctx | &SessionContext | The DataFusion session context to register UDFs with

Output | Type | Description
(side effect) | -- | All Lance UDFs are registered in the session context and available for use in SQL queries

Usage Examples

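use datafusion::error::Result;
use datafusion::prelude::SessionContext;
use lance_datafusion::udf::register_functions;

// `.await?` needs an async function that returns a Result.
async fn query_with_lance_udfs() -> Result<()> {
    let ctx = SessionContext::new();
    register_functions(&ctx);

    // Lance UDFs are now usable in SQL; `my_table` must already be registered in `ctx`.
    let df = ctx.sql("SELECT * FROM my_table WHERE contains_tokens(text_col, 'fox jumps dog')").await?;
    df.show().await?;
    Ok(())
}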
