Implementation:Neuml Txtai External Vectors


Knowledge Sources
Domains Embeddings, Vectors, Custom Integration
Last Updated 2026-02-10 01:00 GMT

Overview

Concrete tool, provided by txtai, for building embedding vectors with an external transform method.

Description

The External class extends the base Vectors class to support building embedding vectors via a user-supplied external function or API call. Rather than loading a specific model, it resolves a configurable transform function that handles the actual embedding generation.

The transform function resolution supports multiple input types:

  • String: Resolved via txtai's Resolver utility, which imports the class or function named by a dotted module path. The resolved object is then handled by one of the two cases below.
  • Function: Used directly if it is a Python function (types.FunctionType).
  • Callable class: Instantiated (by calling it with no arguments) if it is not a raw function, producing a callable instance.
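The three cases above can be sketched in a few lines. This is a minimal illustration, not txtai's code: resolve_transform, Upper, and lower are hypothetical names, and importlib stands in for the Resolver utility.

```python
import importlib
import types


def resolve_transform(transform):
    """Hypothetical stand-in for the resolution steps described above."""
    if not transform:
        return transform

    # String: import the attribute named by a dotted path such as "package.module.Name"
    if isinstance(transform, str):
        module, _, name = transform.rpartition(".")
        transform = getattr(importlib.import_module(module), name)

    # Plain function: use as-is. Anything else: call it with no arguments
    # and use the returned object as the transform.
    return transform if isinstance(transform, types.FunctionType) else transform()


class Upper:
    """Toy callable class, instantiated with no arguments."""

    def __call__(self, texts):
        return [text.upper() for text in texts]


def lower(texts):
    """Toy plain function, used directly."""
    return [text.lower() for text in texts]
```

In this sketch, anything that is not a types.FunctionType is called with no arguments, so an object such as a functools.partial would be invoked rather than used directly. Wrapping it in a def or lambda avoids that.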

The encode method delegates to the resolved transform function unless the input already consists of NumPy arrays, in which case the transform is skipped. The transform function is expected to handle its own batching. All results are cast to np.float32.
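The encode behaviour described above can be illustrated with a standalone function. This is a sketch under assumptions: encode and toy_transform here are hypothetical free functions rather than the class method, and the toy embedding is invented for the example.

```python
import numpy as np


def encode(data, transform=None):
    """Hypothetical stand-in for the encode behaviour described above."""
    # Only call the transform for non-empty input that is not already arrays
    if transform and data and not isinstance(data[0], np.ndarray):
        data = transform(data)

    # Cast to float32 whether the result is an array or a list of vectors
    if isinstance(data, np.ndarray):
        return data.astype(np.float32)
    return np.array(data, dtype=np.float32)


def toy_transform(texts):
    # Toy embedding: [length, vowel count] per text, deliberately returned as float64
    return np.array([[len(t), sum(c in "aeiou" for c in t)] for t in texts], dtype=np.float64)


# Text input goes through the transform; precomputed arrays bypass it
from_text = encode(["hello", "hi"], toy_transform)
from_arrays = encode([np.array([1.0, 2.0]), np.array([3.0, 4.0])], toy_transform)
```

Both results end up as float32 arrays with one row per input, regardless of the dtype the transform returned.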

The loadmodel method returns None since no internal model is needed; the external transform function handles all embedding logic.

Usage

Use External vectors when you want to plug in your own embedding function, a remote API, or any custom vectorization logic that does not fit into txtai's built-in model backends. This is the most flexible vectors backend and is appropriate for integrating proprietary models, custom preprocessing pipelines, or third-party embedding services.
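Because the transform is expected to do its own batching, a callable class is one convenient shape for wrapping a model or remote service. The sketch below is illustrative only: BatchedEncoder, embed_batch, and the batch size are hypothetical, and embed_batch computes toy vectors where a real model or API call would go.

```python
class BatchedEncoder:
    """Hypothetical callable class that can be constructed with no arguments."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size
        self.calls = 0

    def embed_batch(self, texts):
        # Placeholder for a real model or API call (assumption for illustration):
        # each vector is [character count, word count]
        self.calls += 1
        return [[float(len(t)), float(t.count(" ") + 1)] for t in texts]

    def __call__(self, texts):
        # Split the input into fixed-size batches and concatenate the results
        vectors = []
        for start in range(0, len(texts), self.batch_size):
            vectors.extend(self.embed_batch(texts[start:start + self.batch_size]))
        return vectors
```

The no-argument constructor matters here, since a class supplied as the transform is instantiated without arguments. Settings such as the batch size therefore need defaults.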

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/vectors/dense/external.py

Signature

class External(Vectors):
    def __init__(self, config, scoring, models)
    def loadmodel(self, path) -> None
    def encode(self, data, category=None) -> ndarray
    def resolve(self, transform) -> callable

Import

from txtai.vectors.dense.external import External

I/O Contract

Inputs

Name | Type | Required | Description
config | dict | Yes | Configuration dictionary. The transform key (str, function, or class) specifies the external embedding function. Also supports all base Vectors config keys.
scoring | Scoring | No | Optional scoring instance for token weighting.
models | object | No | Shared models cache instance.
data | list | Yes (encode) | List of input text strings or pre-computed NumPy arrays. If the elements are already ndarray objects, the transform function is skipped.
category | str | No | Optional category hint (e.g., "query" or "data"). Accepted but not used by the default implementation.

Outputs

Name | Type | Description
embeddings | ndarray (float32) | NumPy array of embedding vectors with shape (n, dimensions).
model | None | loadmodel always returns None since the external transform handles embedding generation.

Usage Examples

import numpy as np

from txtai.embeddings import Embeddings

# Define a custom transform function
def my_embeddings(texts):
    # Example: return random embeddings (replace with real logic)
    return np.random.rand(len(texts), 384).astype(np.float32)

# Use External vectors with a function as the transform
# (the external backend is selected with the "method" key)
embeddings = Embeddings({
    "method": "external",
    "transform": my_embeddings,
    "dimensions": 384
})

# Or use a module path string. The path must be importable;
# "mypackage.embeddings.CustomEncoder" is a placeholder, so this is left commented out.
# embeddings = Embeddings({
#     "method": "external",
#     "transform": "mypackage.embeddings.CustomEncoder",
#     "dimensions": 768
# })

# Index and search
embeddings.index([
    (0, "natural language processing", None),
    (1, "computer vision models", None),
])

# With random embeddings the ranking is arbitrary; a real transform gives meaningful results
results = embeddings.search("NLP techniques", limit=5)
