Implementation:Neuml Txtai External Vectors
| Knowledge Sources | |
|---|---|
| Domains | Embeddings, Vectors, Custom Integration |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
Concrete txtai tool for building embedding vectors with a user-supplied external transform function.
Description
The External class extends the base Vectors class to support building embedding vectors via a user-supplied external function or API call. Rather than loading a specific model, it resolves a configurable transform function that handles the actual embedding generation.
The transform function resolution supports multiple input types:
- String: Resolved via txtai's Resolver utility, which can import and instantiate classes or functions from module paths.
- Function: Used directly if it is a Python function (types.FunctionType).
- Callable class: Instantiated (by calling it with no arguments) if it is not a raw function, producing a callable instance.
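The three cases above can be sketched in plain Python. This is a minimal illustration of the described rules, not txtai's actual code: `importlib` stands in for txtai's Resolver, and `resolve_transform`, `text_length`, and `LengthEncoder` are hypothetical names introduced only for this example.

```python
import importlib
import json
import types


def resolve_transform(transform):
    """Illustrative resolution of a transform setting into a callable.

    Assumption: mirrors the three cases described above; importlib is a
    stand-in for txtai's Resolver utility.
    """
    if not transform:
        return transform

    # Case 1: a string such as "package.module.name" is imported first
    if isinstance(transform, str):
        module, _, name = transform.rpartition(".")
        transform = getattr(importlib.import_module(module), name)

    # Case 2: plain Python functions are used as-is
    if isinstance(transform, types.FunctionType):
        return transform

    # Case 3: anything else (typically a class) is called with no arguments
    return transform()


def text_length(texts):
    """Hypothetical plain function, used to exercise case 2."""
    return [[float(len(text))] for text in texts]


class LengthEncoder:
    """Hypothetical callable class, used to exercise case 3."""

    def __call__(self, texts):
        return [[float(len(text))] for text in texts]


by_function = resolve_transform(text_length)    # returned unchanged
by_class = resolve_transform(LengthEncoder)     # instantiated
by_string = resolve_transform("json.dumps")     # imported, then treated as a function
```

One consequence of these rules, at least in this sketch: a bound method or a `functools.partial` object is not a `types.FunctionType`, so it would be called with no arguments rather than used directly. Wrapping it in a plain function or lambda avoids that.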
The encode method delegates to the resolved transform function when the input data is not already a NumPy array. The transform function is expected to handle its own batching. All results are cast to np.float32.
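Because batching is left to the transform, a function that wraps a rate-limited or size-limited service would need to chunk its own input. The following is a minimal sketch of that idea; `fake_remote_embed` is a hypothetical stand-in for a real embedding call, and the batch size of 2 is arbitrary.

```python
def fake_remote_embed(batch):
    """Hypothetical stand-in for a remote embedding call (assumption).

    Returns one 3-dimensional vector per input text.
    """
    return [[float(len(text)), float(text.count(" ")), 1.0] for text in batch]


def batched_transform(texts, batch_size=2):
    """Embed texts in fixed-size chunks and return one vector per text."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        # Each slice is sent as one request; results keep the input order
        vectors.extend(fake_remote_embed(texts[start:start + batch_size]))
    return vectors


vectors = batched_transform(["a b", "cd", "e f g"])
```

The sketch returns nested lists only to stay dependency-free; a real transform would more likely return a NumPy array with one row per input text.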
The loadmodel method returns None since no internal model is needed; the external transform function handles all embedding logic.
Usage
Use External vectors when you want to plug in your own embedding function, a remote API, or any custom vectorization logic that does not fit into txtai's built-in model backends. This is the most flexible vectors backend and is appropriate for integrating proprietary models, custom preprocessing pipelines, or third-party embedding services.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/vectors/dense/external.py
Signature
class External(Vectors):
    def __init__(self, config, scoring, models)
    def loadmodel(self, path) -> None
    def encode(self, data, category=None) -> ndarray
    def resolve(self, transform) -> callable
Import
from txtai.vectors.dense.external import External
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Configuration dictionary. Must include transform (str, callable, or class) specifying the external embedding function. Also supports all base Vectors config keys. |
| scoring | Scoring | No | Optional scoring instance for token weighting. |
| models | object | No | Shared models cache instance. |
| data | list | Yes (encode) | List of input text strings or pre-computed NumPy arrays. If the elements are already ndarrays, the transform function is skipped. |
| category | str | No | Optional category hint (e.g., "query" or "data"). Passed through but not used by the default implementation. |
Outputs
| Name | Type | Description |
|---|---|---|
| embeddings | ndarray (float32) | NumPy array of embedding vectors with shape (n, dimensions). |
| model | None | loadmodel always returns None since the external transform handles embedding generation. |
Usage Examples
import numpy as np

from txtai.embeddings import Embeddings

# Define a custom transform function: takes a list of texts, returns one vector per text
def my_embeddings(texts):
    # Example: return random embeddings (replace with real logic)
    return np.random.rand(len(texts), 384).astype(np.float32)

# Use External vectors with a callable transform
embeddings = Embeddings({
    "method": "external",
    "transform": my_embeddings,
    "dimensions": 384
})

# Index and search
embeddings.index([
    (0, "natural language processing", None),
    (1, "computer vision models", None),
])
results = embeddings.search("NLP techniques", limit=5)

# Or use a module path string (placeholder path; the module must be importable)
embeddings = Embeddings({
    "method": "external",
    "transform": "mypackage.embeddings.CustomEncoder",
    "dimensions": 768
})
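For the module path form, the target has to be something that can be instantiated with no arguments and then called with a list of texts. A rough sketch of what such a class might look like follows; `CustomEncoder` here is hypothetical and only echoes the placeholder path in the example above, and the hash-based vectors are toy values, not meaningful embeddings.

```python
import hashlib


class CustomEncoder:
    """Hypothetical encoder class; not part of txtai."""

    def __init__(self, dimensions=8):
        # Must be constructible without arguments, so settings need defaults
        self.dimensions = dimensions

    def __call__(self, texts):
        # Toy vectors: leading SHA-256 bytes scaled to [0, 1]
        rows = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            rows.append([byte / 255.0 for byte in digest[: self.dimensions]])
        return rows


encoder = CustomEncoder()
rows = encoder(["natural language processing", "computer vision models"])
```

Keeping configuration in constructor defaults (or reading it from the environment) is one way to satisfy the no-argument instantiation described earlier while still allowing the encoder to hold state such as a client or a loaded model.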