Implementation:Neuml Txtai M2V Vectors

Knowledge Sources	Neuml_Txtai
Domains	Embeddings, Vectors, Static Embeddings
Last Updated	2026-02-10 01:00 GMT

Overview

txtai's concrete Vectors implementation for building embedding vectors with Model2Vec static models.

Description

The Model2Vec class extends the base Vectors class to generate embeddings with the Model2Vec library, which provides fast, lightweight static embedding models. Model2Vec models distill the knowledge of larger transformer models into compact static representations that run without a GPU and load quickly.

Key features:

Model detection: The static ismodel method downloads the model's config.json from the Hugging Face Hub and returns True if its model_type is "model2vec". Invalid repo IDs and missing files raise HFValidationError or OSError; the method catches both and returns False.
Simple loading: The loadmodel method calls StaticModel.from_pretrained(path) to load the model.
Configurable encoding: The encode method calls the model's encode method with the encodebatch setting as the batch size, plus any additional keyword arguments taken from the vectors config key.
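The detection logic can be sketched as a local-file analogue. This is a minimal sketch, not txtai's code: the helper name is_model2vec_dir is hypothetical, and it reads config.json from a local directory instead of downloading it from the Hugging Face Hub, so only OSError (missing file or directory) is caught here.

```python
import json
import os


def is_model2vec_dir(path):
    """Hypothetical local analogue of Model2Vec.ismodel.

    Reads config.json from a local directory (instead of downloading it
    from the Hugging Face Hub) and checks whether model_type is "model2vec".
    """
    try:
        with open(os.path.join(path, "config.json"), encoding="utf-8") as f:
            config = json.load(f)
        return config.get("model_type") == "model2vec"
    except OSError:
        # Missing directory or config.json: treat as "not a Model2Vec model"
        return False
```

A directory containing {"model_type": "model2vec"} in config.json returns True; any other model_type, or a missing file, returns False.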

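Because a static model computes an embedding by looking up precomputed token vectors and averaging them, with no transformer forward pass, Model2Vec models are typically orders of magnitude faster than transformer-based models while remaining competitive in quality on many tasks.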
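The idea behind static embeddings can be illustrated with a toy example. This is a simplified sketch, not Model2Vec's implementation: the vocabulary, the embedding table values, and the whitespace tokenizer are all made up for illustration (a real model ships a distilled table and a subword tokenizer). It shows that encoding reduces to table lookups plus a mean.

```python
import numpy as np

# Hypothetical toy vocabulary and embedding table (values are made up;
# a real Model2Vec model ships a distilled table and a subword tokenizer)
VOCAB = {"fast": 0, "static": 1, "embeddings": 2, "search": 3}
TABLE = np.array(
    [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0],
    ],
    dtype=np.float32,
)


def embed(texts):
    """Embed each text as the mean of its known token vectors."""
    rows = []
    for text in texts:
        # Naive whitespace tokenization; unknown tokens are skipped
        ids = [VOCAB[t] for t in text.lower().split() if t in VOCAB]
        if ids:
            # Lookup plus mean pooling: no neural network forward pass
            rows.append(TABLE[ids].mean(axis=0))
        else:
            # No known tokens: fall back to a zero vector
            rows.append(np.zeros(TABLE.shape[1], dtype=np.float32))
    return np.stack(rows)
```

For example, embed(["fast search"]) averages the rows for "fast" and "search", giving [1.0, 0.5, 0.0]. The cost per text grows with its token count, not with model depth, which is where the speed advantage comes from.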

Usage

Use Model2Vec vectors when you need ultra-fast embedding generation with minimal resource requirements. This is ideal for high-throughput applications, CPU-only deployments, real-time systems where latency is critical, or edge/IoT device deployments with limited resources.

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/vectors/dense/m2v.py

Signature

class Model2Vec(Vectors):
    @staticmethod
    def ismodel(path) -> bool: ...
    def __init__(self, config, scoring, models): ...
    def loadmodel(self, path) -> StaticModel: ...
    def encode(self, data, category=None) -> ndarray: ...

Import

from txtai.vectors.dense.m2v import Model2Vec

I/O Contract

Inputs

Name	Type	Required	Description
config	dict	Yes	Configuration dictionary. Must include path (str, HF Hub model ID or local path to a Model2Vec model). Optional keys: vectors (dict of additional encoding keyword arguments), encodebatch (int, batch size for encoding).
scoring	Scoring	No	Optional scoring instance for token weighting.
models	object	No	Shared models cache instance.
data	list[str]	Yes (encode)	List of text strings to generate embeddings for.
category	str	No	Optional category hint (not used).
path (ismodel)	str	Yes (ismodel)	Hugging Face Hub model ID whose config.json is checked for model_type "model2vec".
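The mapping from config keys to the underlying encode call can be sketched with a stand-in model. This is a hedged sketch, not txtai's code: RecordingModel and encode_with_config are hypothetical names, the stand-in only records its arguments, it assumes the model's encode accepts a batch_size keyword, and the fallback batch size of 32 is an assumption for this sketch only.

```python
class RecordingModel:
    """Hypothetical stand-in for a StaticModel that records encode() arguments."""

    def __init__(self):
        self.calls = []

    def encode(self, sentences, batch_size=1024, **kwargs):
        # Record the arguments and return one dummy 3-dim vector per input
        self.calls.append({"batch_size": batch_size, **kwargs})
        return [[0.0] * 3 for _ in sentences]


def encode_with_config(model, config, data):
    """Sketch of how encode forwards config values to the model."""
    # Extra keyword arguments come from the "vectors" config key
    modelargs = config.get("vectors", {})
    # "encodebatch" becomes batch_size (the fallback of 32 is an assumption)
    return model.encode(data, batch_size=config.get("encodebatch", 32), **modelargs)
```

With config {"encodebatch": 256, "vectors": {"max_length": 128}}, the model receives batch_size=256 and max_length=128 alongside the input texts.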

Outputs

Name	Type	Description
embeddings	ndarray	NumPy array of embedding vectors produced by the Model2Vec model.
ismodel	bool	True if the model's config.json has `model_type == "model2vec"`.
model	StaticModel	Loaded Model2Vec static model instance.

Usage Examples

from txtai.embeddings import Embeddings

# Use a Model2Vec model from the HF Hub
embeddings = Embeddings({
    "path": "minishlab/M2V_base_output"
})

# Index documents as (id, text, tags) tuples
embeddings.index([
    (0, "fast static embeddings for search", None),
    (1, "lightweight models for production", None),
    (2, "transformer distillation techniques", None),
])

# Search: fast because the static model needs no transformer forward pass
results = embeddings.search("fast lightweight search", limit=5)
print(results)

# Forward extra keyword arguments to StaticModel.encode via "vectors"
# and set the encoding batch size via "encodebatch"
embeddings = Embeddings({
    "path": "minishlab/M2V_base_output",
    "vectors": {"max_length": 256},
    "encodebatch": 256
})
