Implementation:Neuml Txtai M2V Vectors

From Leeroopedia


Knowledge Sources
Domains: Embeddings, Vectors, Static Embeddings
Last Updated: 2026-02-10 01:00 GMT

Overview

Concrete tool, provided by txtai, for building embedding vectors with Model2Vec static models.

Description

The Model2Vec class extends the base Vectors class to generate embeddings using the Model2Vec library, which provides extremely fast, lightweight static embedding models. Model2Vec models distill the knowledge of larger transformer models into compact static representations that require no GPU and load instantly.
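
To illustrate what "static" means in this context, the following toy sketch embeds text by looking up a fixed vector per token and averaging the results, with no neural network forward pass. It is not Model2Vec's actual tokenizer or weights: the `TABLE` values, the whitespace tokenization, and the `embed` helper are all hypothetical and chosen only to keep the example small.

```python
# Toy static embedding table: token -> fixed vector (values are made up).
TABLE = {
    "fast": [1.0, 0.0],
    "static": [0.0, 1.0],
    "search": [1.0, 1.0],
}


def embed(text, table=TABLE, dim=2):
    """Mean-pool the static vectors of the known whitespace tokens in text."""
    vectors = [table[token] for token in text.lower().split() if token in table]
    if not vectors:
        # No known tokens: fall back to a zero vector
        return [0.0] * dim
    # Average each dimension across the token vectors
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(embed("fast static"))  # [0.5, 0.5]
```

Because encoding reduces to table lookups and an average, this style of model runs comfortably on a CPU, which is the property the description above relies on.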

Key features:

  • Model detection: The static ismodel method downloads and checks the model's config.json from the Hugging Face Hub, returning True if model_type is "model2vec". It handles invalid repo IDs and missing files gracefully via HFValidationError and OSError.
  • Simple loading: The loadmodel method calls StaticModel.from_pretrained(path) to load the model.
  • Configurable encoding: The encode method passes the encodebatch size and any additional keyword arguments from the vectors config key to the model's encode method.
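
The model detection step described above can be sketched as follows. This is a simplified stand-in, not the txtai source: the hypothetical helper `is_model2vec_config` reads `config.json` from a local directory rather than downloading it from the Hugging Face Hub, and it only handles the file-level errors (the real method is described as also handling `HFValidationError`).

```python
import json
import os
import tempfile


def is_model2vec_config(directory):
    """Return True if directory/config.json declares model_type == "model2vec"."""
    try:
        with open(os.path.join(directory, "config.json"), encoding="utf-8") as handle:
            config = json.load(handle)
    except (OSError, ValueError):
        # Missing, unreadable, or malformed file: treat as "not a Model2Vec model"
        return False
    return isinstance(config, dict) and config.get("model_type") == "model2vec"


with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "config.json"), "w", encoding="utf-8") as handle:
        json.dump({"model_type": "model2vec"}, handle)
    print(is_model2vec_config(directory))                            # True
    print(is_model2vec_config(os.path.join(directory, "missing")))  # False
```

Returning False instead of raising mirrors the graceful handling noted in the list: a caller can probe several candidate model types in turn without wrapping each check in its own error handling.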

Because encoding does not require a transformer forward pass, Model2Vec models are typically orders of magnitude faster than transformer-based models while maintaining competitive quality for many tasks.

Usage

Use Model2Vec vectors when you need ultra-fast embedding generation with minimal resource requirements. This is ideal for high-throughput applications, CPU-only deployments, real-time systems where latency is critical, or edge/IoT device deployments with limited resources.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/vectors/dense/m2v.py

Signature

class Model2Vec(Vectors):
    @staticmethod
    def ismodel(path) -> bool
    def __init__(self, config, scoring, models)
    def loadmodel(self, path) -> StaticModel
    def encode(self, data, category=None) -> ndarray

Import

from txtai.vectors.dense.m2v import Model2Vec

I/O Contract

Inputs

Name | Type | Required | Description
config | dict | Yes | Configuration dictionary. Must include path (str, HF Hub model ID or local path to a Model2Vec model). Optional keys: vectors (dict of additional encoding keyword arguments), encodebatch (int, batch size for encoding).
scoring | Scoring | No | Optional scoring instance for token weighting.
models | object | No | Shared models cache instance.
data | list[str] | Yes (encode) | List of text strings to generate embeddings for.
category | str | No | Optional category hint (not used).
path (ismodel) | str | Yes (ismodel) | HF Hub model identifier to check for Model2Vec model type.
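
As a rough illustration of how the `encodebatch` and `vectors` config keys reach the model, the sketch below forwards them to a stub's `encode` method. Everything here is hypothetical: `StubModel` stands in for a loaded static model, `encode_with_config` is an illustrative helper rather than the txtai method, and both the `batch_size` keyword name and the fallback of 32 are assumptions.

```python
class StubModel:
    """Stand-in for a loaded static model; records the arguments it receives."""

    def encode(self, data, batch_size=None, **kwargs):
        self.last_call = {"batch_size": batch_size, **kwargs}
        # Return one placeholder vector per input string
        return [[0.0] for _ in data]


def encode_with_config(model, config, data, default_batch=32):
    """Forward encodebatch and the vectors dict to model.encode."""
    # default_batch is an assumed fallback, not a documented value
    batch = config.get("encodebatch", default_batch)
    # Extra keyword arguments come from the optional "vectors" key
    modelargs = config.get("vectors", {})
    return model.encode(data, batch_size=batch, **modelargs)


model = StubModel()
config = {"path": "minishlab/M2V_base_output", "vectors": {"normalize": True}, "encodebatch": 256}
vectors = encode_with_config(model, config, ["first text", "second text"])
print(model.last_call)  # {'batch_size': 256, 'normalize': True}
```

Keeping the extra arguments in a separate `vectors` dictionary means new encoding options can be passed through configuration without changing the wrapper's signature.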

Outputs

Name | Type | Description
embeddings | ndarray | NumPy array of embedding vectors produced by the Model2Vec model.
ismodel | bool | True if the model's config.json has model_type == "model2vec".
model | StaticModel | Loaded Model2Vec static model instance.

Usage Examples

from txtai.embeddings import Embeddings

# Use a Model2Vec model from HF Hub
embeddings = Embeddings({
    "path": "minishlab/M2V_base_output"
})

# Index documents
embeddings.index([
    (0, "fast static embeddings for search", None),
    (1, "lightweight models for production", None),
    (2, "transformer distillation techniques", None),
])

# Search - extremely fast due to static model
results = embeddings.search("fast lightweight search", limit=5)

# Use with additional encoding parameters
embeddings = Embeddings({
    "path": "minishlab/M2V_base_output",
    "vectors": {"normalize": True},
    "encodebatch": 256
})
