
Implementation:Neuml Txtai Embeddings Init

From Leeroopedia


Knowledge Sources
Domains Semantic_Search, NLP
Last Updated 2026-02-09 00:00 GMT

Overview

Constructor for configuring an embeddings database instance for semantic search, provided by the txtai library.

Description

The Embeddings.__init__ method creates a new embeddings index instance and sets up all internal components based on the provided configuration. It initializes slots for the ANN index, document database, scoring engine, graph, subindexes, dimensionality reducer, vector model, and query model. The configuration dictionary and any keyword arguments are merged into a single config dict, which is then passed to self.configure() to load config-driven models (dense vector model, scoring backend, query model). The resulting instance is safe for concurrent reads, but writes must be externally synchronized.
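The merge behavior can be illustrated with a small stand-alone sketch (plain Python, not the txtai internals; it assumes only the rule described above that kwargs take precedence on conflicting keys):

```python
# Sketch of the described config merge: the config dict and any
# keyword arguments are combined, with kwargs winning on conflicts.
def merge_config(config=None, **kwargs):
    merged = dict(config) if config else {}
    merged.update(kwargs)
    return merged

merged = merge_config({"path": "model-a", "content": False}, content=True)
print(merged)
# {'path': 'model-a', 'content': True}
```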

Usage

Use this constructor when initializing a new semantic search pipeline. Pass a configuration dictionary specifying the embedding model path, content storage flag, scoring method, and any other desired settings. For shared-model deployments, pass a models cache dictionary to avoid loading duplicate copies of the same model in memory.

Code Reference

Source Location

  • Repository: txtai
  • File: src/python/txtai/embeddings/base.py
  • Lines: L30-83

Signature

def __init__(self, config=None, models=None, **kwargs):

Import

from txtai.embeddings import Embeddings

I/O Contract

Inputs

  • config (dict or None, optional): Embeddings configuration dictionary. Common keys include "path" (str, model name or path, e.g. "sentence-transformers/all-MiniLM-L6-v2"), "content" (bool, enable document content storage), "scoring" (str or dict, sparse scoring method such as "bm25"), "hybrid" (bool or str, enable hybrid search), "graph" (dict or bool, enable a graph index), and "indexes" (dict, subindex configurations). Defaults to None (empty config).
  • models (dict or None, optional): Shared model cache dictionary. When multiple Embeddings instances use the same vector model, passing a shared dict avoids loading the model more than once. Defaults to None.
  • **kwargs (optional): Additional configuration keys merged into config. If both config and kwargs set the same key, kwargs take precedence.

Outputs

  • (instance) (Embeddings): An initialized Embeddings instance. Key attributes after init: self.config (merged configuration dict, or None), self.model (dense vector model, or None), self.scoring (scoring instance if configured for word-vector weighting), self.models (shared model cache), and self.ann, self.database, self.graph, self.indexes, self.ids, and self.reducer (all None; created during indexing).
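As a plain-Python stand-in (not the txtai class itself), the post-init attribute contract above can be sketched as follows; the real constructor also loads config-driven models via configure(), which this sketch omits:

```python
# Illustrative stand-in for the attribute slots described above.
# model and scoring would be populated by configure() when the
# config calls for them; they are simply left None here.
class EmbeddingsSketch:
    def __init__(self, config=None, models=None, **kwargs):
        # Merge config and kwargs; kwargs win on conflicting keys
        merged = {**(config or {}), **kwargs}
        self.config = merged or None
        self.models = models
        # Index components remain None until indexing runs
        self.ann = self.database = self.scoring = self.graph = None
        self.indexes = self.ids = self.reducer = self.model = None

e = EmbeddingsSketch({"path": "model"}, content=True)
print(e.config)       # {'path': 'model', 'content': True}
print(e.ann is None)  # True
```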

Usage Examples

Basic Example

from txtai.embeddings import Embeddings

# Create embeddings with a sentence-transformers model and content storage
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

# Verify the configuration
print(embeddings.config["path"])
# Output: sentence-transformers/all-MiniLM-L6-v2
print(embeddings.config["content"])
# Output: True

Advanced Example: Hybrid Search Configuration

from txtai.embeddings import Embeddings

# Configure hybrid search with BM25 sparse scoring
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "hybrid": True,
    "scoring": {
        "method": "bm25",
        "terms": True,
        "normalize": True
    }
})
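Hybrid search combines the dense vector score with the sparse (BM25) score for each result. A common fusion is a weighted convex combination of the normalized scores; the sketch below is illustrative only, and the even 0.5 weighting is an assumption rather than txtai's exact scheme:

```python
# Illustrative hybrid score fusion: weighted sum of normalized
# dense and sparse scores (weight=0.5 gives an even blend).
def hybrid_score(dense, sparse, weight=0.5):
    return weight * dense + (1 - weight) * sparse

print(round(hybrid_score(0.8, 0.4), 6))
# 0.6
```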

Advanced Example: Shared Model Cache

from txtai.embeddings import Embeddings

# Shared models cache to avoid loading the same model multiple times
models = {}

embeddings_a = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"}, models=models)
embeddings_b = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"}, models=models)

# Both instances share the same underlying vector model
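The effect of the shared cache can be sketched with a hypothetical loader (txtai manages this internally when a models dict is passed; load_model and the object() stand-in below are illustration only):

```python
# Hypothetical cache-aware loader: the expensive model load runs
# once per path, and later requests reuse the cached instance.
models = {}

def load_model(path, cache):
    if path not in cache:
        cache[path] = object()  # stand-in for loading a real model
    return cache[path]

a = load_model("all-MiniLM-L6-v2", models)
b = load_model("all-MiniLM-L6-v2", models)
print(a is b)
# True
```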

Advanced Example: Using kwargs

from txtai.embeddings import Embeddings

# Config via kwargs - these are merged into the config dict
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)

print(embeddings.config["path"])
# Output: sentence-transformers/all-MiniLM-L6-v2

Related Pages

Implements Principle

Requires Environment
