# Implementation: NeuML txtai Embeddings Init
| Knowledge Sources | |
|---|---|
| Domains | Semantic Search, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
## Overview
Concrete tool for configuring an embeddings database instance for semantic search provided by the txtai library.
## Description
The `Embeddings.__init__` method creates a new embeddings index instance and sets up all internal components based on the provided configuration. It initializes slots for the ANN index, document database, scoring engine, graph, subindexes, dimensionality reducer, vector model, and query model. The configuration dictionary and any keyword arguments are merged into a single config dict, which is then passed to `self.configure()` to load config-driven models (dense vector model, scoring backend, query model). The resulting instance is thread-safe for read operations, but writes must be externally synchronized.
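The slot setup and merge flow described above can be sketched as follows. This is a simplified illustration under the stated behavior, not the actual txtai source; the class name `EmbeddingsSketch` is hypothetical.

```python
class EmbeddingsSketch:
    """Simplified sketch of the initialization flow described above.

    Illustration only; the real txtai Embeddings class also loads
    config-driven models via configure().
    """

    def __init__(self, config=None, models=None, **kwargs):
        # Slots created later, during indexing, start out empty
        self.ann = None
        self.database = None
        self.graph = None
        self.indexes = None
        self.ids = None
        self.reducer = None

        # Config-driven slots (populated by configure() in the real class)
        self.model = None
        self.scoring = None
        self.query = None

        # Merge kwargs into config; kwargs win on conflicting keys
        self.config = {**config, **kwargs} if config and kwargs else (kwargs if kwargs else config)

        # Shared models cache
        self.models = models


sketch = EmbeddingsSketch({"path": "model-a"}, content=True)
print(sketch.config)
# {'path': 'model-a', 'content': True}
```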
## Usage
Use this constructor when initializing a new semantic search pipeline. Pass a configuration dictionary specifying the embedding model path, content storage flag, scoring method, and any other desired settings. For shared-model deployments, pass a models cache dictionary to avoid loading duplicate copies of the same model in memory.
## Code Reference

### Source Location
- Repository: txtai
- File: `src/python/txtai/embeddings/base.py`
- Lines: 30-83
### Signature

```python
def __init__(self, config=None, models=None, **kwargs):
```
### Import

```python
from txtai.embeddings import Embeddings
```
## I/O Contract

### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict or None | No | Embeddings configuration dictionary. Common keys include "path" (str, model name or path, e.g. "sentence-transformers/all-MiniLM-L6-v2"), "content" (bool, enable document content storage), "scoring" (str or dict, sparse scoring method such as "bm25"), "hybrid" (bool or str, enable hybrid search), "graph" (dict or bool, enable graph index), "indexes" (dict, subindex configurations). Defaults to None (empty config). |
| models | dict or None | No | Shared models cache dictionary. When multiple Embeddings instances use the same vector model, passing a shared dict avoids loading the model multiple times. Defaults to None. |
| **kwargs | keyword arguments | No | Additional configuration keys merged into config. If both config and kwargs are provided, kwargs take precedence on conflicting keys. |
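The precedence rule for conflicting keys can be demonstrated with a small stand-alone merge function. This is a sketch of the documented behavior, not txtai code; `merge_config` is a hypothetical helper.

```python
def merge_config(config=None, **kwargs):
    # Sketch of the documented merge: kwargs override config on conflicts
    if config and kwargs:
        return {**config, **kwargs}
    return kwargs if kwargs else config


# "content" appears in both; the kwargs value wins
merged = merge_config({"path": "model-a", "content": False}, content=True)
print(merged)
# {'path': 'model-a', 'content': True}
```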
### Outputs
| Name | Type | Description |
|---|---|---|
| (instance) | Embeddings | An initialized Embeddings instance. Key attributes after init: self.config (merged configuration dict or None), self.model (dense vector model or None), self.ann (None, created during indexing), self.database (None, created during indexing), self.scoring (scoring instance if configured for word vector weighting), self.graph (None, created during indexing), self.indexes (None, created during indexing), self.ids (None, created during indexing), self.reducer (None, created during indexing), self.models (shared models cache). |
## Usage Examples
### Basic Example

```python
from txtai.embeddings import Embeddings

# Create embeddings with a sentence-transformers model and content storage
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

# Verify the configuration
print(embeddings.config["path"])
# Output: sentence-transformers/all-MiniLM-L6-v2
print(embeddings.config["content"])
# Output: True
```
### Advanced Example: Hybrid Search Configuration

```python
from txtai.embeddings import Embeddings

# Configure hybrid search with BM25 sparse scoring
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "hybrid": True,
    "scoring": {
        "method": "bm25",
        "terms": True,
        "normalize": True
    }
})
```
### Advanced Example: Shared Models Cache

```python
from txtai.embeddings import Embeddings

# Shared models cache to avoid loading the same model multiple times
models = {}
embeddings_a = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"}, models=models)
embeddings_b = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"}, models=models)
# Both instances share the same underlying vector model
```
### Advanced Example: Using kwargs

```python
from txtai.embeddings import Embeddings

# Config via kwargs - these are merged into the config dict
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)
print(embeddings.config["path"])
# Output: sentence-transformers/all-MiniLM-L6-v2
```