# Implementation: NeuML txtai Embeddings Init
| Knowledge Sources | |
|---|---|
| Domains | Semantic Search, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
## Overview
Concrete tool for configuring an embeddings database instance for semantic search provided by the txtai library.
## Description
The `Embeddings.__init__` method creates a new embeddings index instance and sets up all internal components based on the provided configuration. It initializes slots for the ANN index, document database, scoring engine, graph, subindexes, dimensionality reducer, vector model, and query model. The configuration dictionary and any keyword arguments are merged into a single config dict, which is then passed to `self.configure()` to load config-driven models (dense vector model, scoring backend, query model). The resulting instance is thread-safe for read operations, but writes must be externally synchronized.
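The slot setup and merge flow described above can be sketched as follows. This is a simplified illustration under the stated behavior, not the actual txtai source; the class name `EmbeddingsSketch` is hypothetical.

```python
class EmbeddingsSketch:
    """Simplified sketch of the initialization flow described above.

    Illustration only; the real txtai Embeddings class also loads
    config-driven models via configure().
    """

    def __init__(self, config=None, models=None, **kwargs):
        # Slots created later, during indexing, start out empty
        self.ann = None
        self.database = None
        self.graph = None
        self.indexes = None
        self.ids = None
        self.reducer = None

        # Config-driven slots (populated by configure() in the real class)
        self.model = None
        self.scoring = None
        self.query = None

        # Merge kwargs into config; kwargs win on conflicting keys
        self.config = {**config, **kwargs} if config and kwargs else (kwargs if kwargs else config)

        # Shared models cache
        self.models = models


sketch = EmbeddingsSketch({"path": "model-a"}, content=True)
print(sketch.config)
# {'path': 'model-a', 'content': True}
```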
## Usage
Use this constructor when initializing a new semantic search pipeline. Pass a configuration dictionary specifying the embedding model path, content storage flag, scoring method, and any other desired settings. For shared-model deployments, pass a models cache dictionary to avoid loading duplicate copies of the same model in memory.
## Code Reference

### Source Location
- Repository: txtai
- File: `src/python/txtai/embeddings/base.py`
- Lines: 30-83
### Signature

```python
def __init__(self, config=None, models=None, **kwargs):
```
### Import

```python
from txtai.embeddings import Embeddings
```
## I/O Contract

### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict or None | No | Embeddings configuration dictionary. Common keys include "path" (str, model name or path, e.g. "sentence-transformers/all-MiniLM-L6-v2"), "content" (bool, enable document content storage), "scoring" (str or dict, sparse scoring method such as "bm25"), "hybrid" (bool or str, enable hybrid search), "graph" (dict or bool, enable graph index), "indexes" (dict, subindex configurations). Defaults to None (empty config). |
| models | dict or None | No | Shared models cache dictionary. When multiple Embeddings instances use the same vector model, passing a shared dict avoids loading the model multiple times. Defaults to None. |
| **kwargs | keyword arguments | No | Additional configuration keys merged into config. If both config and kwargs are provided, kwargs take precedence on conflicting keys. |
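The precedence rule for conflicting keys can be demonstrated with a small stand-alone merge function. This is a sketch of the documented behavior, not txtai code; `merge_config` is a hypothetical helper.

```python
def merge_config(config=None, **kwargs):
    # Sketch of the documented merge: kwargs override config on conflicts
    if config and kwargs:
        return {**config, **kwargs}
    return kwargs if kwargs else config


# "content" appears in both; the kwargs value wins
merged = merge_config({"path": "model-a", "content": False}, content=True)
print(merged)
# {'path': 'model-a', 'content': True}
```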
### Outputs
| Name | Type | Description |
|---|---|---|
| (instance) | Embeddings | An initialized Embeddings instance. Key attributes after init: self.config (merged configuration dict or None), self.model (dense vector model or None), self.ann (None, created during indexing), self.database (None, created during indexing), self.scoring (scoring instance if configured for word vector weighting), self.graph (None, created during indexing), self.indexes (None, created during indexing), self.ids (None, created during indexing), self.reducer (None, created during indexing), self.models (shared models cache). |
## Usage Examples
### Basic Example

```python
from txtai.embeddings import Embeddings

# Create embeddings with a sentence-transformers model and content storage
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

# Verify the configuration
print(embeddings.config["path"])
# Output: sentence-transformers/all-MiniLM-L6-v2
print(embeddings.config["content"])
# Output: True
```
### Advanced Example: Hybrid Search Configuration

```python
from txtai.embeddings import Embeddings

# Configure hybrid search with BM25 sparse scoring
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "hybrid": True,
    "scoring": {
        "method": "bm25",
        "terms": True,
        "normalize": True
    }
})
```
### Advanced Example: Shared Models Cache

```python
from txtai.embeddings import Embeddings

# Shared models cache to avoid loading the same model multiple times
models = {}
embeddings_a = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"}, models=models)
embeddings_b = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"}, models=models)
# Both instances share the same underlying vector model
```
### Advanced Example: Using kwargs

```python
from txtai.embeddings import Embeddings

# Config via kwargs - these are merged into the config dict
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)
print(embeddings.config["path"])
# Output: sentence-transformers/all-MiniLM-L6-v2
```