Implementation:Neuml Txtai AutoId
| Knowledge Sources | |
|---|---|
| Domains | Embeddings, Index Management, ID Generation |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
AutoId is a txtai utility that generates unique document identifiers for an embeddings index.
Description
The AutoId class generates unique identifiers for documents during indexing. It supports two generation strategies:
- Integer sequence (default): Generates monotonically increasing integer IDs starting from 0 (or a specified starting value). The sequence method returns the current value and increments the counter.
- UUID generation: Generates universally unique identifiers using Python's uuid module. The UUID function is selected by name (e.g., "uuid1", "uuid3", "uuid4", "uuid5"). Deterministic UUID functions (uuid3, uuid5) accept a data parameter and use uuid.NAMESPACE_DNS as the namespace, so the same input data always produces the same ID.
The class inspects the UUID function's signature to determine if it is deterministic (accepts a "namespace" argument). When called, it delegates to the appropriate method (sequence or uuid) and returns the generated ID.
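The dispatch logic described above can be sketched with the standard library alone. This is an illustrative approximation, not the txtai source: the class name SketchAutoId is hypothetical, and the use of inspect.signature to detect the "namespace" argument is an assumption about how such a check could be written.

```python
import inspect
import uuid


class SketchAutoId:
    """Hypothetical stand-in for AutoId, for illustration only."""

    def __init__(self, method=None):
        self.value = None
        self.function = None

        if method is None or isinstance(method, int):
            # Integer sequence mode: an int sets the starting offset
            self.value = method or 0
        else:
            # UUID mode: look the function up by name in the uuid module
            self.function = getattr(uuid, method)

        # A UUID function counts as deterministic if it takes a "namespace" argument
        params = inspect.signature(self.function).parameters if self.function else {}
        self.deterministic = "namespace" in params

    def __call__(self, data=None):
        if self.function is None:
            # Return the current value, then advance the counter
            value = self.value
            self.value += 1
            return value

        if self.deterministic:
            # uuid3 / uuid5: hash the stringified data under NAMESPACE_DNS
            return str(self.function(uuid.NAMESPACE_DNS, str(data)))

        # uuid1 / uuid4: no input data needed
        return str(self.function())
```

In this sketch, SketchAutoId("uuid5").deterministic is True while SketchAutoId("uuid4").deterministic is False, which matches the uuid3/uuid5 versus uuid1/uuid4 split described above.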
Usage
Use AutoId when you need automatic ID generation during embeddings indexing. The integer sequence mode is the default and most common. UUID mode is useful when you need globally unique identifiers or deterministic IDs based on document content (via uuid3 or uuid5).
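One practical consequence of content-based IDs is that repeated content maps to the same key. The sketch below uses only Python's uuid module to illustrate this; the helper content_id and the dictionary standing in for an index are hypothetical, and the snippet assumes the uuid5 plus NAMESPACE_DNS scheme described in this page.

```python
import uuid


def content_id(text):
    # Assumption: mirrors the uuid5 + NAMESPACE_DNS scheme described above
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, str(text)))


documents = ["alpha", "beta", "alpha"]

# A plain dict stands in for an index keyed by document ID
index = {}
for text in documents:
    index[content_id(text)] = text

print(len(index))  # 2, because the repeated "alpha" maps to the same ID
```

With an integer sequence or uuid4, the same three documents would receive three distinct IDs.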
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/embeddings/index/autoid.py
Signature
class AutoId:
    def __init__(self, method=None)
    def __call__(self, data=None) -> int or str
    def sequence(self, data) -> int
    def uuid(self, data) -> str
    def current(self) -> int or None
Import
from txtai.embeddings.index.autoid import AutoId
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| method | int or str | No | ID generation method. If None or int, uses integer sequence (int value sets starting offset). If str, specifies a UUID function name from Python's uuid module (e.g., "uuid1", "uuid3", "uuid4", "uuid5"). |
| data | any | No | Optional data for deterministic UUID generation (uuid3, uuid5). Converted to string and combined with uuid.NAMESPACE_DNS. Ignored for sequence and non-deterministic UUID methods. |
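Because data is converted to a string before hashing, values with the same string form would be expected to yield the same deterministic ID. The snippet below illustrates that with the standard library only; deterministic_id is a hypothetical helper that assumes the str() plus uuid.NAMESPACE_DNS behavior stated in the table.

```python
import uuid


def deterministic_id(data):
    # Assumption: data is stringified, then hashed under NAMESPACE_DNS
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, str(data)))


# The integer 42 and the string "42" share a string form, so their IDs match
same = deterministic_id(42) == deterministic_id("42")

# Different string forms give different IDs
different = deterministic_id(42) != deterministic_id("forty-two")

print(same, different)  # True True
```

If distinct types must never collide, one option is to include a type tag in the data before passing it in.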
Outputs
| Name | Type | Description |
|---|---|---|
| id | int or str | Generated unique identifier. Integer for sequence mode, UUID string for UUID mode. |
| current | int or None | Current sequence counter, i.e. the next integer to be issued (only for sequence mode; None for UUID mode). |
Usage Examples
from txtai.embeddings.index.autoid import AutoId
# Default integer sequence starting at 0
autoid = AutoId()
print(autoid()) # 0
print(autoid()) # 1
print(autoid()) # 2
print(autoid.current()) # 3 (the next value to be issued)
# Integer sequence starting at 100
autoid = AutoId(100)
print(autoid()) # 100
print(autoid()) # 101
# Random UUID (uuid4)
autoid = AutoId("uuid4")
print(autoid()) # e.g., "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
# Deterministic UUID based on content (uuid5)
autoid = AutoId("uuid5")
id1 = autoid("document content A")
id2 = autoid("document content A")
print(id1 == id2) # True - same input produces same UUID
id3 = autoid("document content B")
print(id1 == id3) # False - different input produces different UUID