Implementation:Neuml Txtai AutoId

From Leeroopedia


Knowledge Sources
Domains: Embeddings, Index Management, ID Generation
Last Updated: 2026-02-10 01:00 GMT

Overview

AutoId is a concrete tool, provided by txtai, for generating unique document identifiers within an embeddings index.

Description

The AutoId class generates unique identifiers for documents during indexing. It supports two generation strategies:

  • Integer sequence (default): Generates monotonically increasing integer IDs starting from 0 (or a specified starting value). The sequence method returns the current value and increments the counter.
  • UUID generation: Generates universally unique identifiers using Python's uuid module. The UUID function is selected by name (e.g., "uuid1", "uuid3", "uuid4", "uuid5"). The deterministic functions (uuid3, uuid5) accept a data parameter and use uuid.NAMESPACE_DNS as the namespace, so the same input data always produces the same ID.
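
The two strategies correspond to ordinary standard-library behavior. The sketch below is not txtai code; it uses only itertools.count and the uuid module to illustrate what each mode produces, and all variable names are illustrative.

```python
import itertools
import uuid

# Integer sequence: a counter that hands out 0, 1, 2, ...
counter = itertools.count(0)
sequence_ids = [next(counter) for _ in range(3)]

# Random UUID: uuid4 takes no input, so every call yields a new value
random_a = str(uuid.uuid4())
random_b = str(uuid.uuid4())

# Deterministic UUID: uuid5 hashes a namespace plus a name,
# so the same input always maps to the same identifier
stable_a = str(uuid.uuid5(uuid.NAMESPACE_DNS, "document content A"))
stable_b = str(uuid.uuid5(uuid.NAMESPACE_DNS, "document content A"))

print(sequence_ids)          # [0, 1, 2]
print(stable_a == stable_b)  # True
```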

The class inspects the UUID function's signature to determine whether it is deterministic, that is, whether it accepts a "namespace" argument. When an instance is called, it delegates to the appropriate method (sequence or uuid) and returns the generated ID.
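
The dispatch logic described above can be sketched with the standard library alone. MiniAutoId is a hypothetical stand-in written for illustration, not the txtai source, and the use of inspect.signature to look for a "namespace" parameter is an assumption about how such a check could be made.

```python
import inspect
import uuid


class MiniAutoId:
    """Illustrative stand-in for AutoId (hypothetical, not the txtai source)."""

    def __init__(self, method=None):
        self.function, self.value = None, None

        if method is None or isinstance(method, int):
            # Integer sequence: an int sets the starting value
            self.method = self.sequence
            self.value = method or 0
        else:
            # UUID mode: look up the named function in the uuid module
            self.method = self.uuid
            self.function = getattr(uuid, method)

        # A UUID function counts as deterministic if it takes a "namespace" argument
        params = inspect.signature(self.function).parameters if self.function else {}
        self.deterministic = "namespace" in params

    def __call__(self, data=None):
        # Delegate to the strategy chosen in the constructor
        return self.method(data)

    def sequence(self, data):
        # Return the current value, then advance the counter (data is unused)
        value = self.value
        self.value += 1
        return value

    def uuid(self, data):
        if self.deterministic:
            return str(self.function(uuid.NAMESPACE_DNS, str(data)))
        return str(self.function())

    def current(self):
        return self.value


print(MiniAutoId("uuid5").deterministic)  # True
print(MiniAutoId("uuid4").deterministic)  # False
print(MiniAutoId(100)())                  # 100
```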

Usage

Use AutoId when you need automatic ID generation during embeddings indexing. The integer sequence mode is the default and most common. UUID mode is useful when you need globally unique identifiers or deterministic IDs based on document content (via uuid3 or uuid5).
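
One consequence of content-based IDs is that submitting identical content twice yields the same identifier, which makes duplicates easy to collapse. The sketch below illustrates this with the standard library only; content_id is a hypothetical helper that mirrors the uuid5 with uuid.NAMESPACE_DNS scheme described above, and the dictionary stands in for whatever store holds the documents.

```python
import uuid


def content_id(text):
    # Hypothetical helper: uuid5 over NAMESPACE_DNS, mirroring the deterministic mode
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, str(text)))


documents = ["alpha", "beta", "alpha"]  # "alpha" appears twice

# Keying by content-derived ID collapses the repeated document
store = {content_id(doc): doc for doc in documents}

print(len(documents))  # 3
print(len(store))      # 2
```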

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/embeddings/index/autoid.py

Signature

class AutoId:
    def __init__(self, method=None)
    def __call__(self, data=None) -> int or str
    def sequence(self, data) -> int
    def uuid(self, data) -> str
    def current(self) -> int or None

Import

from txtai.embeddings.index.autoid import AutoId

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| method | int or str | No | Constructor argument selecting the ID generation method. If None or an int, an integer sequence is used (an int sets the starting value). If a str, it names a UUID function from Python's uuid module (e.g., "uuid1", "uuid3", "uuid4", "uuid5"). |
| data | any | No | Call argument used for deterministic UUID generation (uuid3, uuid5). It is converted to a string and combined with uuid.NAMESPACE_DNS. Ignored by the sequence method and by non-deterministic UUID functions. |
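
Because data is converted to a string before hashing, values that share a string form also share an ID. The sketch below shows this with the standard library; deterministic_id is a hypothetical helper that reproduces the conversion described above and is not a txtai function.

```python
import uuid


def deterministic_id(data):
    # Hypothetical helper: convert to string, then hash with NAMESPACE_DNS
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, str(data)))


# The integer 42 and the string "42" have the same string form
same = deterministic_id(42) == deterministic_id("42")

# Dicts with different key order have different string forms
reordered = deterministic_id({"a": 1, "b": 2}) == deterministic_id({"b": 2, "a": 1})

print(same)       # True
print(reordered)  # False
```

For structured data, serializing to a canonical form first (for example json.dumps with sort_keys=True) is one way to keep the resulting IDs stable.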

Outputs

| Name | Type | Description |
|---|---|---|
| id | int or str | Generated unique identifier, returned by calling the instance. An integer in sequence mode, a UUID string in UUID mode. |
| current | int or None | Returned by current(). In sequence mode, the value the next call will return; None in UUID mode. |

Usage Examples

from txtai.embeddings.index.autoid import AutoId

# Default integer sequence starting at 0
autoid = AutoId()
print(autoid())  # 0
print(autoid())  # 1
print(autoid())  # 2
print(autoid.current())  # 3

# Integer sequence starting at 100
autoid = AutoId(100)
print(autoid())  # 100
print(autoid())  # 101

# Random UUID (uuid4)
autoid = AutoId("uuid4")
print(autoid())  # e.g., "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"

# Deterministic UUID based on content (uuid5)
autoid = AutoId("uuid5")
id1 = autoid("document content A")
id2 = autoid("document content A")
print(id1 == id2)  # True - same input produces same UUID

id3 = autoid("document content B")
print(id1 == id3)  # False - different input produces different UUID
