Implementation:Neuml Txtai HFModel

From Leeroopedia


Knowledge Sources
Domains: Machine Learning, NLP, Transformers
Last Updated: 2026-02-10 01:00 GMT

Overview

HFModel is a concrete txtai class that wraps Hugging Face Transformers models and adds tensor management, tokenization, and quantization support.

Description

HFModel is a base pipeline class backed by a Hugging Face Transformers model. It extends the Tensors base class to provide device management (CPU/GPU), model quantization, and tokenization that handles overflowing tokens by splitting them into separate chunks, returning indices that map each chunk back to its source text. This class serves as a foundation for downstream pipelines such as CrossEncoder and LateEncoder that need direct model access rather than the higher-level Transformers pipeline API.
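The overflow handling described above can be illustrated with a small standalone sketch. It is not taken from the txtai source: the helper name split_overflow, the choice to reserve one slot per chunk for an end-of-sequence token, and the right-padding scheme are all assumptions made for illustration.

```python
def split_overflow(token_ids, max_length, eos_id, pad_id):
    """Split one over-long token id sequence into fixed-size chunks.

    Hypothetical helper (not a txtai API): each chunk keeps one slot free
    for an end-of-sequence token, and short chunks are right-padded with a
    matching attention mask.
    """
    chunks, masks = [], []
    step = max_length - 1  # leave room for the EOS token

    for start in range(0, len(token_ids), step):
        chunk = list(token_ids[start:start + step])

        # Terminate the chunk if it does not already end with EOS
        if chunk[-1] != eos_id:
            chunk.append(eos_id)

        # Real tokens get mask 1, padding gets mask 0
        mask = [1] * len(chunk)
        pad = max_length - len(chunk)
        chunk.extend([pad_id] * pad)
        mask.extend([0] * pad)

        chunks.append(chunk)
        masks.append(mask)

    return chunks, masks


# Seven token ids with a maximum length of four yield three chunks
chunks, masks = split_overflow(list(range(10, 17)), max_length=4, eos_id=2, pad_id=0)
```

Every chunk has the same length, so the result can be stacked into a single tensor before being moved to the target device.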

Usage

Use HFModel when you need a lower-level wrapper around a Hugging Face model that provides direct control over tokenization, batching, and device placement. It is the preferred base class for custom pipelines that require token-level manipulation or cannot use the standard Transformers pipeline abstraction.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/pipeline/hfmodel.py

Signature

class HFModel(Tensors):
    def __init__(self, path=None, quantize=False, gpu=False, batch=64)
    def prepare(self, model)
    def tokenize(self, tokenizer, texts)

Import

from txtai.pipeline.hfmodel import HFModel

I/O Contract

Inputs

Name | Type | Required | Description
path | str | No | Path to model; accepts a Hugging Face model hub id or local path. Uses default model for task if not provided.
quantize | bool | No | If True, applies dynamic quantization to the model (CPU only). Defaults to False.
gpu | bool or int | No | True/False to enable GPU, or a specific GPU device id. Defaults to False.
batch | int | No | Batch size used to incrementally process content. Defaults to 64.
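To make the batch parameter concrete, the sketch below shows one common way to process content incrementally in fixed-size slices. The generator name batches is hypothetical and the code is an illustration of the idea, not the library's implementation.

```python
def batches(data, size):
    """Yield consecutive slices of data holding at most `size` items.

    Hypothetical helper for illustration only.
    """
    for start in range(0, len(data), size):
        yield data[start:start + size]


# 150 inputs with the default batch size of 64 give two full batches and a remainder
texts = [f"document {i}" for i in range(150)]
sizes = [len(chunk) for chunk in batches(texts, 64)]
```

A smaller batch size lowers peak memory use at the cost of more forward passes, which is usually the trade-off to tune on GPU.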

Outputs

Name | Type | Description
(from prepare) | model | The prepared (optionally quantized) Hugging Face model.
(from tokenize) | tuple(dict, list) | A tuple of tokenized tensors (input_ids, attention_mask) moved to the target device and a list of indices for reconstructing original text positions.
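The indices returned by tokenize are what let a caller map per-row model outputs back to the original texts. A minimal sketch of that regrouping follows. It assumes each entry in indices is the position of the source text for the corresponding row; the helper name regroup and the sample values are made up for illustration.

```python
from collections import defaultdict


def regroup(outputs, indices):
    """Collect per-row outputs under the index of the text they came from.

    Hypothetical helper: `outputs` and `indices` are parallel sequences,
    one entry per tokenized row.
    """
    grouped = defaultdict(list)
    for output, index in zip(outputs, indices):
        grouped[index].append(output)

    # Return groups ordered by original text position
    return [grouped[index] for index in sorted(grouped)]


# Three texts were split into 2, 1 and 3 rows respectively
outputs = ["a0", "a1", "b0", "c0", "c1", "c2"]
indices = [0, 0, 1, 2, 2, 2]
merged = regroup(outputs, indices)
```

How the grouped rows are then combined (concatenation, averaging, taking the maximum score) depends on the downstream pipeline.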

Usage Examples

from transformers import AutoModel, AutoTokenizer

from txtai.pipeline.hfmodel import HFModel

# Create a model wrapper with GPU enabled and quantization off
model = HFModel(path="distilbert-base-uncased", quantize=False, gpu=True, batch=32)

# Prepare a loaded Hugging Face model for inference
# (dynamic quantization is CPU only and applies only when quantize=True)
raw_model = AutoModel.from_pretrained("distilbert-base-uncased")
prepared = model.prepare(raw_model)

# Tokenize a batch of texts; indices maps each tokenized row back to its source text
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
tokens, indices = model.tokenize(tokenizer, ["Hello world", "txtai is great"])
