Implementation:Neuml Txtai HFModel
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, NLP, Transformers |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
HFModel is txtai's wrapper for Hugging Face Transformers models. It adds tensor management, tokenization, and quantization support.
Description
HFModel is a base pipeline class backed by a Hugging Face Transformers model. It extends the Tensors base class to provide device management (CPU/GPU), optional model quantization, and tokenization that splits inputs with overflowing tokens into separate chunks. It serves as the foundation for downstream pipelines such as CrossEncoder and LateEncoder, which need direct model access rather than the higher-level Transformers pipeline API.
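The overflow handling described above can be pictured with a small pure-Python sketch. This is not txtai's code: `split_overflow`, `max_length`, and the integer token ids are illustrative assumptions, and a real tokenizer also has to deal with special tokens and padding.

```python
def split_overflow(token_ids, max_length):
    """Split each token id sequence into chunks of at most max_length tokens.

    Returns the chunks plus, for each chunk, the index of the input it came from.
    """
    chunks, indices = [], []
    for position, ids in enumerate(token_ids):
        # Walk the sequence in max_length strides; short inputs yield one chunk
        for start in range(0, max(len(ids), 1), max_length):
            chunks.append(ids[start:start + max_length])
            indices.append(position)
    return chunks, indices


# Two inputs: the first overflows a 4-token limit, the second fits
chunks, indices = split_overflow([[1, 2, 3, 4, 5, 6], [7, 8]], max_length=4)
print(chunks)   # [[1, 2, 3, 4], [5, 6], [7, 8]]
print(indices)  # [0, 0, 1]
```

The second return value is what makes the split reversible: every chunk remembers which input it belongs to.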
Usage
Use HFModel when you need a lower-level wrapper around a Hugging Face model that provides direct control over tokenization, batching, and device placement. It is the preferred base class for custom pipelines that require token-level manipulation or cannot use the standard Transformers pipeline abstraction.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/pipeline/hfmodel.py
Signature
class HFModel(Tensors):
    def __init__(self, path=None, quantize=False, gpu=False, batch=64)
    def prepare(self, model)
    def tokenize(self, tokenizer, texts)
Import
from txtai.pipeline.hfmodel import HFModel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | No | Path to model; accepts a Hugging Face model hub id or local path. Uses default model for task if not provided. |
| quantize | bool | No | If True, applies dynamic quantization to the model (CPU only). Defaults to False. |
| gpu | bool or int | No | True/False to enable GPU, or a specific GPU device id. Defaults to False. |
| batch | int | No | Batch size used to incrementally process content. Defaults to 64. |
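To illustrate what "incrementally process content" means for the batch parameter, here is a minimal sketch of fixed-size batching. The `batches` helper is a hypothetical name used for illustration, not a txtai function; it only shows how a list is consumed in slices of at most `size` elements.

```python
def batches(items, size):
    """Yield consecutive slices of items, each holding at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 150 inputs with the default batch size of 64
texts = [f"document {i}" for i in range(150)]
sizes = [len(chunk) for chunk in batches(texts, 64)]
print(sizes)  # [64, 64, 22]
```

A smaller batch size lowers peak memory use at the cost of more model calls, which is the usual reason to tune this value.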
Outputs
| Name | Type | Description |
|---|---|---|
| (from prepare) | model | The prepared (optionally quantized) Hugging Face model. |
| (from tokenize) | tuple(dict, list) | A (tokens, indices) tuple: a dict of tensors (input_ids, attention_mask) moved to the target device, and a list of indices that maps each tokenized element back to the position of its original text. |
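The indices returned by tokenize are what allow per-chunk model outputs to be merged back per input text. A minimal sketch of that regrouping step follows; `regroup` and the example scores are assumptions made for illustration, and a downstream pipeline may combine chunk outputs differently (for example by averaging).

```python
from collections import defaultdict


def regroup(outputs, indices):
    """Collect per-chunk outputs under the index of the text each chunk came from."""
    grouped = defaultdict(list)
    for output, index in zip(outputs, indices):
        grouped[index].append(output)
    # Return one list of outputs per original text, in input order
    return [grouped[i] for i in sorted(grouped)]


# Three chunk-level scores; the first two chunks came from text 0
scores = [0.9, 0.4, 0.7]
indices = [0, 0, 1]
print(regroup(scores, indices))  # [[0.9, 0.4], [0.7]]
```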
Usage Examples
from transformers import AutoModel, AutoTokenizer
from txtai.pipeline.hfmodel import HFModel
# Create a wrapper with GPU enabled, quantization off, and a batch size of 32
wrapper = HFModel(path="distilbert-base-uncased", quantize=False, gpu=True, batch=32)
# Load a Hugging Face model and prepare it for inference
raw_model = AutoModel.from_pretrained("distilbert-base-uncased")
prepared = wrapper.prepare(raw_model)
# Tokenize a batch of texts; indices map each tokenized element back to its source text
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
tokens, indices = wrapper.tokenize(tokenizer, ["Hello world", "txtai is great"])