Implementation: NeuML txtai HFOnnx __call__
| Knowledge Sources | |
|---|---|
| Domains | Training, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided by the txtai library, for exporting trained transformer models to ONNX format for optimized cross-platform inference.
Description
HFOnnx.__call__() exports a HuggingFace transformer model to ONNX format. It supports multiple task types, each producing a different output schema, and includes optional post-export quantization via ONNX Runtime.
The method performs the following steps:
- Resolves I/O parameters -- calls self.parameters(task) to obtain the input schema, output schema, and model loader function for the specified task.
- Loads model and tokenizer -- accepts either a (model, tokenizer) tuple (e.g., directly from a training run) or a string path (HuggingFace hub ID or local directory). When a tuple is provided, the model is moved to CPU for export.
- Generates dummy inputs -- tokenizes a test string to create concrete tensors for tracing.
- Exports via torch.onnx.export() -- traces the model's forward pass and writes the ONNX graph. Constant folding is enabled, dynamic axes are declared for variable batch/sequence dimensions, and the ONNX opset version is configurable; a sketch of this step follows the list.
- Optional quantization -- if quantize=True, the exported model is quantized using onnxruntime.quantization.quantize_dynamic(). For in-memory exports, a temporary file is used.
- Returns result -- if no output path was specified, the ONNX model is returned as raw bytes. Otherwise, the file path is returned.
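To make the export step concrete, here is a minimal sketch of a dynamic-axes ONNX export for a BERT-style encoder. This is illustrative rather than txtai's exact code; the model choice, dummy input string, and model.onnx file name are arbitrary.
import torch
from transformers import AutoModel, AutoTokenizer

# Load the model/tokenizer pair to export (illustrative choice)
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.eval()

# Dummy inputs give the tracer concrete tensor shapes
dummy = dict(tokenizer(["test inputs"], return_tensors="pt"))

torch.onnx.export(
    model,
    (dummy,),                      # dict as final element = named arguments
    "model.onnx",
    opset_version=14,
    do_constant_folding=True,
    input_names=list(dummy.keys()),
    output_names=["last_hidden_state"],
    # Mark batch and sequence dimensions as variable-length
    dynamic_axes={name: {0: "batch", 1: "sequence"}
                  for name in list(dummy.keys()) + ["last_hidden_state"]},
)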
Supported task-to-output mappings:
"default"-- output:last_hidden_state(all token embeddings), loaded viaAutoModel"pooling"-- output:embeddings(pooled sentence embedding), loaded viaPoolingOnnx"question-answering"-- output:start_logits,end_logits, loaded viaAutoModelForQuestionAnswering"text-classification"-- output:logits, loaded viaAutoModelForSequenceClassification"zero-shot-classification"-- alias for"text-classification"
Usage
This method is used after training to convert a PyTorch model into ONNX format for production inference. It integrates seamlessly with the output of HFTrainer.__call__(), which returns a (model, tokenizer) tuple that can be passed directly.
Code Reference
Source Location
- Repository: txtai
- File: src/python/txtai/pipeline/train/hfonnx.py (lines 32-87)
Signature
def __call__(self, path, task="default", output=None, quantize=False, opset=14):
"""
Exports a Hugging Face Transformer model to ONNX.
Args:
path: path to model, accepts HF hub id, local path or (model, tokenizer) tuple
task: model task or category, determines outputs, defaults to "default"
output: optional output model path, defaults to return bytes if None
quantize: if model should be quantized, defaults to False
opset: onnx opset version, defaults to 14
Returns:
path to model output or model as bytes depending on output parameter
"""
Import
from txtai.pipeline import HFOnnx
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str or tuple | Yes | Path to pretrained model (HuggingFace hub ID or local directory), or a (model, tokenizer) tuple. When a tuple is provided, the model is moved to CPU before export. |
| task | str | No | Task type that determines the output schema and model class. Supported values: "default" (default), "pooling", "question-answering", "text-classification", "zero-shot-classification". |
| output | str or None | No | File path to write the ONNX model. When None (default), the model is returned as raw bytes. |
| quantize | bool | No | Whether to apply ONNX dynamic quantization after export. Default: False. Requires onnxruntime to be installed. |
| opset | int | No | ONNX opset version. Default: 14. Higher values enable more operators but may reduce runtime compatibility. |
Outputs
| Name | Type | Description |
|---|---|---|
| result | bytes or str | When output=None: the ONNX model as raw bytes. When output is a file path: the path string to the saved ONNX file. Output tensor names depend on the task: "default" produces last_hidden_state, "pooling" produces embeddings, "question-answering" produces start_logits and end_logits, "text-classification" and "zero-shot-classification" produce logits. |
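Because the return type depends on the output parameter, callers that sometimes receive bytes may want to persist them manually. A small sketch:
from txtai.pipeline import HFOnnx

exporter = HFOnnx()
result = exporter("bert-base-uncased")

# With output=None the result is bytes; write it out if a file is needed later
if isinstance(result, bytes):
    with open("model.onnx", "wb") as f:
        f.write(result)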
Usage Examples
Basic Example: Export to Bytes
from txtai.pipeline import HFOnnx
exporter = HFOnnx()
# Export a pretrained model to ONNX bytes
onnx_bytes = exporter("bert-base-uncased", task="default")
print(type(onnx_bytes)) # <class 'bytes'>
print(len(onnx_bytes)) # Size of the ONNX model in bytes
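The bytes never need to touch disk: they can be fed straight into an onnxruntime session. A sketch continuing from the example above, assuming onnxruntime and a matching tokenizer:
import onnxruntime
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
session = onnxruntime.InferenceSession(onnx_bytes, providers=["CPUExecutionProvider"])

# Tokenize with NumPy tensors to match the ONNX graph's expected input types
inputs = tokenizer(["hello world"], return_tensors="np")
outputs = session.run(None, dict(inputs))
print(outputs[0].shape)  # (1, sequence_length, 768) -- last_hidden_state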
Export to File
from txtai.pipeline import HFOnnx
exporter = HFOnnx()
# Export and save to a file
output_path = exporter(
"bert-base-uncased",
task="text-classification",
output="./model.onnx"
)
print(output_path) # "./model.onnx"
Export from Training Result
from txtai.pipeline import HFTrainer, HFOnnx
# Train a model
trainer = HFTrainer()
model, tokenizer = trainer(
"bert-base-uncased",
train=[{"text": "Great!", "label": 1}, {"text": "Bad.", "label": 0}],
task="text-classification",
num_train_epochs=1,
)
# Export the trained model directly
exporter = HFOnnx()
onnx_bytes = exporter(
(model, tokenizer),
task="text-classification"
)
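One caveat worth noting: the ONNX graph only covers the model's forward pass, so the tokenizer must be kept around separately for inference. For example (directory name is arbitrary):
# Persist the tokenizer alongside the exported graph
tokenizer.save_pretrained("./onnx-tokenizer")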
Export with Quantization
from txtai.pipeline import HFOnnx
exporter = HFOnnx()
# Export and quantize for optimized CPU inference
onnx_bytes = exporter(
"bert-base-uncased",
task="pooling",
quantize=True,
opset=14
)
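Dynamic quantization stores weights as 8-bit integers, so a substantial size reduction is expected. A quick way to check, continuing from the example above (exact numbers vary by model and library versions):
# Export the same model without quantization and compare byte counts
plain_bytes = exporter("bert-base-uncased", task="pooling", quantize=False)
print(len(plain_bytes), len(onnx_bytes))  # quantized export should be much smaller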
Question Answering Export
from txtai.pipeline import HFOnnx
exporter = HFOnnx()
# Export a QA model -- produces start_logits and end_logits outputs
output_path = exporter(
"distilbert-base-cased-distilled-squad",
task="question-answering",
output="./qa-model.onnx"
)
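To illustrate the start_logits/end_logits contract, here is a sketch of answer extraction with onnxruntime. The question/context strings are arbitrary, and greedy argmax decoding is the simplest possible strategy, not txtai's QA pipeline:
import numpy as np
import onnxruntime
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
session = onnxruntime.InferenceSession("./qa-model.onnx", providers=["CPUExecutionProvider"])

question = "What does HFOnnx export?"
context = "HFOnnx exports transformer models to the ONNX format."
inputs = tokenizer(question, context, return_tensors="np")

# Outputs follow the declared order: start_logits, end_logits
start_logits, end_logits = session.run(None, dict(inputs))
start, end = int(np.argmax(start_logits)), int(np.argmax(end_logits))
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)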