Implementation: NeuML txtai HFOnnx __call__
| Knowledge Sources | |
|---|---|
| Domains | Training, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided by the txtai library, for exporting trained transformer models to ONNX format for optimized cross-platform inference.
Description
HFOnnx.__call__() exports a HuggingFace transformer model to ONNX format. It supports multiple task types, each producing a different output schema, and includes optional post-export quantization via ONNX Runtime.
The method performs the following steps:
- Resolves I/O parameters -- calls self.parameters(task) to obtain the input schema, output schema, and model loader function for the specified task.
- Loads model and tokenizer -- accepts either a (model, tokenizer) tuple (e.g., directly from a training run) or a string path (HuggingFace hub ID or local directory). When a tuple is provided, the model is moved to CPU for export.
- Generates dummy inputs -- tokenizes a test string to create concrete tensors for tracing.
- Exports via torch.onnx.export() -- traces the model's forward pass and writes the ONNX graph. Constant folding is enabled, dynamic axes are declared for variable batch/sequence dimensions, and the ONNX opset version is configurable; a sketch of this step follows the list.
- Optional quantization -- if quantize=True, the exported model is quantized using onnxruntime.quantization.quantize_dynamic(). For in-memory exports, a temporary file is used.
- Returns result -- if no output path was specified, the ONNX model is returned as raw bytes. Otherwise, the file path is returned.
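To make the export step concrete, here is a minimal sketch of a dynamic-axes ONNX export for a BERT-style encoder. This is illustrative rather than txtai's exact code; the model choice, dummy input string, and model.onnx file name are arbitrary.
import torch
from transformers import AutoModel, AutoTokenizer

# Load the model/tokenizer pair to export (illustrative choice)
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.eval()

# Dummy inputs give the tracer concrete tensor shapes
dummy = dict(tokenizer(["test inputs"], return_tensors="pt"))

torch.onnx.export(
    model,
    (dummy,),                      # dict as final element = named arguments
    "model.onnx",
    opset_version=14,
    do_constant_folding=True,
    input_names=list(dummy.keys()),
    output_names=["last_hidden_state"],
    # Mark batch and sequence dimensions as variable-length
    dynamic_axes={name: {0: "batch", 1: "sequence"}
                  for name in list(dummy.keys()) + ["last_hidden_state"]},
)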
Supported task-to-output mappings:
"default"-- output:last_hidden_state(all token embeddings), loaded viaAutoModel"pooling"-- output:embeddings(pooled sentence embedding), loaded viaPoolingOnnx"question-answering"-- output:start_logits,end_logits, loaded viaAutoModelForQuestionAnswering"text-classification"-- output:logits, loaded viaAutoModelForSequenceClassification"zero-shot-classification"-- alias for"text-classification"
Usage
This method is used after training to convert a PyTorch model into ONNX format for production inference. It integrates seamlessly with the output of HFTrainer.__call__(), which returns a (model, tokenizer) tuple that can be passed directly.
Code Reference
Source Location
- Repository: txtai
- File: src/python/txtai/pipeline/train/hfonnx.py (lines 32-87)
Signature
def __call__(self, path, task="default", output=None, quantize=False, opset=14):
"""
Exports a Hugging Face Transformer model to ONNX.
Args:
path: path to model, accepts HF hub id, local path or (model, tokenizer) tuple
task: model task or category, determines outputs, defaults to "default"
output: optional output model path, defaults to return bytes if None
quantize: if model should be quantized, defaults to False
opset: onnx opset version, defaults to 14
Returns:
path to model output or model as bytes depending on output parameter
"""
Import
from txtai.pipeline import HFOnnx
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str or tuple | Yes | Path to pretrained model (HuggingFace hub ID or local directory), or a (model, tokenizer) tuple. When a tuple is provided, the model is moved to CPU before export. |
| task | str | No | Task type that determines the output schema and model class. Supported values: "default" (default), "pooling", "question-answering", "text-classification", "zero-shot-classification". |
| output | str or None | No | File path to write the ONNX model. When None (default), the model is returned as raw bytes. |
| quantize | bool | No | Whether to apply ONNX dynamic quantization after export. Default: False. Requires onnxruntime to be installed. |
| opset | int | No | ONNX opset version. Default: 14. Higher values enable more operators but may reduce runtime compatibility. |
Outputs
| Name | Type | Description |
|---|---|---|
| result | bytes or str | When output=None: the ONNX model as raw bytes. When output is a file path: the path string to the saved ONNX file. Output tensor names depend on the task: "default" produces last_hidden_state, "pooling" produces embeddings, "question-answering" produces start_logits and end_logits, "text-classification" and "zero-shot-classification" produce logits. |
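Because the return type depends on the output parameter, callers that sometimes receive bytes may want to persist them manually. A small sketch:
from txtai.pipeline import HFOnnx

exporter = HFOnnx()
result = exporter("bert-base-uncased")

# With output=None the result is bytes; write it out if a file is needed later
if isinstance(result, bytes):
    with open("model.onnx", "wb") as f:
        f.write(result)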
Usage Examples
Basic Example: Export to Bytes
from txtai.pipeline import HFOnnx
exporter = HFOnnx()
# Export a pretrained model to ONNX bytes
onnx_bytes = exporter("bert-base-uncased", task="default")
print(type(onnx_bytes)) # <class 'bytes'>
print(len(onnx_bytes)) # Size of the ONNX model in bytes
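The bytes never need to touch disk: they can be fed straight into an onnxruntime session. A sketch continuing from the example above, assuming onnxruntime and a matching tokenizer:
import onnxruntime
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
session = onnxruntime.InferenceSession(onnx_bytes, providers=["CPUExecutionProvider"])

# Tokenize with NumPy tensors to match the ONNX graph's expected input types
inputs = tokenizer(["hello world"], return_tensors="np")
outputs = session.run(None, dict(inputs))
print(outputs[0].shape)  # (1, sequence_length, 768) -- last_hidden_state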
Export to File
from txtai.pipeline import HFOnnx
exporter = HFOnnx()
# Export and save to a file
output_path = exporter(
"bert-base-uncased",
task="text-classification",
output="./model.onnx"
)
print(output_path) # "./model.onnx"
Export from Training Result
from txtai.pipeline import HFTrainer, HFOnnx
# Train a model
trainer = HFTrainer()
model, tokenizer = trainer(
"bert-base-uncased",
train=[{"text": "Great!", "label": 1}, {"text": "Bad.", "label": 0}],
task="text-classification",
num_train_epochs=1,
)
# Export the trained model directly
exporter = HFOnnx()
onnx_bytes = exporter(
(model, tokenizer),
task="text-classification"
)
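One caveat worth noting: the ONNX graph only covers the model's forward pass, so the tokenizer must be kept around separately for inference. For example (directory name is arbitrary):
# Persist the tokenizer alongside the exported graph
tokenizer.save_pretrained("./onnx-tokenizer")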
Export with Quantization
from txtai.pipeline import HFOnnx
exporter = HFOnnx()
# Export and quantize for optimized CPU inference
onnx_bytes = exporter(
"bert-base-uncased",
task="pooling",
quantize=True,
opset=14
)
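Dynamic quantization stores weights as 8-bit integers, so a substantial size reduction is expected. A quick way to check, continuing from the example above (exact numbers vary by model and library versions):
# Export the same model without quantization and compare byte counts
plain_bytes = exporter("bert-base-uncased", task="pooling", quantize=False)
print(len(plain_bytes), len(onnx_bytes))  # quantized export should be much smaller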
Question Answering Export
from txtai.pipeline import HFOnnx
exporter = HFOnnx()
# Export a QA model -- produces start_logits and end_logits outputs
output_path = exporter(
"distilbert-base-cased-distilled-squad",
task="question-answering",
output="./qa-model.onnx"
)
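To illustrate the start_logits/end_logits contract, here is a sketch of answer extraction with onnxruntime. The question/context strings are arbitrary, and greedy argmax decoding is the simplest possible strategy, not txtai's QA pipeline:
import numpy as np
import onnxruntime
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
session = onnxruntime.InferenceSession("./qa-model.onnx", providers=["CPUExecutionProvider"])

question = "What does HFOnnx export?"
context = "HFOnnx exports transformer models to the ONNX format."
inputs = tokenizer(question, context, return_tensors="np")

# Outputs follow the declared order: start_logits, end_logits
start_logits, end_logits = session.run(None, dict(inputs))
start, end = int(np.argmax(start_logits)), int(np.argmax(end_logits))
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)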