
Implementation:Neuml Txtai HFOnnx Call

From Leeroopedia


Knowledge Sources
Domains Training, NLP
Last Updated 2026-02-09 00:00 GMT

Overview

HFOnnx is a concrete tool provided by the txtai library for exporting trained transformer models to ONNX format for optimized cross-platform inference.

Description

HFOnnx.__call__() exports a HuggingFace transformer model to ONNX format. It supports multiple task types, each producing a different output schema, and includes optional post-export quantization via ONNX Runtime.

The method performs the following steps:

  1. Resolves I/O parameters -- calls self.parameters(task) to obtain the input schema, output schema, and model loader function for the specified task.
  2. Loads model and tokenizer -- accepts either a (model, tokenizer) tuple (e.g., directly from a training run) or a string path (HuggingFace hub ID or local directory). When a tuple is provided, the model is moved to CPU for export.
  3. Generates dummy inputs -- tokenizes a test string to create concrete tensors for tracing.
  4. Exports via torch.onnx.export() -- traces the model's forward pass and writes the ONNX graph. Constant folding is enabled, dynamic axes are declared for variable batch/sequence dimensions, and the ONNX opset version is configurable.
  5. Optional quantization -- if quantize=True, the exported model is quantized using onnxruntime.quantization.quantize_dynamic(). For in-memory exports, a temporary file is used.
  6. Returns result -- if no output path was specified, the ONNX model is returned as raw bytes. Otherwise, the file path is returned.

Supported task-to-output mappings:

  • "default" -- output: last_hidden_state (all token embeddings), loaded via AutoModel
  • "pooling" -- output: embeddings (pooled sentence embedding), loaded via PoolingOnnx
  • "question-answering" -- output: start_logits, end_logits, loaded via AutoModelForQuestionAnswering
  • "text-classification" -- output: logits, loaded via AutoModelForSequenceClassification
  • "zero-shot-classification" -- alias for "text-classification"
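The mapping above can be expressed as a simple lookup table. The dictionary below is an illustrative sketch mirroring the list, not txtai's internal code; loader classes are shown as strings for brevity:

```python
# Hypothetical sketch of the task -> (output names, loader) mapping above.
TASK_OUTPUTS = {
    "default": (["last_hidden_state"], "AutoModel"),
    "pooling": (["embeddings"], "PoolingOnnx"),
    "question-answering": (["start_logits", "end_logits"], "AutoModelForQuestionAnswering"),
    "text-classification": (["logits"], "AutoModelForSequenceClassification"),
}
# "zero-shot-classification" is an alias for "text-classification"
TASK_OUTPUTS["zero-shot-classification"] = TASK_OUTPUTS["text-classification"]


def outputs_for(task):
    """Return the ONNX output tensor names for a task."""
    names, _loader = TASK_OUTPUTS.get(task, TASK_OUTPUTS["default"])
    return names


print(outputs_for("question-answering"))  # ['start_logits', 'end_logits']
```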

Usage

This method is used after training to convert a PyTorch model into ONNX format for production inference. It integrates seamlessly with the output of HFTrainer.__call__(), which returns a (model, tokenizer) tuple that can be passed directly.

Code Reference

Source Location

  • Repository: txtai
  • File: src/python/txtai/pipeline/train/hfonnx.py (Lines 32-87)

Signature

def __call__(self, path, task="default", output=None, quantize=False, opset=14):
    """
    Exports a Hugging Face Transformer model to ONNX.

    Args:
        path: path to model, accepts HF hub id, local path or (model, tokenizer) tuple
        task: model task or category, determines outputs, defaults to "default"
        output: optional output model path, defaults to return bytes if None
        quantize: if model should be quantized, defaults to False
        opset: onnx opset version, defaults to 14

    Returns:
        path to model output or model as bytes depending on output parameter
    """

Import

from txtai.pipeline import HFOnnx

I/O Contract

Inputs

  • path (str or tuple, required) -- Path to a pretrained model (HuggingFace hub ID or local directory), or a (model, tokenizer) tuple. When a tuple is provided, the model is moved to CPU before export.
  • task (str, optional) -- Task type that determines the output schema and model class. Supported values: "default" (default), "pooling", "question-answering", "text-classification", "zero-shot-classification".
  • output (str or None, optional) -- File path to write the ONNX model. When None (default), the model is returned as raw bytes.
  • quantize (bool, optional) -- Whether to apply ONNX dynamic quantization after export. Default: False. Requires onnxruntime to be installed.
  • opset (int, optional) -- ONNX opset version. Default: 14. Higher values enable more operators but may reduce runtime compatibility.

Outputs

  • result (bytes or str) -- When output=None: the ONNX model as raw bytes. When output is a file path: the path string to the saved ONNX file. The output tensor names depend on the task: "default" produces last_hidden_state, "pooling" produces embeddings, "question-answering" produces start_logits and end_logits, "text-classification" and "zero-shot-classification" produce logits.

Usage Examples

Basic Example: Export to Bytes

from txtai.pipeline import HFOnnx

exporter = HFOnnx()

# Export a pretrained model to ONNX bytes
onnx_bytes = exporter("bert-base-uncased", task="default")
print(type(onnx_bytes))  # <class 'bytes'>
print(len(onnx_bytes))   # Size of the ONNX model in bytes

Export to File

from txtai.pipeline import HFOnnx

exporter = HFOnnx()

# Export and save to a file
output_path = exporter(
    "bert-base-uncased",
    task="text-classification",
    output="./model.onnx"
)
print(output_path)  # "./model.onnx"

Export from Training Result

from txtai.pipeline import HFTrainer, HFOnnx

# Train a model
trainer = HFTrainer()
model, tokenizer = trainer(
    "bert-base-uncased",
    train=[{"text": "Great!", "label": 1}, {"text": "Bad.", "label": 0}],
    task="text-classification",
    num_train_epochs=1,
)

# Export the trained model directly
exporter = HFOnnx()
onnx_bytes = exporter(
    (model, tokenizer),
    task="text-classification"
)

Export with Quantization

from txtai.pipeline import HFOnnx

exporter = HFOnnx()

# Export and quantize for optimized CPU inference
onnx_bytes = exporter(
    "bert-base-uncased",
    task="pooling",
    quantize=True,
    opset=14
)

Question Answering Export

from txtai.pipeline import HFOnnx

exporter = HFOnnx()

# Export a QA model -- produces start_logits and end_logits outputs
output_path = exporter(
    "distilbert-base-cased-distilled-squad",
    task="question-answering",
    output="./qa-model.onnx"
)
