Implementation:Neuml Txtai ONNX Model

From Leeroopedia


Knowledge Sources
Domains: ONNX Runtime, Model Inference, Transformer Models
Last Updated: 2026-02-10 01:00 GMT

Overview

A concrete tool, provided by txtai, for wrapping ONNX models in a Transformers-compatible interface.

Description

The OnnxModel class provides a Hugging Face Transformers and PyTorch-compatible interface for ONNX models running on ONNX Runtime. It extends PreTrainedModel so that ONNX inference can be used in pipelines that expect a standard transformer model interface.

The class handles four steps: (1) creating an ONNX Runtime InferenceSession with automatic provider selection (CUDA when available, falling back to CPU), (2) parsing inputs and casting them from PyTorch tensors to NumPy arrays, keeping only input_ids, attention_mask, and token_type_ids, (3) running inference, and (4) converting outputs back to PyTorch tensors. When the model has an output named logits, forward returns a SequenceClassifierOutput for compatibility with classification pipelines.

The class also registers itself with the txtai model registry to enable AutoModel-style loading. A companion OnnxConfig class provides a minimal PretrainedConfig when no configuration file is available.
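Step (2) of the description, input parsing, can be sketched roughly as follows. This is a minimal illustration and not the library's source: the function name parse_inputs is hypothetical, and the cast to 64-bit integers is an assumption based on what ONNX-exported transformer models commonly expect. Tensors are detected by duck typing, so the sketch runs with NumPy alone.

```python
import numpy as np

# Keys the wrapper is documented to forward to the ONNX session
SUPPORTED_KEYS = ("input_ids", "attention_mask", "token_type_ids")


def parse_inputs(inputs):
    """Hypothetical sketch: filter inputs and convert them to NumPy arrays."""
    features = {}
    for key in SUPPORTED_KEYS:
        if key in inputs:
            value = inputs[key]
            # PyTorch tensors expose .cpu().numpy(); NumPy arrays pass through
            if hasattr(value, "cpu"):
                value = value.cpu().numpy()
            # Assumption: the exported model expects int64 token inputs
            features[key] = np.asarray(value).astype(np.int64, copy=False)
    return features


example = {
    "input_ids": np.array([[101, 2054, 102]], dtype=np.int32),
    "attention_mask": np.array([[1, 1, 1]], dtype=np.int32),
    "labels": np.array([1]),  # not a supported key, so it is dropped
}
parsed = parse_inputs(example)
print(sorted(parsed))
```

Keys outside the supported set are ignored rather than rejected, which matches the documented behavior of returning only the supported keys that are present in the input.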

Usage

Use OnnxModel when you need to run ONNX-exported transformer models within txtai's embedding and pipeline infrastructure. It is the standard wrapper for ONNX models in txtai. It keeps compatibility with the Hugging Face Transformers API and runs inference through ONNX Runtime, using the GPU when the CUDA execution provider is available. Typical scenarios include running quantized or optimized models that were exported to ONNX format for faster inference.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/models/onnx.py

Signature

class OnnxModel(PreTrainedModel):
    def __init__(self, model, config=None)

    @property
    def device(self)

    def providers(self)
    def forward(self, **inputs)
    def parse(self, inputs)

class OnnxConfig(PretrainedConfig):
    pass

Import

from txtai.models.onnx import OnnxModel

I/O Contract

Inputs

Name | Type | Required | Description
model | str or InferenceSession | Yes | Path to an ONNX model file, or a pre-created ONNX Runtime InferenceSession.
config | str | No | Path to a Hugging Face model configuration directory. If not provided, a minimal OnnxConfig is used.
inputs (forward) | dict | Yes | Model inputs as keyword arguments. Supports input_ids (token ids), attention_mask (attention mask), and token_type_ids (segment indices). Values can be PyTorch tensors or NumPy arrays.

Outputs

Name | Type | Description
device | int | Always returns -1, the value Transformers pipelines use for CPU. PyTorch does not manage device placement here; the ONNX Runtime execution provider does.
providers() | list | List of ONNX Runtime execution providers in priority order. Returns ["CUDAExecutionProvider", "CPUExecutionProvider"] when CUDA is available, otherwise ["CPUExecutionProvider"].
forward() | torch.Tensor or SequenceClassifierOutput | If the model has an output named "logits", returns a SequenceClassifierOutput with the logits tensor. Otherwise returns the raw output as a PyTorch tensor.
parse() | dict | Dictionary of NumPy arrays with keys limited to input_ids, attention_mask, and token_type_ids (only those present in the input).
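The provider priority described for providers() can be illustrated with a small sketch. The function select_providers and its two parameters are hypothetical. In a real environment the arguments would typically come from torch.cuda.is_available() and onnxruntime.get_available_providers(), but how OnnxModel itself detects CUDA is an assumption here; passing the values in keeps the example runnable without either library.

```python
def select_providers(cuda_available, available_providers):
    """Hypothetical sketch: return execution providers in priority order."""
    # Prefer CUDA only when a GPU is present and the runtime build supports it
    if cuda_available and "CUDAExecutionProvider" in available_providers:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    # Otherwise fall back to CPU-only execution
    return ["CPUExecutionProvider"]


# GPU machine with a CUDA-enabled runtime build
print(select_providers(True, ["CUDAExecutionProvider", "CPUExecutionProvider"]))

# GPU machine, but the installed runtime build is CPU-only
print(select_providers(True, ["CPUExecutionProvider"]))
```

Keeping CPUExecutionProvider last in the CUDA case gives ONNX Runtime a fallback for any operator the CUDA provider cannot handle.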

Usage Examples

import torch

from txtai.models.onnx import OnnxModel

# Load an ONNX model with configuration
model = OnnxModel(
    model="/path/to/model.onnx",
    config="/path/to/model/config"
)

# Run inference with PyTorch-style inputs
inputs = {
    "input_ids": torch.tensor([[101, 2054, 2003, 3032, 102]]),
    "attention_mask": torch.tensor([[1, 1, 1, 1, 1]])
}

outputs = model.forward(**inputs)
# outputs: SequenceClassifierOutput if the model has a "logits" output,
# otherwise a torch.Tensor with the raw model outputs

# Check available providers
providers = model.providers()
# providers: ["CPUExecutionProvider"] or ["CUDAExecutionProvider", "CPUExecutionProvider"]
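Because forward() may return either a SequenceClassifierOutput or a plain tensor, calling code sometimes needs to handle both. The helper below is one possible way to do that. The name unwrap_output is hypothetical and not part of txtai, and the stand-in objects replace real model outputs so the sketch runs without a model file.

```python
from types import SimpleNamespace


def unwrap_output(outputs):
    """Hypothetical helper: return outputs.logits when present, else outputs."""
    logits = getattr(outputs, "logits", None)
    return outputs if logits is None else logits


# Stand-ins for illustration only, not real model outputs
classifier_style = SimpleNamespace(logits=[[0.1, 0.9]])  # mimics SequenceClassifierOutput
raw_style = [[0.3, 0.7]]                                 # mimics a raw tensor

print(unwrap_output(classifier_style))
print(unwrap_output(raw_style))
```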
