Implementation:Bentoml BentoML Framework ONNX

From Leeroopedia
Knowledge Sources
Domains: ML Framework, Model Interoperability, Model Serialization
Last Updated: 2026-02-13 15:00 GMT

Overview

The bentoml.onnx module provides BentoML integration for ONNX (Open Neural Network Exchange) models, so that they can be saved to the model store, loaded, and served through ONNX Runtime inference sessions.

Description

This module implements the BentoML framework adapter for ONNX models. Models are saved as .onnx protobuf files and loaded using onnxruntime.InferenceSession for high-performance inference.

ModelOptions (attrs dataclass) includes:

  • input_specs: Dictionary mapping method names to input specifications (extracted from the ONNX graph).
  • output_specs: Dictionary mapping method names to output specifications.
  • providers: ONNX Runtime execution providers (e.g., CUDAExecutionProvider, CPUExecutionProvider).
  • session_options: ONNX Runtime session configuration.
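As an illustration of the providers and session_options fields, the following is a minimal sketch of how a caller might assemble both values before loading a model. The helper names pick_providers and build_session_options are hypothetical and not part of BentoML; only ort.SessionOptions and its intra_op_num_threads / inter_op_num_threads attributes come from ONNX Runtime.

```python
from typing import List, Sequence


def pick_providers(available: Sequence[str], prefer_gpu: bool = True) -> List[str]:
    """Hypothetical helper: build an ordered provider list with a CPU fallback."""
    chosen: List[str] = []
    # Use CUDA only when requested and actually reported as available.
    if prefer_gpu and "CUDAExecutionProvider" in available:
        chosen.append("CUDAExecutionProvider")
    # Always keep the CPU provider last as a fallback.
    chosen.append("CPUExecutionProvider")
    return chosen


def build_session_options(intra_threads: int = 1, inter_threads: int = 1):
    """Hypothetical helper: create ort.SessionOptions with explicit thread counts."""
    # Imported lazily so pick_providers() can be used without onnxruntime installed.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = intra_threads  # threads used inside one operator
    opts.inter_op_num_threads = inter_threads  # threads used across operators
    return opts
```

With onnxruntime installed, the results could be passed as bentoml.onnx.load_model("onnx_model:latest", providers=pick_providers(ort.get_available_providers()), session_options=build_session_options(4, 1)).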

Key implementation details:

  • save_model(): Saves an onnx.ModelProto to the model store. Automatically extracts input/output specifications from the ONNX graph. Only the run method name is allowed in signatures since ONNX Runtime uses InferenceSession.run().
  • load_model(): Creates an ort.InferenceSession with configurable providers. Defaults to CPUExecutionProvider if no providers are specified.
  • get_runnable(): Builds an ONNXRunnable class that selects GPU or CPU providers, configures CPU parallelism (intra-op and inter-op thread counts), casts inputs to the types the graph expects using generated casting functions, and returns a tuple for multi-output models. It also keeps backward compatibility with models saved through the v1 API.
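Two of the behaviours listed above, the "run"-only signature rule and the tuple handling for multi-output models, can be sketched in plain Python. This is an illustrative approximation under stated assumptions, not the module's actual code: normalize_signatures and unwrap_outputs are hypothetical names, and the default value mirrors the one documented for save_model().

```python
from typing import Any, Dict, Optional, Sequence

# Default documented for save_model() when no signatures are passed.
DEFAULT_SIGNATURES = {"run": {"batchable": False}}


def normalize_signatures(
    signatures: Optional[Dict[str, Dict[str, Any]]]
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: apply the default and reject non-'run' methods."""
    if signatures is None:
        # Copy so callers cannot mutate the module-level default.
        return {name: dict(opts) for name, opts in DEFAULT_SIGNATURES.items()}
    unsupported = sorted(set(signatures) - {"run"})
    if unsupported:
        raise ValueError(
            f"Only the 'run' method is supported for ONNX models, got: {unsupported}"
        )
    return signatures


def unwrap_outputs(outputs: Sequence[Any]) -> Any:
    """Hypothetical helper: one output is returned bare, several as a tuple."""
    if len(outputs) == 1:
        return outputs[0]
    return tuple(outputs)
```

InferenceSession.run() always returns a list, so an unwrapping step of this kind lets single-output models return the array directly.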

Usage

Use this module to save and serve ONNX-format models (exported from PyTorch, TensorFlow, scikit-learn, etc.) within BentoML services. Because the exported model no longer depends on its training framework, the module is a good fit for cross-framework model deployment.

Code Reference

Source Location

Signature

def get(tag_like: str | Tag) -> bentoml.Model: ...

def load_model(bento_model: str | Tag | bentoml.Model,
               *, providers: ProvidersType | None = None,
               session_options: ort.SessionOptions | None = None
               ) -> ort.InferenceSession: ...

def save_model(name: Tag | str,
               model: onnx.ModelProto,
               *, signatures: dict | None = None,
               labels: dict[str, str] | None = None,
               custom_objects: dict[str, Any] | None = None,
               external_modules: List[ModuleType] | None = None,
               metadata: dict[str, Any] | None = None
               ) -> bentoml.Model: ...

def get_runnable(bento_model: bentoml.Model) -> type[bentoml.legacy.Runnable]: ...

Import

import bentoml

# Via public API
model = bentoml.onnx.save_model(...)
session = bentoml.onnx.load_model(...)

I/O Contract

Inputs

save_model()

Name | Type | Required | Description
name | Tag or str | Yes | Name/tag for the model in the BentoML store
model | onnx.ModelProto | Yes | The ONNX model protobuf to save
signatures | dict or None | No | Inference method signatures (default: {"run": {"batchable": False}}). Only "run" is allowed.
labels | dict[str, str] or None | No | User-defined labels for model management
custom_objects | dict[str, Any] or None | No | Additional objects to serialize
external_modules | List[ModuleType] or None | No | Additional Python modules to save alongside the model
metadata | dict[str, Any] or None | No | Custom metadata for the model

load_model()

Name | Type | Required | Description
bento_model | str, Tag, or Model | Yes | Tag or Model instance to load from the store
providers | ProvidersType or None | No | ONNX Runtime execution providers (default: ["CPUExecutionProvider"])
session_options | ort.SessionOptions or None | No | ONNX Runtime session configuration

Outputs

Method | Return Type | Description
save_model() | bentoml.Model | A BentoML Model containing the saved ONNX model
load_model() | ort.InferenceSession | An ONNX Runtime inference session
get() | bentoml.Model | The BentoML Model reference from the store
get_runnable() | type[Runnable] | An ONNXRunnable class with automatic input casting and provider selection
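The automatic input casting mentioned for get_runnable() can be pictured with a small sketch. It assumes the input type is available as an ONNX Runtime type string such as "tensor(float)" (the form reported by an input's type attribute) and maps it to a NumPy dtype name. The names ORT_TYPE_TO_NUMPY, numpy_dtype_for, and make_caster are hypothetical, and the real module may derive its casting functions differently.

```python
from typing import Any, Callable

# Partial mapping from ONNX Runtime type strings to NumPy dtype names.
ORT_TYPE_TO_NUMPY = {
    "tensor(float16)": "float16",
    "tensor(float)": "float32",
    "tensor(double)": "float64",
    "tensor(int8)": "int8",
    "tensor(uint8)": "uint8",
    "tensor(int32)": "int32",
    "tensor(int64)": "int64",
    "tensor(bool)": "bool",
}


def numpy_dtype_for(ort_type: str) -> str:
    """Hypothetical helper: look up the NumPy dtype name for an input type."""
    try:
        return ORT_TYPE_TO_NUMPY[ort_type]
    except KeyError:
        raise TypeError(f"Unsupported ONNX input type: {ort_type}") from None


def make_caster(ort_type: str) -> Callable[[Any], Any]:
    """Hypothetical helper: generate a function that casts one input."""
    dtype = numpy_dtype_for(ort_type)  # resolved once, when the caster is built

    def cast(value: Any) -> Any:
        import numpy as np  # imported lazily; only needed when casting

        return np.asarray(value, dtype=dtype)

    return cast
```

A caster built with make_caster("tensor(float)") would turn a float64 array or a nested Python list into a float32 array before it is handed to InferenceSession.run().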

Usage Examples

import bentoml
import numpy as np
import onnx
import torch
import torch.nn as nn

# Define a small two-layer network to export
class SimpleModel(nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x):
        # Linear -> ReLU (clamp at 0) -> Linear
        return self.linear2(self.linear1(x).clamp(min=0))

model = SimpleModel(1000, 100, 1)
model.eval()  # export in inference mode
x = torch.randn(64, 1000)

# Export the PyTorch model to ONNX. The batch dimension is marked dynamic so
# that inputs with a batch size other than 64 are accepted at inference time.
torch.onnx.export(model, x, "/tmp/model.onnx",
                  input_names=["x"], output_names=["output"],
                  dynamic_axes={"x": {0: "batch"}, "output": {0: "batch"}})

# Save the ONNX model to the BentoML model store
onnx_model = onnx.load("/tmp/model.onnx")
bento_model = bentoml.onnx.save_model(
    "onnx_model", onnx_model,
    signatures={"run": {"batchable": True}}
)

# Load the model as an ONNX Runtime session and run inference on one sample
sess = bentoml.onnx.load_model("onnx_model:latest")
result = sess.run(None, {"x": np.random.randn(1, 1000).astype(np.float32)})
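Before calling run(), it can help to check which input names, shapes, and types the loaded session expects. The sketch below assumes only that the object exposes get_inputs() and get_outputs() returning items with name, shape, and type attributes, as ort.InferenceSession does; describe_io is a hypothetical helper name.

```python
from typing import Any, Dict, List


def describe_io(session: Any) -> Dict[str, List[dict]]:
    """Hypothetical helper: summarise a session's inputs and outputs."""

    def as_dict(arg: Any) -> dict:
        # Dynamic dimensions may appear as None or as a symbolic name.
        return {"name": arg.name, "shape": list(arg.shape), "type": arg.type}

    return {
        "inputs": [as_dict(a) for a in session.get_inputs()],
        "outputs": [as_dict(a) for a in session.get_outputs()],
    }
```

Calling describe_io(sess) on the session loaded above should list the input "x" and the output "output" named during export.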
