Implementation:Bentoml BentoML Framework ONNX

Knowledge Sources	Bentoml_BentoML
Domains	ML Framework, Model Interoperability, Model Serialization
Last Updated	2026-02-13 15:00 GMT

Overview

The bentoml.onnx module provides BentoML integration for ONNX (Open Neural Network Exchange) models, so that they can be saved, loaded, and served through ONNX Runtime inference sessions.

Description

This module implements the BentoML framework adapter for ONNX models. Models are saved as .onnx protobuf files and loaded using onnxruntime.InferenceSession for high-performance inference.

ModelOptions (an attrs dataclass) includes the following fields:

input_specs: Dictionary mapping method names to input specifications (extracted from the ONNX graph).
output_specs: Dictionary mapping method names to output specifications.
providers: ONNX Runtime execution providers (e.g., CUDAExecutionProvider, CPUExecutionProvider).
session_options: ONNX Runtime session configuration.

Key implementation details:

save_model(): Saves an onnx.ModelProto to the model store and automatically extracts input/output specifications from the ONNX graph. Only "run" is accepted as a method name in signatures, because ONNX Runtime exposes inference through InferenceSession.run().
load_model(): Creates an ort.InferenceSession with configurable providers. Defaults to CPUExecutionProvider if no providers are specified.
get_runnable(): Creates an ONNXRunnable that handles GPU/CPU provider selection, CPU parallelization configuration (intra/inter op thread counts), automatic input type casting via generated casting functions, and output tuple handling for multi-output models. Includes backward compatibility for v1 API models.
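
Two of the behaviors listed above can be illustrated without BentoML installed. The following is a minimal sketch, not the module's actual code: check_signatures and unwrap_outputs are hypothetical helper names, and the exception type is an assumption (BentoML may raise its own error class). It shows the "only run" signature rule with the documented default, and one plausible way to return a bare value for single-output models and a tuple for multi-output models.

```python
from typing import Any, Dict, Optional, Sequence

ALLOWED_METHOD = "run"  # the only method name this page says is accepted


def check_signatures(
    signatures: Optional[Dict[str, Dict[str, Any]]],
) -> Dict[str, Dict[str, Any]]:
    """Return the signatures, or the documented default, rejecting non-"run" methods."""
    if signatures is None:
        # Default taken from the save_model() inputs table on this page.
        return {ALLOWED_METHOD: {"batchable": False}}
    unexpected = sorted(set(signatures) - {ALLOWED_METHOD})
    if unexpected:
        # Assumption: ValueError stands in for whatever BentoML actually raises.
        raise ValueError(f"unsupported signature method(s): {unexpected}")
    return signatures


def unwrap_outputs(raw_outputs: Sequence[Any]) -> Any:
    """Single-output models yield the value itself; multi-output models yield a tuple."""
    if len(raw_outputs) == 1:
        return raw_outputs[0]
    return tuple(raw_outputs)
```

A helper like check_signatures could be called before save_model() to fail early on a misspelled method name.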

Usage

Use this module to save and serve ONNX-format models (exported from PyTorch, TensorFlow, scikit-learn, etc.) within BentoML services. It is well suited to cross-framework model deployment.

Code Reference

Source Location

Repository: Bentoml_BentoML
File: src/bentoml/_internal/frameworks/onnx.py
Lines: 1-444

Signature

def get(tag_like: str | Tag) -> bentoml.Model: ...

def load_model(bento_model: str | Tag | bentoml.Model,
               *, providers: ProvidersType | None = None,
               session_options: ort.SessionOptions | None = None
               ) -> ort.InferenceSession: ...

def save_model(name: Tag | str,
               model: onnx.ModelProto,
               *, signatures: dict | None = None,
               labels: dict[str, str] | None = None,
               custom_objects: dict[str, Any] | None = None,
               external_modules: List[ModuleType] | None = None,
               metadata: dict[str, Any] | None = None
               ) -> bentoml.Model: ...

def get_runnable(bento_model: bentoml.Model) -> type[bentoml.legacy.Runnable]: ...

Import

import bentoml

# Via public API
model = bentoml.onnx.save_model(...)
session = bentoml.onnx.load_model(...)

I/O Contract

Inputs

save_model()

Name	Type	Required	Description
name	Tag or str	Yes	Name/tag for the model in the BentoML store
model	onnx.ModelProto	Yes	The ONNX model protobuf to save
signatures	dict or None	No	Inference method signatures (default: {"run": {"batchable": False}}). Only "run" is allowed.
labels	dict[str, str] or None	No	User-defined labels for model management
custom_objects	dict[str, Any] or None	No	Additional objects to serialize
external_modules	List[ModuleType] or None	No	Additional Python modules to save alongside
metadata	dict[str, Any] or None	No	Custom metadata for the model

load_model()

Name	Type	Required	Description
bento_model	str, Tag, or Model	Yes	Tag or Model instance to load from the store
providers	ProvidersType or None	No	ONNX Runtime execution providers (default: ["CPUExecutionProvider"])
session_options	ort.SessionOptions or None	No	ONNX Runtime session configuration
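
The providers and session_options parameters can be combined as in the sketch below. It is an illustration under assumptions rather than a recommended configuration: resolve_providers and load_session are hypothetical helper names, and the thread count is a placeholder. The keyword arguments passed to bentoml.onnx.load_model follow the signature shown on this page, and intra_op_num_threads is an attribute of onnxruntime.SessionOptions. The bentoml and onnxruntime imports are deferred so that the pure helper can be used on its own.

```python
from typing import Any, List, Optional, Sequence

DEFAULT_PROVIDERS: List[str] = ["CPUExecutionProvider"]  # documented default


def resolve_providers(providers: Optional[Sequence[Any]] = None) -> List[Any]:
    """Fall back to the CPU provider when the caller supplies none."""
    return list(providers) if providers else list(DEFAULT_PROVIDERS)


def load_session(
    tag: str,
    providers: Optional[Sequence[Any]] = None,
    intra_op_threads: Optional[int] = None,
):
    """Load a saved ONNX model with explicit providers and session options."""
    # Deferred imports: both packages must be installed to call this function.
    import bentoml
    import onnxruntime as ort

    options = ort.SessionOptions()
    if intra_op_threads is not None:
        # Limits the threads used to parallelize work inside a single operator.
        options.intra_op_num_threads = intra_op_threads
    return bentoml.onnx.load_model(
        tag,
        providers=resolve_providers(providers),
        session_options=options,
    )
```

For example, load_session("onnx_model:latest", ["CUDAExecutionProvider", "CPUExecutionProvider"]) would request the CUDA provider first and keep the CPU provider as a fallback, assuming a GPU build of ONNX Runtime is installed.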

Outputs

Method	Return Type	Description
save_model()	bentoml.Model	A BentoML Model containing the saved ONNX model
load_model()	ort.InferenceSession	An ONNX Runtime inference session
get()	bentoml.Model	The BentoML Model reference from the store
get_runnable()	type[Runnable]	An ONNXRunnable class with automatic input casting and provider selection

Usage Examples

import bentoml
import numpy as np
import onnx
import torch
import torch.nn as nn

# Export a PyTorch model to ONNX
class SimpleModel(nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x):
        return self.linear2(self.linear1(x).clamp(min=0))

model = SimpleModel(1000, 100, 1)
x = torch.randn(64, 1000)
# Mark the batch dimension as dynamic; otherwise the exported graph
# only accepts the batch size of the example input (64).
torch.onnx.export(model, x, "/tmp/model.onnx",
                  input_names=["x"], output_names=["output"],
                  dynamic_axes={"x": {0: "batch"}, "output": {0: "batch"}})

# Save ONNX model to BentoML
onnx_model = onnx.load("/tmp/model.onnx")
bento_model = bentoml.onnx.save_model(
    "onnx_model", onnx_model,
    signatures={"run": {"batchable": True}}
)

# Load and run inference (ONNX Runtime expects float32 input here)
sess = bentoml.onnx.load_model("onnx_model:latest")
result = sess.run(None, {"x": np.random.randn(1, 1000).astype(np.float32)})
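
Before building the feed dictionary for sess.run(), it can help to check which input and output names the session actually exposes. The sketch below is a hypothetical helper (describe_io is not part of BentoML); it relies on InferenceSession.get_inputs() and get_outputs(), whose entries carry name, type, and shape attributes. The stub session uses illustrative values and exists only so the snippet runs without ONNX Runtime installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


def describe_io(session: Any) -> Dict[str, List[Tuple[str, str, list]]]:
    """Collect (name, type, shape) for every input and output of a session."""
    return {
        "inputs": [(arg.name, arg.type, arg.shape) for arg in session.get_inputs()],
        "outputs": [(arg.name, arg.type, arg.shape) for arg in session.get_outputs()],
    }


# Stand-in for an ort.InferenceSession; the values are illustrative only.
stub_session = SimpleNamespace(
    get_inputs=lambda: [
        SimpleNamespace(name="x", type="tensor(float)", shape=["batch", 1000])
    ],
    get_outputs=lambda: [
        SimpleNamespace(name="output", type="tensor(float)", shape=["batch", 1])
    ],
)

info = describe_io(stub_session)
```

With a real session, describe_io(sess) would be called in place of the stub, and the reported input names are the keys the feed dictionary must use.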
