
Implementation:Bentoml BentoML Runnable Base

From Leeroopedia
Knowledge Sources
Domains Runner, ML Inference, Core Framework
Last Updated 2026-02-13 15:00 GMT

Overview

Defines the Runnable base class and supporting types for declaring computation units (runners) in BentoML, including method configuration for batching and streaming.

Description

The Runnable class is the abstract base that all BentoML runner implementations must extend. It provides a static method named method (usable as a decorator) for registering methods on the runnable that can be invoked remotely. Each registered method is wrapped in a RunnableMethod descriptor, which stores the callable together with its RunnableMethodConfig (batchable, batch_dim, input_spec, output_spec, is_stream). The class enforces that SUPPORTED_RESOURCES and SUPPORTS_CPU_MULTI_THREADING cannot be modified at runtime (via __setattr__) and prevents access to add_method and method on instances (via __getattribute__). RunnableMethod uses the descriptor protocol (__get__, __set_name__) to bind methods to the runnable class and register them in the bentoml_runnable_methods__ dictionary. Both synchronous and asynchronous callables are supported transparently.
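The descriptor-based registration described above can be illustrated with a simplified, self-contained sketch. This is not BentoML's code; the names (RunnableMethod, runnable_methods__) only mirror the ones documented here:

```python
import typing as t

class RunnableMethod:
    """Wraps a callable plus its config and registers itself on the owning class."""

    def __init__(self, func: t.Callable, batchable: bool = False):
        self.func = func
        self.config = {"batchable": batchable}

    def __set_name__(self, owner: type, name: str) -> None:
        # Called when the owning class body finishes executing: record this
        # method in a per-class registry, mimicking bentoml_runnable_methods__.
        registry = owner.__dict__.get("runnable_methods__")
        if registry is None:
            registry = {}
            setattr(owner, "runnable_methods__", registry)
        registry[name] = self

    def __get__(self, obj, objtype=None):
        # Bind like a normal method so instances can call it directly.
        if obj is None:
            return self
        return self.func.__get__(obj, objtype)

def method(batchable: bool = False):
    """Simplified stand-in for the @Runnable.method decorator."""
    def decorator(func: t.Callable) -> RunnableMethod:
        return RunnableMethod(func, batchable=batchable)
    return decorator

class Echo:
    @method(batchable=True)
    def run(self, x: int) -> int:
        return x + 1
```

With this sketch, Echo().run(1) calls the bound method normally, while Echo.runnable_methods__["run"].config carries the registered configuration, which is the same shape of bookkeeping the real descriptor performs.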

Usage

Use this class to define custom BentoML runners. Subclass Runnable, declare SUPPORTED_RESOURCES and SUPPORTS_CPU_MULTI_THREADING, then use the @Runnable.method decorator to register inference methods.

Code Reference

Source Location

bentoml/_internal/runner/runnable.py

Signature

class Runnable:
    SUPPORTED_RESOURCES: tuple[str, ...]
    SUPPORTS_CPU_MULTI_THREADING: bool
    bentoml_runnable_methods__: dict[str, RunnableMethod[t.Any, t.Any, t.Any]] | None = None

    @classmethod
    def add_method(
        cls: t.Type[T],
        method: t.Callable[t.Concatenate[T, P], t.Any],
        name: str,
        *,
        batchable: bool = False,
        batch_dim: tuple[int, int] | int = 0,
        input_spec: LazyType[t.Any] | t.Tuple[LazyType[t.Any], ...] | None = None,
        output_spec: LazyType[t.Any] | None = None,
    ): ...

    @staticmethod
    def method(
    meth: t.Callable[..., t.Any] | None = None,
        *,
        batchable: bool = False,
        batch_dim: tuple[int, int] | int = 0,
        input_spec: AnyType | tuple[AnyType, ...] | None = None,
        output_spec: AnyType | None = None,
    ) -> RunnableMethod | t.Callable[..., RunnableMethod]: ...

@attr.define
class RunnableMethod(t.Generic[T, P, R]):
    func: t.Callable[t.Concatenate[T, P], R]
    config: RunnableMethodConfig

@attr.define
class RunnableMethodConfig:
    batchable: bool
    batch_dim: tuple[int, int]
    input_spec: AnyType | tuple[AnyType, ...] | None = None
    output_spec: AnyType | None = None
    is_stream: bool = False
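Note that method and add_method accept batch_dim as either an int or a (input, output) tuple, while RunnableMethodConfig stores only a tuple. A plausible normalization (an assumption based on these signatures, not the library's code) is that a single int applies to both input and output:

```python
import typing as t

def normalize_batch_dim(
    batch_dim: t.Union[int, t.Tuple[int, int]]
) -> t.Tuple[int, int]:
    """Expand an int batch_dim shorthand into an (input_dim, output_dim) pair."""
    if isinstance(batch_dim, int):
        return (batch_dim, batch_dim)
    return batch_dim
```

Under this reading, the default batch_dim=0 means batches are stacked along axis 0 of both the input and the output, and (0, 1) would batch inputs on axis 0 but outputs on axis 1.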

Import

from bentoml._internal.runner.runnable import Runnable, RunnableMethod, RunnableMethodConfig

I/O Contract

Inputs

Name Type Required Description
meth Callable or None No The method to register; if None, returns a decorator
batchable bool No Whether this method supports adaptive batching; defaults to False
batch_dim tuple[int, int] or int No Input and output batch dimensions; defaults to 0
input_spec AnyType or tuple or None No Expected input type specification
output_spec AnyType or None No Expected output type specification
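The meth row describes the standard dual-use decorator convention: when the callable is passed directly, it is wrapped immediately; when meth is None, a decorator is returned. A minimal illustrative sketch of that convention (not BentoML's implementation):

```python
import typing as t

def method(meth: t.Optional[t.Callable] = None, *, batchable: bool = False):
    """Usable both bare (@method) and with arguments (@method(batchable=True))."""
    def wrap(func: t.Callable) -> t.Callable:
        # Attach the configuration to the function for later inspection.
        func.__runnable_config__ = {"batchable": batchable}
        return func

    if meth is not None:
        return wrap(meth)  # called as @method with no parentheses
    return wrap            # called as @method(...); returns the decorator

@method
def a(x):
    return x

@method(batchable=True)
def b(x):
    return x
```

Both forms leave the function callable as before; only the attached configuration differs.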

Outputs

Name Type Description
RunnableMethod RunnableMethod[T, P, R] A descriptor wrapping the method with its configuration

Usage Examples

import bentoml
import numpy as np
from bentoml._internal.runner.runnable import Runnable

class MyModelRunnable(Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        self.model = ...  # load model

    @Runnable.method(batchable=True, batch_dim=0)
    def predict(self, input_data: np.ndarray) -> np.ndarray:
        return self.model.predict(input_data)

    @Runnable.method(batchable=False)
    def explain(self, input_data: np.ndarray) -> dict:
        return self.model.explain(input_data)
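The description notes that registered methods may be synchronous or asynchronous and are handled transparently. One common way to achieve that (a sketch of the general technique, not BentoML's internals) is to call the function and await the result only if it is awaitable:

```python
import asyncio
import inspect
import typing as t

async def invoke(func: t.Callable, *args: t.Any) -> t.Any:
    """Invoke a callable uniformly, whether it is sync or async."""
    result = func(*args)
    if inspect.isawaitable(result):
        result = await result  # async def case: await the coroutine
    return result

def add(x: int, y: int) -> int:
    return x + y

async def add_async(x: int, y: int) -> int:
    return x + y
```

Both asyncio.run(invoke(add, 1, 2)) and asyncio.run(invoke(add_async, 1, 2)) return the same value, so callers never need to know which style the runnable author chose.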
