Implementation: BentoML Runner Class
| Knowledge Sources | |
|---|---|
| Domains | Runner, Model Serving, Distributed Computing |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
The Runner module defines the Runner and AbstractRunner classes that represent units of computation in BentoML's legacy service architecture, supporting remote execution, dynamic batching, and scheduling strategies.
Description
This module provides the runner abstraction for BentoML's legacy (pre-2.0) service architecture. Key components include:
RunnerMethod[T, P, R] (attrs frozen class): Wraps individual methods of a Runnable class, providing:
- run(*args, **kwargs): Synchronous execution via the runner handle.
- async_run(*args, **kwargs): Asynchronous execution.
- async_stream(*args, **kwargs): Async streaming execution returning AsyncGenerator[str, None].
- Batching configuration: max_batch_size and max_latency_ms.
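The delegation pattern described above can be sketched without BentoML installed. This is an illustrative analogue only; the names `RunnerMethodSketch`, `RunnerHandleSketch`, and `LocalHandle` are invented stand-ins for the library's `RunnerMethod`, `RunnerHandle`, and `LocalRunnerRef`:

```python
from dataclasses import dataclass
import typing as t


class RunnerHandleSketch:
    """Stand-in for the handle protocol: the method forwards execution here."""

    def run_method(self, method: "RunnerMethodSketch", *args: t.Any, **kwargs: t.Any) -> t.Any:
        raise NotImplementedError


class LocalHandle(RunnerHandleSketch):
    """Executes the wrapped callable in-process, in the spirit of LocalRunnerRef."""

    def __init__(self, fn: t.Callable[..., t.Any]) -> None:
        self._fn = fn

    def run_method(self, method: "RunnerMethodSketch", *args: t.Any, **kwargs: t.Any) -> t.Any:
        return self._fn(*args, **kwargs)


@dataclass(frozen=True)
class RunnerMethodSketch:
    """Frozen wrapper mirroring RunnerMethod: carries batching config,
    delegates actual execution to whichever handle is installed."""

    name: str
    handle: RunnerHandleSketch
    max_batch_size: int = 100
    max_latency_ms: int = 10000

    def run(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
        # Synchronous execution is forwarded to the handle, so swapping the
        # handle (local vs. remote) changes where the work happens.
        return self.handle.run_method(self, *args, **kwargs)


method = RunnerMethodSketch("predict", LocalHandle(lambda x: x * 2))
print(method.run(21))  # 42
```

Because the method only talks to the handle interface, the same call site works unchanged whether the computation is in-process or remote.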
AbstractRunner (attrs frozen ABC): Base class defining the runner interface:
- name: Validated and lowercased runner name (must be a valid BentoML Tag).
- models: List of Model instances.
- resource_config: Resource allocation configuration.
- runnable_class: The Runnable subclass this runner executes.
- Abstract methods: init_local() and init_client().
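The shape of that base class can be sketched with a frozen dataclass and `abc`, standing in for the attrs-based original. `AbstractRunnerSketch` and `DebugRunner` are hypothetical names, and the name-lowercasing here only illustrates the "validated and lowercased" behavior described above:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractRunnerSketch(ABC):
    """Illustrative analogue of AbstractRunner: frozen attributes plus
    abstract lifecycle hooks that concrete runners must implement."""

    name: str

    def __post_init__(self) -> None:
        # Mirror the documented behavior: runner names are lowercased.
        # object.__setattr__ is needed because the dataclass is frozen.
        object.__setattr__(self, "name", self.name.lower())

    @abstractmethod
    def init_local(self) -> None: ...

    @abstractmethod
    def init_client(self) -> None: ...


class DebugRunner(AbstractRunnerSketch):
    def init_local(self) -> None:
        print(f"{self.name}: running in-process")

    def init_client(self) -> None:
        print(f"{self.name}: connecting to a remote runner worker")


r = DebugRunner(name="IrisClf")
print(r.name)  # irisclf
```

Attempting to instantiate the base class directly raises `TypeError`, which is what makes the two lifecycle hooks mandatory for subclasses.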
Runner (extends AbstractRunner, deprecated): The concrete runner implementation:
- Initialization: Reads runner configuration from BentoMLContainer.config.runners, inspects all methods on the Runnable class, and creates RunnerMethod instances with batching configuration from both method-level and runner-level settings.
- Default method selection: If only one method exists, or a method named __call__ exists, it becomes the default, accessible via runner.run() and runner.async_run().
- Handle lifecycle: Uses _set_handle() to install a RunnerHandle implementation. init_local() installs LocalRunnerRef for debugging; init_client() installs RemoteRunnerClient for production. destroy() resets to DummyRunnerHandle.
- Readiness probe: runner_handle_is_ready() checks whether the runner handle is ready; used as a Kubernetes readiness probe.
- Worker scheduling: scheduled_worker_count and scheduled_worker_env_map properties delegate to the configured Strategy class.
The module is marked as deprecated with the suggestion to upgrade to new-style services (@bentoml.service()).
Usage
Used in BentoML's legacy service architecture to define remote computation units. Runners are created from Runnable classes (typically obtained via bentoml.{framework}.get(...).to_runner()) and passed to a bentoml.legacy.Service.
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/_internal/runner/runner.py
- Lines: 1-363
Signature
@attr.frozen(slots=False)
class RunnerMethod(t.Generic[T, P, R]):
    runner: Runner | TritonRunner
    name: str
    config: RunnableMethodConfig
    max_batch_size: int
    max_latency_ms: int
    doc: str | None = None

    def run(self, *args: P.args, **kwargs: P.kwargs) -> R: ...
    async def async_run(self, *args: P.args, **kwargs: P.kwargs) -> R: ...
    def async_stream(self, *args: P.args, **kwargs: P.kwargs) -> t.AsyncGenerator[str, None]: ...
@attr.define(slots=False, frozen=True)
class AbstractRunner(ABC):
    name: str
    models: list[Model]
    resource_config: dict[str, t.Any]
    runnable_class: type[Runnable]
    embedded: bool
@attr.define(slots=False, frozen=True, eq=False, init=False)
class Runner(AbstractRunner):
    def __init__(
        self,
        runnable_class: type[Runnable],
        *,
        runnable_init_params: dict[str, t.Any] | None = None,
        name: str | None = None,
        scheduling_strategy: type[Strategy] = DefaultStrategy,
        models: list[Model] | None = None,
        max_batch_size: int | None = None,
        max_latency_ms: int | None = None,
        method_configs: dict[str, dict[str, int]] | None = None,
        embedded: bool = False,
    ) -> None: ...

    def init_local(self, quiet: bool = False) -> None: ...
    def init_client(self, handle_class: type[RunnerHandle] | None = None, *args, **kwargs): ...
    def destroy(self): ...
    async def runner_handle_is_ready(self, timeout: int = ...) -> bool: ...
Import
from bentoml._internal.runner.runner import Runner
from bentoml._internal.runner.runner import RunnerMethod
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| runnable_class | type[Runnable] | Yes | The Runnable subclass that defines the computation logic and methods. |
| name | str or None | No | Runner name. If None, derived from the runnable class name (lowercased). |
| scheduling_strategy | type[Strategy] | No | Strategy class for worker scheduling. Defaults to DefaultStrategy. |
| models | list[Model] or None | No | List of BentoML Model instances required by this runner. |
| max_batch_size | int or None | No | Global max batch size for dynamic batching. Overridden by method-level config. |
| max_latency_ms | int or None | No | Global max latency for dynamic batching. Overridden by method-level config. |
| method_configs | dict[str, dict[str, int]] or None | No | Per-method configuration for max_batch_size and max_latency_ms. |
| embedded | bool | No | Whether the runner runs in-process. Defaults to False. |
Outputs
| Name | Type | Description |
|---|---|---|
| Runner | Runner | A configured Runner instance with methods accessible as attributes. |
Usage Examples
import bentoml
# Create a runner from a saved model
runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
# Use in a legacy service
svc = bentoml.legacy.Service("iris-classifier", runners=[runner])
@svc.api(
    input=bentoml.io.NumpyNdarray(),
    output=bentoml.io.NumpyNdarray(),
)
def predict(input_arr):
    return runner.run(input_arr)
# For testing/debugging, initialize locally
runner.init_local(quiet=True)
result = runner.run([[5, 4, 3, 2]])
runner.destroy()
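The readiness probe mentioned under "Worker scheduling" above boils down to polling an async check until a deadline. Here is a self-contained sketch of that pattern; `wait_until_ready` and `DummyHandle` are invented for illustration and are not part of BentoML's API:

```python
import asyncio
import typing as t


async def wait_until_ready(
    is_ready: t.Callable[[], t.Awaitable[bool]],
    timeout: float,
    interval: float = 0.01,
) -> bool:
    """Poll an async is_ready() check until it passes or the timeout elapses,
    the way runner_handle_is_ready() can back a Kubernetes readiness probe
    (illustrative sketch, not BentoML's implementation)."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if await is_ready():
            return True
        await asyncio.sleep(interval)
    return False


class DummyHandle:
    """Becomes ready after a few polls, like a remote runner worker starting up."""

    def __init__(self, ready_after_polls: int) -> None:
        self._polls_left = ready_after_polls

    async def is_ready(self) -> bool:
        self._polls_left -= 1
        return self._polls_left <= 0


handle = DummyHandle(ready_after_polls=3)
print(asyncio.run(wait_until_ready(handle.is_ready, timeout=1.0)))  # True
```

A probe that never succeeds within the timeout simply returns False, letting the orchestrator keep the pod out of rotation until the runner handle is actually serving.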