Implementation: BentoML Runner Class
| Knowledge Sources | |
|---|---|
| Domains | Runner, Model Serving, Distributed Computing |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
The Runner module defines the Runner and AbstractRunner classes that represent units of computation in BentoML's legacy service architecture, supporting remote execution, dynamic batching, and scheduling strategies.
Description
This module provides the runner abstraction for BentoML's legacy (pre-2.0) service architecture. Key components include:
RunnerMethod[T, P, R] (attrs frozen class): Wraps individual methods of a Runnable class, providing:
- run(*args, **kwargs): Synchronous execution via the runner handle.
- async_run(*args, **kwargs): Asynchronous execution.
- async_stream(*args, **kwargs): Async streaming execution returning AsyncGenerator[str, None].
- Batching configuration: max_batch_size and max_latency_ms.
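The delegation pattern described above can be sketched without BentoML installed. This is an illustrative analogue only; the names `RunnerMethodSketch`, `RunnerHandleSketch`, and `LocalHandle` are invented stand-ins for the library's `RunnerMethod`, `RunnerHandle`, and `LocalRunnerRef`:

```python
from dataclasses import dataclass
import typing as t


class RunnerHandleSketch:
    """Stand-in for the handle protocol: the method forwards execution here."""

    def run_method(self, method: "RunnerMethodSketch", *args: t.Any, **kwargs: t.Any) -> t.Any:
        raise NotImplementedError


class LocalHandle(RunnerHandleSketch):
    """Executes the wrapped callable in-process, in the spirit of LocalRunnerRef."""

    def __init__(self, fn: t.Callable[..., t.Any]) -> None:
        self._fn = fn

    def run_method(self, method: "RunnerMethodSketch", *args: t.Any, **kwargs: t.Any) -> t.Any:
        return self._fn(*args, **kwargs)


@dataclass(frozen=True)
class RunnerMethodSketch:
    """Frozen wrapper mirroring RunnerMethod: carries batching config,
    delegates actual execution to whichever handle is installed."""

    name: str
    handle: RunnerHandleSketch
    max_batch_size: int = 100
    max_latency_ms: int = 10000

    def run(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
        # Synchronous execution is forwarded to the handle, so swapping the
        # handle (local vs. remote) changes where the work happens.
        return self.handle.run_method(self, *args, **kwargs)


method = RunnerMethodSketch("predict", LocalHandle(lambda x: x * 2))
print(method.run(21))  # 42
```

Because the method only talks to the handle interface, the same call site works unchanged whether the computation is in-process or remote.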
AbstractRunner (attrs frozen ABC): Base class defining the runner interface:
- name: Validated and lowercased runner name (must be a valid BentoML Tag).
- models: List of Model instances.
- resource_config: Resource allocation configuration.
- runnable_class: The Runnable subclass this runner executes.
- Abstract methods: init_local() and init_client().
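The shape of that base class can be sketched with a frozen dataclass and `abc`, standing in for the attrs-based original. `AbstractRunnerSketch` and `DebugRunner` are hypothetical names, and the name-lowercasing here only illustrates the "validated and lowercased" behavior described above:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractRunnerSketch(ABC):
    """Illustrative analogue of AbstractRunner: frozen attributes plus
    abstract lifecycle hooks that concrete runners must implement."""

    name: str

    def __post_init__(self) -> None:
        # Mirror the documented behavior: runner names are lowercased.
        # object.__setattr__ is needed because the dataclass is frozen.
        object.__setattr__(self, "name", self.name.lower())

    @abstractmethod
    def init_local(self) -> None: ...

    @abstractmethod
    def init_client(self) -> None: ...


class DebugRunner(AbstractRunnerSketch):
    def init_local(self) -> None:
        print(f"{self.name}: running in-process")

    def init_client(self) -> None:
        print(f"{self.name}: connecting to a remote runner worker")


r = DebugRunner(name="IrisClf")
print(r.name)  # irisclf
```

Attempting to instantiate the base class directly raises `TypeError`, which is what makes the two lifecycle hooks mandatory for subclasses.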
Runner (extends AbstractRunner, deprecated): The concrete runner implementation:
- Initialization: Reads runner configuration from BentoMLContainer.config.runners, inspects all methods on the Runnable class, and creates RunnerMethod instances with batching configuration from both method-level and runner-level settings.
- Default method selection: If only one method exists, or a method named __call__ exists, it becomes the default, accessible via runner.run() and runner.async_run().
- Handle lifecycle: Uses _set_handle() to install a RunnerHandle implementation. init_local() installs LocalRunnerRef for debugging; init_client() installs RemoteRunnerClient for production. destroy() resets to DummyRunnerHandle.
- Readiness probe: runner_handle_is_ready() checks whether the runner handle is ready; used as a Kubernetes readiness probe.
- Worker scheduling: scheduled_worker_count and scheduled_worker_env_map properties delegate to the configured Strategy class.
The module is marked as deprecated with the suggestion to upgrade to new-style services (@bentoml.service()).
Usage
Used in BentoML's legacy service architecture to define remote computation units. Runners are created from Runnable classes (typically obtained via bentoml.{framework}.get(...).to_runner()) and passed to a bentoml.legacy.Service.
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/_internal/runner/runner.py
- Lines: 1-363
Signature
@attr.frozen(slots=False)
class RunnerMethod(t.Generic[T, P, R]):
    runner: Runner | TritonRunner
    name: str
    config: RunnableMethodConfig
    max_batch_size: int
    max_latency_ms: int
    doc: str | None = None

    def run(self, *args: P.args, **kwargs: P.kwargs) -> R: ...
    async def async_run(self, *args: P.args, **kwargs: P.kwargs) -> R: ...
    def async_stream(self, *args: P.args, **kwargs: P.kwargs) -> t.AsyncGenerator[str, None]: ...
@attr.define(slots=False, frozen=True)
class AbstractRunner(ABC):
    name: str
    models: list[Model]
    resource_config: dict[str, t.Any]
    runnable_class: type[Runnable]
    embedded: bool
@attr.define(slots=False, frozen=True, eq=False, init=False)
class Runner(AbstractRunner):
    def __init__(
        self,
        runnable_class: type[Runnable],
        *,
        runnable_init_params: dict[str, t.Any] | None = None,
        name: str | None = None,
        scheduling_strategy: type[Strategy] = DefaultStrategy,
        models: list[Model] | None = None,
        max_batch_size: int | None = None,
        max_latency_ms: int | None = None,
        method_configs: dict[str, dict[str, int]] | None = None,
        embedded: bool = False,
    ) -> None: ...

    def init_local(self, quiet: bool = False) -> None: ...
    def init_client(self, handle_class: type[RunnerHandle] | None = None, *args, **kwargs): ...
    def destroy(self): ...
    async def runner_handle_is_ready(self, timeout: int = ...) -> bool: ...
Import
from bentoml._internal.runner.runner import Runner
from bentoml._internal.runner.runner import RunnerMethod
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| runnable_class | type[Runnable] | Yes | The Runnable subclass that defines the computation logic and methods. |
| name | str or None | No | Runner name. If None, derived from the runnable class name (lowercased). |
| scheduling_strategy | type[Strategy] | No | Strategy class for worker scheduling. Defaults to DefaultStrategy. |
| models | list[Model] or None | No | List of BentoML Model instances required by this runner. |
| max_batch_size | int or None | No | Global max batch size for dynamic batching. Overridden by method-level config. |
| max_latency_ms | int or None | No | Global max latency for dynamic batching. Overridden by method-level config. |
| method_configs | dict[str, dict[str, int]] or None | No | Per-method configuration for max_batch_size and max_latency_ms. |
| embedded | bool | No | Whether the runner runs in-process. Defaults to False. |
Outputs
| Name | Type | Description |
|---|---|---|
| Runner | Runner | A configured Runner instance with methods accessible as attributes. |
Usage Examples
import bentoml
# Create a runner from a saved model
runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
# Use in a legacy service
svc = bentoml.legacy.Service("iris-classifier", runners=[runner])
@svc.api(
    input=bentoml.io.NumpyNdarray(),
    output=bentoml.io.NumpyNdarray(),
)
def predict(input_arr):
    return runner.run(input_arr)
# For testing/debugging, initialize locally
runner.init_local(quiet=True)
result = runner.run([[5, 4, 3, 2]])
runner.destroy()
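The readiness probe mentioned under "Worker scheduling" above boils down to polling an async check until a deadline. Here is a self-contained sketch of that pattern; `wait_until_ready` and `DummyHandle` are invented for illustration and are not part of BentoML's API:

```python
import asyncio
import typing as t


async def wait_until_ready(
    is_ready: t.Callable[[], t.Awaitable[bool]],
    timeout: float,
    interval: float = 0.01,
) -> bool:
    """Poll an async is_ready() check until it passes or the timeout elapses,
    the way runner_handle_is_ready() can back a Kubernetes readiness probe
    (illustrative sketch, not BentoML's implementation)."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if await is_ready():
            return True
        await asyncio.sleep(interval)
    return False


class DummyHandle:
    """Becomes ready after a few polls, like a remote runner worker starting up."""

    def __init__(self, ready_after_polls: int) -> None:
        self._polls_left = ready_after_polls

    async def is_ready(self) -> bool:
        self._polls_left -= 1
        return self._polls_left <= 0


handle = DummyHandle(ready_after_polls=3)
print(asyncio.run(wait_until_ready(handle.is_ready, timeout=1.0)))  # True
```

A probe that never succeeds within the timeout simply returns False, letting the orchestrator keep the pod out of rotation until the runner handle is actually serving.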