
Implementation:Bentoml BentoML Runner Class

From Leeroopedia
Knowledge Sources
Domains Runner, Model Serving, Distributed Computing
Last Updated 2026-02-13 15:00 GMT

Overview

The Runner module defines the Runner and AbstractRunner classes that represent units of computation in BentoML's legacy service architecture, supporting remote execution, dynamic batching, and scheduling strategies.

Description

This module provides the runner abstraction for BentoML's legacy (pre-2.0) service architecture. Key components include:

RunnerMethod[T, P, R] (attrs frozen class): Wraps individual methods of a Runnable class, providing:

  • run(*args, **kwargs): Synchronous execution via the runner handle.
  • async_run(*args, **kwargs): Asynchronous execution.
  • async_stream(*args, **kwargs): Async streaming execution returning AsyncGenerator[str, None].
  • Configuration for batching: max_batch_size and max_latency_ms.

AbstractRunner (attrs frozen ABC): Base class defining the runner interface:

  • name: Validated and lowercased runner name (must be a valid BentoML Tag).
  • models: List of Model instances.
  • resource_config: Resource allocation configuration.
  • runnable_class: The Runnable subclass this runner executes.
  • Abstract methods: init_local() and init_client().

Runner (extends AbstractRunner, deprecated): The concrete runner implementation:

  • Initialization: Reads runner configuration from BentoMLContainer.config.runners, inspects all methods on the Runnable class, and creates RunnerMethod instances with batching configuration from both method-level and runner-level settings.
  • Default method selection: If only one method exists, or a method named __call__ exists, it becomes the default accessible via runner.run() and runner.async_run().
  • Handle lifecycle: Uses _set_handle() to install a RunnerHandle implementation. init_local() installs LocalRunnerRef for debugging; init_client() installs RemoteRunnerClient for production. destroy() resets to DummyRunnerHandle.
  • Readiness probe: runner_handle_is_ready() checks if the runner handle is ready, used as a Kubernetes readiness probe.
  • Worker scheduling: scheduled_worker_count and scheduled_worker_env_map properties delegate to the configured Strategy class.
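The default-method selection rule above can be sketched as pure Python (pick_default is a hypothetical helper written for illustration, not a function in BentoML):

```python
from typing import List, Optional

def pick_default(method_names: List[str]) -> Optional[str]:
    """Mirror Runner's default-method rule: the sole method if exactly
    one exists, otherwise __call__ if present, else no default."""
    if len(method_names) == 1:
        return method_names[0]
    if "__call__" in method_names:
        return "__call__"
    return None

print(pick_default(["predict"]))             # -> predict
print(pick_default(["encode", "__call__"]))  # -> __call__
print(pick_default(["encode", "decode"]))    # -> None
```

When no default exists, each method must be invoked explicitly as an attribute, e.g. runner.encode.run(...).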

The module is deprecated; users are advised to migrate to new-style services (@bentoml.service()).
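The handle lifecycle described above amounts to a small state machine. A toy model, assuming readiness simply means a real handle is installed (the handle class names match the source, but RunnerLifecycleSketch itself is illustrative):

```python
class RunnerLifecycleSketch:
    """Toy lifecycle model: a runner starts with a dummy handle,
    init_local()/init_client() install a real one, destroy() resets it."""

    def __init__(self) -> None:
        self.handle = "DummyRunnerHandle"

    def init_local(self) -> None:
        self.handle = "LocalRunnerRef"      # in-process, for debugging

    def init_client(self) -> None:
        self.handle = "RemoteRunnerClient"  # talks to remote runner workers

    def destroy(self) -> None:
        self.handle = "DummyRunnerHandle"

    def runner_handle_is_ready(self) -> bool:
        # Readiness probe: a real handle must be installed.
        return self.handle != "DummyRunnerHandle"

r = RunnerLifecycleSketch()
r.init_local()
print(r.runner_handle_is_ready())  # -> True
r.destroy()
print(r.runner_handle_is_ready())  # -> False
```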

Usage

Used in BentoML's legacy service architecture to define remote computation units. Runners are created from Runnable classes (typically obtained via bentoml.{framework}.get(...).to_runner()) and passed to a bentoml.legacy.Service.

Code Reference

Source Location

Signature

@attr.frozen(slots=False)
class RunnerMethod(t.Generic[T, P, R]):
    runner: Runner | TritonRunner
    name: str
    config: RunnableMethodConfig
    max_batch_size: int
    max_latency_ms: int
    doc: str | None = None

    def run(self, *args: P.args, **kwargs: P.kwargs) -> R: ...
    async def async_run(self, *args: P.args, **kwargs: P.kwargs) -> R: ...
    def async_stream(self, *args: P.args, **kwargs: P.kwargs) -> t.AsyncGenerator[str, None]: ...


@attr.define(slots=False, frozen=True)
class AbstractRunner(ABC):
    name: str
    models: list[Model]
    resource_config: dict[str, t.Any]
    runnable_class: type[Runnable]
    embedded: bool


@attr.define(slots=False, frozen=True, eq=False, init=False)
class Runner(AbstractRunner):
    def __init__(
        self,
        runnable_class: type[Runnable],
        *,
        runnable_init_params: dict[str, t.Any] | None = None,
        name: str | None = None,
        scheduling_strategy: type[Strategy] = DefaultStrategy,
        models: list[Model] | None = None,
        max_batch_size: int | None = None,
        max_latency_ms: int | None = None,
        method_configs: dict[str, dict[str, int]] | None = None,
        embedded: bool = False,
    ) -> None: ...

    def init_local(self, quiet: bool = False) -> None: ...
    def init_client(self, handle_class: type[RunnerHandle] | None = None, *args, **kwargs): ...
    def destroy(self): ...
    async def runner_handle_is_ready(self, timeout: int = ...) -> bool: ...

Import

from bentoml._internal.runner.runner import Runner
from bentoml._internal.runner.runner import RunnerMethod

I/O Contract

Inputs

  • runnable_class (type[Runnable], required): The Runnable subclass that defines the computation logic and methods.
  • name (str | None, optional): Runner name. If None, derived from the runnable class name (lowercased).
  • scheduling_strategy (type[Strategy], optional): Strategy class for worker scheduling. Defaults to DefaultStrategy.
  • models (list[Model] | None, optional): List of BentoML Model instances required by this runner.
  • max_batch_size (int | None, optional): Global max batch size for dynamic batching. Overridden by method-level config.
  • max_latency_ms (int | None, optional): Global max latency in milliseconds for dynamic batching. Overridden by method-level config.
  • method_configs (dict[str, dict[str, int]] | None, optional): Per-method configuration for max_batch_size and max_latency_ms.
  • embedded (bool, optional): Whether the runner runs in-process. Defaults to False.
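The precedence implied by the table (per-method config overrides runner-level settings, which override framework defaults) can be sketched as a dictionary merge. resolve_batching is a hypothetical helper, and the fallback values of 100 / 10_000 ms are assumptions for illustration:

```python
from typing import Dict, Optional

def resolve_batching(
    method: str,
    runner_level: Dict[str, Optional[int]],
    method_configs: Dict[str, Dict[str, int]],
) -> Dict[str, int]:
    """Merge batching settings: defaults < runner-level < per-method."""
    merged: Dict[str, int] = {"max_batch_size": 100, "max_latency_ms": 10_000}
    # Runner-level values apply only where explicitly set (non-None).
    merged.update({k: v for k, v in runner_level.items() if v is not None})
    # Per-method overrides win over everything else.
    merged.update(method_configs.get(method, {}))
    return merged

cfg = resolve_batching(
    "predict",
    runner_level={"max_batch_size": 64, "max_latency_ms": None},
    method_configs={"predict": {"max_latency_ms": 500}},
)
print(cfg)  # -> {'max_batch_size': 64, 'max_latency_ms': 500}
```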

Outputs

  • Runner (Runner): A configured Runner instance with methods accessible as attributes.

Usage Examples

import bentoml

# Create a runner from a saved model
runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

# Use in a legacy service
svc = bentoml.legacy.Service("iris-classifier", runners=[runner])

@svc.api(
    input=bentoml.io.NumpyNdarray(),
    output=bentoml.io.NumpyNdarray(),
)
def predict(input_arr):
    return runner.run(input_arr)


# For testing/debugging, initialize locally
runner.init_local(quiet=True)
result = runner.run([[5, 4, 3, 2]])
runner.destroy()
