
Implementation:BentoML Legacy Service

From Leeroopedia
Knowledge Sources
Domains Service Definition, API Specification, Model Serving
Last Updated 2026-02-13 15:00 GMT

Overview

The Legacy Service module defines the Service class that is the core building block of BentoML's legacy (pre-2.0) service-oriented architecture, providing API registration, lifecycle hooks, ASGI/WSGI mounting, and gRPC support.

Description

The Service class, marked with a @deprecated decorator that suggests upgrading to @bentoml.service(), is an attrs-based class that represents a deployable BentoML service. Key features include:

Core attributes:

  • name: Validated service name (lowercased, must be a valid BentoML Tag).
  • runners: List of Runner or _TritonRunner instances, validated for uniqueness.
  • models: List of Model instances.
  • apis: Dictionary of InferenceAPI instances registered via the @svc.api decorator.
  • tag / bento: Set when loaded from a Bento artifact.
  • context: A ServiceContext instance for service-level state.

API registration: The api() method is a decorator factory that creates InferenceAPI instances from IO descriptors and user-defined callback functions. It supports custom routes, names, and documentation strings, and rejects duplicate API names.
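
The duplicate-name check can be illustrated with a minimal registry sketch. This is pure Python, not BentoML's actual implementation; the registry class, error type, and message wording are assumptions made for illustration:

```python
class DuplicateAPIError(ValueError):
    """Raised when two APIs register under the same name."""


class APIRegistry:
    """Minimal sketch of a decorator factory that registers named endpoints."""

    def __init__(self):
        self.apis = {}  # name -> callable, mirroring Service.apis

    def api(self, *, name=None, route=None):
        def decorator(func):
            api_name = name or func.__name__  # default name is the function name
            if api_name in self.apis:
                raise DuplicateAPIError(f"API {api_name!r} is already defined")
            self.apis[api_name] = func
            return func
        return decorator


registry = APIRegistry()

@registry.api(route="/predict")
def predict(data):
    return data
```

Registering a second endpoint under the same name would raise DuplicateAPIError, which is the behavior the real api() decorator enforces on Service.apis.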

Lifecycle hooks:

  • on_startup(func): Registers a function to run when the service starts. Receives ServiceContext.
  • on_shutdown(func): Registers a function to run when the service stops. Receives ServiceContext.
  • on_deployment(func): Registers a function to run on deployment.
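
Mechanically, hook registration amounts to collecting callables and invoking them with the service context at the right moment. A minimal sketch of that pattern (class and attribute names here are illustrative, not BentoML internals):

```python
class HookRunner:
    """Collects startup/shutdown callables and runs them with a context object."""

    def __init__(self):
        self.startup_hooks = []
        self.shutdown_hooks = []

    def on_startup(self, func):
        self.startup_hooks.append(func)
        return func  # return the function unchanged so this works as a decorator

    def on_shutdown(self, func):
        self.shutdown_hooks.append(func)
        return func

    def start(self, ctx):
        for hook in self.startup_hooks:
            hook(ctx)

    def stop(self, ctx):
        for hook in self.shutdown_hooks:
            hook(ctx)


runner = HookRunner()
events = []

@runner.on_startup
def warm_up(ctx):
    events.append(("startup", ctx))

@runner.on_shutdown
def clean_up(ctx):
    events.append(("shutdown", ctx))

runner.start({"state": "ready"})
runner.stop({"state": "done"})
```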

ASGI/WSGI support:

  • asgi_app property: Returns an ASGI application via HTTPAppFactory.
  • mount_asgi_app(app, path, name): Mounts an ASGI application at a path.
  • mount_wsgi_app(app, path, name): Wraps a WSGI app with a2wsgi.WSGIMiddleware and mounts it.
  • add_asgi_middleware(middleware_cls, **options): Adds ASGI middleware.
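
add_asgi_middleware(middleware_cls, **options) follows the standard ASGI convention of wrapping the app as middleware_cls(app, **options). The wrapping pattern can be sketched without BentoML at all; the middleware below is a generic example, not a BentoML class:

```python
import asyncio


async def app(scope, receive, send):
    # Minimal ASGI app: always responds 200 OK with a plain-text body.
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"OK"})


class HeaderMiddleware:
    """Minimal ASGI middleware: injects an extra response header."""

    def __init__(self, app, header_name, header_value):
        self.app = app
        self.header = (header_name.encode(), header_value.encode())

    async def __call__(self, scope, receive, send):
        async def send_wrapper(message):
            if message["type"] == "http.response.start":
                message["headers"] = list(message.get("headers", [])) + [self.header]
            await send(message)
        await self.app(scope, receive, send_wrapper)


wrapped = HeaderMiddleware(app, "x-served-by", "bentoml")
messages = []


async def drive():
    # Drive one request through the wrapped app with stub receive/send callables.
    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await wrapped({"type": "http", "method": "GET", "path": "/"}, receive, send)


asyncio.run(drive())
```

With the real Service, `svc.add_asgi_middleware(HeaderMiddleware, header_name="x-served-by", header_value="bentoml")` would apply the same wrapping to the service's own ASGI app.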

gRPC support:

  • grpc_servicer property / get_grpc_servicer(protocol_version): Returns a gRPC servicer.
  • mount_grpc_servicer: Mounts additional gRPC servicers.
  • add_grpc_interceptor: Adds gRPC interceptors with validation.
  • add_grpc_handlers: Adds generic RPC handlers.
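
One common form of interceptor validation is requiring a class rather than an instance, so the server can instantiate a fresh interceptor at startup. The sketch below shows that pattern in pure Python; it is an assumption about the general mechanism, not a reproduction of BentoML's exact checks:

```python
import inspect


class InterceptorRegistry:
    """Sketch of interceptor registration that accepts classes, not instances."""

    def __init__(self):
        self.interceptors = []

    def add_interceptor(self, interceptor_cls, **options):
        if not inspect.isclass(interceptor_cls):
            raise ValueError(
                f"expected an interceptor class, got instance {interceptor_cls!r}"
            )
        # Store the class with its options; instantiation is deferred to startup.
        self.interceptors.append((interceptor_cls, options))


class LoggingInterceptor:
    def __init__(self, level="INFO"):
        self.level = level


registry = InterceptorRegistry()
registry.add_interceptor(LoggingInterceptor, level="DEBUG")
```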

Serialization (pickle): The __reduce__ method supports three serialization strategies:

  • EXPORT_BENTO: Exports and re-imports the Bento via a temporary file.
  • LOCAL_BENTO: Loads from the local Bento store by tag.
  • REMOTE_BENTO: Pulls from BentoCloud if not found locally.
  • Falls back to import_service for source-based services.
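
The __reduce__ protocol lets an object choose how it is reconstructed on unpickling: instead of serializing internal state, it returns a (callable, args) pair that pickle invokes later. The LOCAL_BENTO idea of reloading by tag can be sketched as follows (the store dict and loader function are stand-ins, not BentoML internals):

```python
SERVICES = {}  # stand-in for the local Bento store, keyed by tag


def load_service(tag):
    """Module-level loader that pickle would reference by name."""
    return SERVICES[tag]


class Service:
    def __init__(self, tag):
        self.tag = tag
        SERVICES[tag] = self

    def __reduce__(self):
        # Pickle will serialize only the loader reference and the tag;
        # unpickling calls load_service(tag) to fetch the service anew.
        return (load_service, (self.tag,))


svc = Service("iris_clf:v1")

# Simulate what pickle does with the __reduce__ result:
func, args = svc.__reduce__()
restored = func(*args)
```

The EXPORT_BENTO and REMOTE_BENTO strategies follow the same shape, but the loader would export/re-import a Bento archive or pull it from BentoCloud instead of reading a local store.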

OpenAPI: The openapi_spec property generates an OpenAPI specification via generate_spec.
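
Generating a spec amounts to walking the registered APIs and emitting one path item per route. A toy version of that idea (the output follows the OpenAPI 3 shape, but field coverage here is deliberately minimal and this is not BentoML's generate_spec):

```python
def generate_spec(service_name, apis):
    """Build a minimal OpenAPI 3 document from a {name: route} mapping."""
    paths = {}
    for name, route in apis.items():
        paths[route] = {
            "post": {
                "operationId": name,
                "responses": {"200": {"description": "Successful inference"}},
            }
        }
    return {
        "openapi": "3.0.2",
        "info": {"title": service_name, "version": "1.0.0"},
        "paths": paths,
    }


spec = generate_spec("iris-classifier", {"predict": "/predict"})
```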

Usage

Used to define BentoML services in the legacy architecture. Users instantiate Service, register runners, and use the @svc.api decorator to define inference endpoints. This class is deprecated in favor of the new @bentoml.service() decorator.

Code Reference

Source Location

Signature

@deprecated("bentoml.legacy.Service", suggestion="Please upgrade to @bentoml.service().")
@attr.define(frozen=False, init=False)
class Service:
    name: str
    runners: t.List[Runner | _TritonRunner]
    models: t.List[Model]
    apis: t.Dict[str, InferenceAPI[t.Any]]
    tag: Tag | None
    bento: Bento | None
    context: Context

    def __init__(
        self,
        name: str,
        *,
        runners: list[AbstractRunner] | None = None,
        models: list[Model] | None = None,
    ): ...

    def api(
        self,
        input: IODescriptor[IOType],
        output: IODescriptor[IOType],
        *,
        name: str | None = None,
        doc: str | None = None,
        route: str | None = None,
    ) -> _inference_api_wrapper[IOType]: ...

    def on_startup(self, func: HookF_ctx) -> HookF_ctx: ...
    def on_shutdown(self, func: HookF_ctx) -> HookF_ctx: ...
    def on_deployment(self, func: HookF) -> HookF: ...
    def mount_asgi_app(self, app, path="/", name=None) -> None: ...
    def mount_wsgi_app(self, app, path="/", name=None) -> None: ...
    def add_asgi_middleware(self, middleware_cls, **options) -> None: ...
    def get_grpc_servicer(self, protocol_version=...) -> services.BentoServiceServicer: ...
    def on_load_bento(self, bento: Bento) -> None: ...
    def get_service_import_origin(self) -> tuple[str, str]: ...

Import

import bentoml

svc = bentoml.legacy.Service("my-service", runners=[runner])

I/O Contract

Inputs

Name | Type | Required | Description
name | str | Yes | Service name. Will be lowercased and validated as a BentoML Tag.
runners | list[AbstractRunner] or None | No | List of Runner instances. Names must be unique.
models | list[Model] or None | No | List of BentoML Model instances.

Outputs

Name | Type | Description
Service | Service | A configured Service instance with API endpoints accessible via the apis dictionary.

Usage Examples

from __future__ import annotations
from typing import TYPE_CHECKING, Any

import bentoml
from bentoml.io import NumpyNdarray, JSON

if TYPE_CHECKING:
    from numpy.typing import NDArray

runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.legacy.Service("iris-classifier", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_arr: NDArray[Any]) -> NDArray[Any]:
    return runner.run(input_arr)

# Lifecycle hooks
@svc.on_startup
async def startup(ctx):
    print("Service starting up")

@svc.on_shutdown
async def shutdown(ctx):
    print("Service shutting down")

# Mount an ASGI app
from starlette.applications import Starlette
from starlette.responses import PlainTextResponse
from starlette.routing import Route

async def health(request):
    return PlainTextResponse("OK")

# Starlette's @app.route decorator is deprecated; pass routes explicitly.
app = Starlette(routes=[Route("/health", health)])

svc.mount_asgi_app(app, path="/custom")
