
Principle:BentoML Service Class Definition

From Leeroopedia
Last Updated 2026-02-13 15:00 GMT

Overview

A design pattern for defining ML model serving endpoints as decorated Python classes. In BentoML, a service is a plain Python class transformed into a production-grade serving component via the @bentoml.service decorator, which adds HTTP/gRPC endpoint registration, resource configuration, and lifecycle management.

Description

Service class definition in BentoML is centered on the decorator pattern: the @bentoml.service decorator wraps a user-authored Python class in a Service[T] instance. This wrapper is responsible for:

  • Endpoint registration -- methods decorated with @bentoml.api become HTTP (or gRPC) endpoints with automatic request/response serialization.
  • Resource configuration -- CPU, memory, GPU, and concurrency limits are specified as keyword arguments to the decorator and enforced at runtime.
  • Lifecycle management -- the class __init__ method runs once per worker process at startup, making it the natural place to load models and initialize expensive resources.
  • Dependency composition -- services can depend on other services, forming a directed acyclic graph that BentoML orchestrates across separate worker pools.

The decorator accepts optional parameters for naming, Docker image configuration, environment variables, labels, and a path prefix for URL routing. All resource and traffic configuration is passed via **kwargs that conform to the ServiceConfig TypedDict (covering resources, traffic, and workers settings).
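For illustration, a decorator invocation exercising several of these parameters might look like the following sketch. The keyword names (name, resources, traffic, workers) follow the ServiceConfig conventions described above, but the exact accepted keys and values depend on the installed BentoML version:

```python
import bentoml

# Sketch of a more fully configured service; key names are assumptions
# based on the ServiceConfig TypedDict described above.
@bentoml.service(
    name="summarizer",                           # service name used for routing and builds
    resources={"cpu": "2", "memory": "2Gi"},     # scheduling hints for BentoML
    traffic={"timeout": 60, "concurrency": 16},  # request-handling limits
    workers=2,                                   # worker processes per replica
)
class Summarizer:
    @bentoml.api
    def summarize(self, text: str) -> str:
        # Placeholder logic; a real service would call a model here.
        return text[:100]
```

Because every configuration value is an argument to the decorator rather than a separate config file, the resource contract ships with the code that needs it.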

Usage

Use the @bentoml.service decorator when you need to:

  • Expose one or more ML model inference functions as network-accessible API endpoints.
  • Declare resource requirements (GPUs, memory) that BentoML uses for scheduling and containerization.
  • Compose multiple models or processing stages into a single deployable unit.

A minimal service definition looks like this:

import bentoml
import torch

@bentoml.service(
    resources={"gpu": 1, "memory": "4Gi"},
    traffic={"timeout": 120},
)
class MyMLService:
    def __init__(self):
        # Runs once per worker process at startup: load the model here.
        self.model = torch.load("model.pt")
        self.model.eval()

    @bentoml.api
    def predict(self, input_text: str) -> str:
        # Each call handles one request; inputs and outputs are
        # (de)serialized automatically from the type annotations.
        return self.model(input_text)

Theoretical Basis

The service class definition pattern applies the decorator pattern from object-oriented design: a structural pattern that attaches additional responsibilities to an object dynamically without altering its interface.

The abstract pattern is as follows:

SERVICE_CLASS_DEFINITION(class T):
    DECORATOR @bentoml.service:
        1. Introspect class T for @bentoml.api methods
        2. Generate HTTP/gRPC route table from method signatures
        3. Capture resource/traffic/worker configuration
        4. Wrap T in Service[T] proxy

    LIFECYCLE:
        STARTUP:  T.__init__() executes once per worker process
        REQUEST:  Incoming HTTP -> deserialize -> T.method() -> serialize -> HTTP response
        SHUTDOWN: Worker process termination (graceful drain)

    COMPOSITION:
        Service[A] depends_on Service[B]
            -> A and B run in separate worker pools
            -> A calls B via inter-process RPC
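The abstract pattern above can be sketched in plain Python. This is a toy stand-in, not BentoML's actual implementation: a marker decorator tags API methods (step 1), and a class decorator introspects the class to build a route table (step 2).

```python
def api(fn):
    """Marker decorator: tag a method as an API endpoint."""
    fn._is_api = True
    return fn

def service(cls):
    """Toy class decorator: introspect the class for @api-tagged
    methods and attach a route table, mirroring steps 1-2 above."""
    routes = {}
    for name, attr in vars(cls).items():
        if callable(attr) and getattr(attr, "_is_api", False):
            routes[f"/{name}"] = attr
    cls.__routes__ = routes
    return cls

@service
class Echo:
    @api
    def predict(self, text: str) -> str:
        return text.upper()

svc = Echo()  # stands in for per-worker startup (__init__)
# Dispatch a "request" through the generated route table:
handler = Echo.__routes__["/predict"]
print(handler(svc, "hello"))  # -> HELLO
```

The real decorator additionally wraps the class in a Service[T] proxy and generates serialization code from the method signatures, but the introspection-and-route-table core is the same shape.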

Key theoretical properties:

  • Encapsulation -- the user class contains only business logic; all serving infrastructure is injected by the decorator.
  • Separation of concerns -- resource allocation, serialization, and routing are orthogonal to model inference logic.
  • Composability -- services with declared dependencies form a DAG that BentoML deploys and orchestrates automatically.
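The composability property can be illustrated with a toy dependency graph (the service names are invented for illustration, and this is not BentoML's orchestrator): given services and their declared dependencies, a topological sort yields a startup order in which every service starts after the services it depends on.

```python
from graphlib import TopologicalSorter

# Toy dependency declarations: each key depends on the services
# in its value set. Names are illustrative, not real services.
deps = {
    "Gateway": {"Embedder", "Ranker"},
    "Ranker": {"Embedder"},
    "Embedder": set(),
}

# static_order() emits each node only after all of its dependencies.
startup_order = list(TopologicalSorter(deps).static_order())
print(startup_order)  # dependencies first, e.g. ['Embedder', 'Ranker', 'Gateway']
```

Because the declared dependencies must form a DAG, a cycle (A depends on B, B depends on A) is rejected up front rather than discovered as a deadlock at runtime.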
