
Implementation: BentoML @bentoml.api Decorator

Last Updated 2026-02-13 15:00 GMT

Overview

Concrete decorator for converting a Python method on a BentoML service class into an HTTP API endpoint. The @bentoml.api decorator wraps the method in an APIMethod[P, R] object that handles routing, serialization, validation, and optional adaptive batching.

Description

The @bentoml.api decorator supports two invocation styles:

  • Bare decorator -- @bentoml.api applied directly to a method with no arguments.
  • Parameterized decorator -- @bentoml.api(route="/v1/predict", batchable=True, ...) returning a decorator.

When applied, it creates an APIMethod[P, R] wrapper that captures the method's type signature, route configuration, batching parameters, and optional Pydantic input/output specs. The Service[T] wrapper discovers these APIMethod objects during initialization and registers them as HTTP endpoints.

For adaptive batching, when batchable=True, the serving runtime collects individual incoming requests and merges them along the specified batch_dim axis. The batched input is passed to the method as a single call, and the output is split back into individual responses. The runtime dynamically tunes batch sizes based on max_batch_size and max_latency_ms.
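The merge/split behavior can be sketched in plain NumPy. This is a simplified illustration of the concept, not BentoML's actual runtime code: requests are concatenated along batch_dim, handled in a single call, and the result is split back at the original request boundaries.

```python
import numpy as np

def run_batched(requests, method, batch_dim=0):
    """Illustrative merge/split for adaptive batching (not BentoML internals)."""
    # Record each request's size along the batch axis so outputs can be split back.
    sizes = [r.shape[batch_dim] for r in requests]
    # Merge individual inputs into one batch along batch_dim.
    batch = np.concatenate(requests, axis=batch_dim)
    # One vectorized call handles the whole batch.
    out = method(batch)
    # Split the batched output back into per-request responses.
    boundaries = np.cumsum(sizes)[:-1]
    return np.split(out, boundaries, axis=batch_dim)

# Three "requests" of different sizes, served by a single call.
reqs = [np.ones((2, 4)), np.ones((1, 4)), np.ones((3, 4))]
outs = run_batched(reqs, lambda x: x * 2.0)
print([o.shape for o in outs])  # [(2, 4), (1, 4), (3, 4)]
```

The real runtime additionally tracks arrival times and tunes batch sizes online; this sketch only shows the merge/split contract the decorated method must satisfy.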

Usage

Import and apply the decorator to methods on a @bentoml.service class:

import bentoml
import numpy as np

@bentoml.service
class EmbeddingService:
    @bentoml.api(batchable=True, max_batch_size=64, max_latency_ms=500)
    def encode(self, texts: list[str]) -> np.ndarray:
        return self.model.encode(texts)

Code Reference

Source Location

  • Repository: bentoml/BentoML
  • File: src/_bentoml_sdk/decorators.py (lines 60--109)

Signature

def api(
    func: Callable[Concatenate[Any, P], R] | None = None,
    *,
    route: str | None = None,
    name: str | None = None,
    input_spec: type[IODescriptor] | None = None,
    output_spec: type[IODescriptor] | None = None,
    batchable: bool = False,
    batch_dim: int | tuple[int, int] = 0,
    max_batch_size: int = 100,
    max_latency_ms: int = 60000,
) -> APIMethod[P, R] | Callable[..., APIMethod[P, R]]

Import

import bentoml

# Used as:
@bentoml.api
def predict(self, x: str) -> str: ...

# Or with parameters:
@bentoml.api(route="/v1/predict", batchable=True)
def predict(self, x: str) -> str: ...

I/O Contract

Inputs

Input Contract
Name            Type                        Description
func            Callable | None             The method to decorate. Provided implicitly when the decorator is used without parentheses.
route           str | None                  Custom HTTP route path (e.g., "/v1/predict"). Defaults to /{method_name}.
name            str | None                  Custom name for the API endpoint. Defaults to the method name.
input_spec      type[IODescriptor] | None   Pydantic model defining the input schema for request validation and OpenAPI documentation.
output_spec     type[IODescriptor] | None   Pydantic model defining the output schema for response serialization and OpenAPI documentation.
batchable       bool                        Whether to enable adaptive batching. Defaults to False.
batch_dim       int | tuple[int, int]       Axis along which to concatenate inputs (and split outputs) for batching. A tuple specifies separate dimensions for input and output. Defaults to 0.
max_batch_size  int                         Maximum number of requests to batch together. Defaults to 100.
max_latency_ms  int                         Maximum time in milliseconds to wait for a full batch before dispatching a partial batch. Defaults to 60000.
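The tuple form of batch_dim matters when a method's output layout differs from its input layout. A hypothetical sketch (the transpose_embed method is invented for illustration): with batch_dim=(0, 1), inputs are concatenated along axis 0 and outputs are split along axis 1.

```python
import numpy as np

# Hypothetical method: takes (n, d) inputs, returns (d, n) outputs,
# so inputs merge along axis 0 but outputs must split along axis 1.
def transpose_embed(batch: np.ndarray) -> np.ndarray:
    return batch.T

requests = [np.zeros((2, 3)), np.zeros((5, 3))]
batch = np.concatenate(requests, axis=0)  # input axis of batch_dim=(0, 1)
out = transpose_embed(batch)              # shape (3, 7)
responses = np.split(out, [2], axis=1)    # output axis of batch_dim=(0, 1)
print([r.shape for r in responses])       # [(3, 2), (3, 5)]
```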

Outputs

Output Contract
Name Type Description
Return value APIMethod[P, R] An APIMethod wrapper around the original method. This object is discovered by Service[T] during initialization and registered as an HTTP endpoint.

Usage Examples

Example 1: Simple API Endpoint

A basic text-in, text-out endpoint.

import bentoml

@bentoml.service
class Summarizer:
    @bentoml.api
    def summarize(self, text: str) -> str:
        return self.model.summarize(text)
  • Route defaults to /summarize.
  • Input and output are serialized as JSON strings.

Example 2: Batched Endpoint with Custom Route

An embedding endpoint with adaptive batching enabled.

import bentoml
import numpy as np

@bentoml.service(resources={"gpu": 1})
class EmbeddingService:
    @bentoml.api(
        route="/v1/embeddings",
        batchable=True,
        max_batch_size=64,
        max_latency_ms=500,
    )
    def encode(self, texts: list[str]) -> np.ndarray:
        return self.model.encode(texts)
  • batchable=True enables the adaptive batching runtime.
  • Individual requests are collected and merged up to max_batch_size=64 or max_latency_ms=500 ms, whichever comes first.

Example 3: Pydantic Input/Output Specs

An endpoint with explicit schema definitions for OpenAPI documentation.

import bentoml
from pydantic import BaseModel

class ClassifyInput(BaseModel):
    text: str
    language: str = "en"

class ClassifyOutput(BaseModel):
    label: str
    score: float

@bentoml.service
class Classifier:
    @bentoml.api(input_spec=ClassifyInput, output_spec=ClassifyOutput)
    def classify(self, input_data: ClassifyInput) -> ClassifyOutput:
        result = self.model.predict(input_data.text)
        return ClassifyOutput(label=result["label"], score=result["score"])
  • input_spec and output_spec generate OpenAPI schema documentation.
  • Incoming requests are validated against ClassifyInput before reaching the method.
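The validation behavior described above can be checked locally with the models alone, before deploying the service (assuming Pydantic v2 and the ClassifyInput definition from the example):

```python
from pydantic import BaseModel, ValidationError

class ClassifyInput(BaseModel):
    text: str
    language: str = "en"

# A valid payload: the missing "language" field falls back to its default.
ok = ClassifyInput.model_validate({"text": "hello"})
print(ok.language)  # en

# An invalid payload (missing required "text") is rejected before it
# would ever reach the decorated method.
try:
    ClassifyInput.model_validate({"language": "fr"})
except ValidationError as e:
    print(e.error_count())  # 1
```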
