Implementation: BentoML `@bentoml.api` Decorator
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Concrete decorator for converting a Python method on a BentoML service class into an HTTP API endpoint. The @bentoml.api decorator wraps the method in an APIMethod[P, R] object that handles routing, serialization, validation, and optional adaptive batching.
Description
The @bentoml.api decorator supports two invocation styles:
- Bare decorator -- `@bentoml.api` applied directly to a method with no arguments.
- Parameterized decorator -- `@bentoml.api(route="/v1/predict", batchable=True, ...)` returning a decorator.
When applied, it creates an APIMethod[P, R] wrapper that captures the method's type signature, route configuration, batching parameters, and optional Pydantic input/output specs. The Service[T] wrapper discovers these APIMethod objects during initialization and registers them as HTTP endpoints.
For adaptive batching, when batchable=True, the serving runtime collects individual incoming requests and merges them along the specified batch_dim axis. The batched input is passed to the method as a single call, and the output is split back into individual responses. The runtime dynamically tunes batch sizes based on max_batch_size and max_latency_ms.
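The merge/split behavior described above can be sketched with NumPy. This is a simplified illustration of the runtime's logic, not BentoML's actual implementation; the `run_batched` helper and the toy model are hypothetical:

```python
import numpy as np

def run_batched(method, requests, batch_dim=0):
    """Simplified sketch of adaptive-batch dispatch: merge queued
    inputs along batch_dim, run the method once, split the result."""
    sizes = [r.shape[batch_dim] for r in requests]
    merged = np.concatenate(requests, axis=batch_dim)   # one combined batch
    output = method(merged)                             # single model call
    # Split the batched output back into per-request responses.
    return np.split(output, np.cumsum(sizes)[:-1], axis=batch_dim)

# A toy "model" that maps each input row to a 4-dimensional vector.
model = lambda x: np.tile(x.sum(axis=1, keepdims=True), (1, 4))

reqs = [np.ones((2, 3)), np.ones((5, 3))]   # two queued requests
outs = run_batched(model, reqs)
# outs[0].shape == (2, 4), outs[1].shape == (5, 4)
```

In the real runtime the queue is filled asynchronously and flushed when either `max_batch_size` is reached or `max_latency_ms` expires; the sketch shows only the merge-call-split step.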
Usage
Import and apply the decorator to methods on a @bentoml.service class:
```python
import bentoml
import numpy as np

@bentoml.service
class EmbeddingService:
    @bentoml.api(batchable=True, max_batch_size=64, max_latency_ms=500)
    def encode(self, texts: list[str]) -> np.ndarray:
        return self.model.encode(texts)
```
Code Reference
Source Location
- Repository: bentoml/BentoML
- File: src/_bentoml_sdk/decorators.py (lines 60--109)
Signature
```python
def api(
    func: Callable[Concatenate[Any, P], R] | None = None,
    *,
    route: str | None = None,
    name: str | None = None,
    input_spec: type[IODescriptor] | None = None,
    output_spec: type[IODescriptor] | None = None,
    batchable: bool = False,
    batch_dim: int | tuple[int, int] = 0,
    max_batch_size: int = 100,
    max_latency_ms: int = 60000,
) -> APIMethod[P, R] | Callable[[...], APIMethod[P, R]]
```
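The dual invocation style (bare vs. parameterized) follows the standard optional-argument decorator pattern. A minimal self-contained sketch of that pattern; the `APIMethod` class here is a toy stand-in, not BentoML's:

```python
import functools

class APIMethod:
    """Toy stand-in for BentoML's APIMethod wrapper."""
    def __init__(self, func, route=None, batchable=False):
        self.func = func
        self.route = route or f"/{func.__name__}"
        self.batchable = batchable
        functools.update_wrapper(self, func)  # preserve name/docstring

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

def api(func=None, *, route=None, batchable=False):
    # Bare use: @api -- func is the decorated method itself.
    if func is not None:
        return APIMethod(func, route=route, batchable=batchable)
    # Parameterized use: @api(...) -- return a decorator to apply later.
    def decorator(f):
        return APIMethod(f, route=route, batchable=batchable)
    return decorator

@api
def ping(x):
    return x

@api(route="/v1/predict", batchable=True)
def predict(x):
    return x * 2
```

Checking `func is None` is what lets one function serve both as the decorator and as the decorator factory.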
Import
```python
import bentoml

# Used as:
@bentoml.api
def predict(self, x: str) -> str: ...

# Or with parameters:
@bentoml.api(route="/v1/predict", batchable=True)
def predict(self, x: str) -> str: ...
```
I/O Contract
Inputs
| Name | Type | Description |
|---|---|---|
| func | Callable[Concatenate[Any, P], R] \| None | The method to decorate. Provided implicitly when the decorator is used without parentheses. |
| route | str \| None | Custom HTTP route path (e.g., "/v1/predict"). Defaults to /{method_name}. |
| name | str \| None | Custom name for the API endpoint. Defaults to the method name. |
| input_spec | type[IODescriptor] \| None | Pydantic model defining the input schema for request validation and OpenAPI documentation. |
| output_spec | type[IODescriptor] \| None | Pydantic model defining the output schema for response serialization and OpenAPI documentation. |
| batchable | bool | Whether to enable adaptive batching. Defaults to False. |
| batch_dim | int \| tuple[int, int] | Axis along which to concatenate inputs (and split outputs) for batching. Defaults to 0. A tuple specifies different dimensions for input and output. |
| max_batch_size | int | Maximum number of requests to batch together. Defaults to 100. |
| max_latency_ms | int | Maximum time in milliseconds to wait for a full batch before dispatching a partial batch. Defaults to 60000. |
Outputs
| Name | Type | Description |
|---|---|---|
| Return value | APIMethod[P, R] | An APIMethod wrapper around the original method. This object is discovered by Service[T] during initialization and registered as an HTTP endpoint. |
Usage Examples
Example 1: Simple API Endpoint
A basic text-in, text-out endpoint.
```python
import bentoml

@bentoml.service
class Summarizer:
    @bentoml.api
    def summarize(self, text: str) -> str:
        return self.model.summarize(text)
```
- Route defaults to /summarize.
- Input and output are serialized as JSON strings.
Example 2: Batched Endpoint with Custom Route
An embedding endpoint with adaptive batching enabled.
```python
import bentoml
import numpy as np

@bentoml.service(resources={"gpu": 1})
class EmbeddingService:
    @bentoml.api(
        route="/v1/embeddings",
        batchable=True,
        max_batch_size=64,
        max_latency_ms=500,
    )
    def encode(self, texts: list[str]) -> np.ndarray:
        return self.model.encode(texts)
```
- batchable=True enables the adaptive batching runtime.
- Individual requests are collected and merged until max_batch_size=64 is reached or max_latency_ms=500 elapses, whichever comes first.
Example 3: Pydantic Input/Output Specs
An endpoint with explicit schema definitions for OpenAPI documentation.
```python
import bentoml
from pydantic import BaseModel

class ClassifyInput(BaseModel):
    text: str
    language: str = "en"

class ClassifyOutput(BaseModel):
    label: str
    score: float

@bentoml.service
class Classifier:
    @bentoml.api(input_spec=ClassifyInput, output_spec=ClassifyOutput)
    def classify(self, input_data: ClassifyInput) -> ClassifyOutput:
        result = self.model.predict(input_data.text)
        return ClassifyOutput(label=result["label"], score=result["score"])
```
- input_spec and output_spec generate OpenAPI schema documentation.
- Incoming requests are validated against ClassifyInput before reaching the method.
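The validation step can be observed with Pydantic directly, outside of any BentoML service. A small sketch using the same ClassifyInput model (requires Pydantic v2 for `model_validate`):

```python
from pydantic import BaseModel, ValidationError

class ClassifyInput(BaseModel):
    text: str
    language: str = "en"

# Valid payload: the "language" default is filled in automatically.
ok = ClassifyInput.model_validate({"text": "hello"})
# ok.language == "en"

# Invalid payload: the required "text" field is missing, so validation
# fails before an endpoint method would ever run. In a served endpoint
# this kind of failure would be returned to the client as an error
# response rather than raising inside the handler.
try:
    ClassifyInput.model_validate({"language": "fr"})
    raised = False
except ValidationError:
    raised = True
```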