Implementation: BentoML `@bentoml.api` Decorator
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Concrete decorator for converting a Python method on a BentoML service class into an HTTP API endpoint. The @bentoml.api decorator wraps the method in an APIMethod[P, R] object that handles routing, serialization, validation, and optional adaptive batching.
Description
The @bentoml.api decorator supports two invocation styles:
- Bare decorator -- `@bentoml.api` applied directly to a method with no arguments.
- Parameterized decorator -- `@bentoml.api(route="/v1/predict", batchable=True, ...)` returning a decorator.
When applied, it creates an APIMethod[P, R] wrapper that captures the method's type signature, route configuration, batching parameters, and optional Pydantic input/output specs. The Service[T] wrapper discovers these APIMethod objects during initialization and registers them as HTTP endpoints.
For adaptive batching, when batchable=True, the serving runtime collects individual incoming requests and merges them along the specified batch_dim axis. The batched input is passed to the method as a single call, and the output is split back into individual responses. The runtime dynamically tunes batch sizes based on max_batch_size and max_latency_ms.
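The merge/split behavior described above can be sketched with NumPy. This is a simplified illustration of the runtime's logic, not BentoML's actual implementation; the `run_batched` helper and the toy model are hypothetical:

```python
import numpy as np

def run_batched(method, requests, batch_dim=0):
    """Simplified sketch of adaptive-batch dispatch: merge queued
    inputs along batch_dim, run the method once, split the result."""
    sizes = [r.shape[batch_dim] for r in requests]
    merged = np.concatenate(requests, axis=batch_dim)   # one combined batch
    output = method(merged)                             # single model call
    # Split the batched output back into per-request responses.
    return np.split(output, np.cumsum(sizes)[:-1], axis=batch_dim)

# A toy "model" that maps each input row to a 4-dimensional vector.
model = lambda x: np.tile(x.sum(axis=1, keepdims=True), (1, 4))

reqs = [np.ones((2, 3)), np.ones((5, 3))]   # two queued requests
outs = run_batched(model, reqs)
# outs[0].shape == (2, 4), outs[1].shape == (5, 4)
```

In the real runtime the queue is filled asynchronously and flushed when either `max_batch_size` is reached or `max_latency_ms` expires; the sketch shows only the merge-call-split step.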
Usage
Import and apply the decorator to methods on a @bentoml.service class:
```python
import bentoml
import numpy as np

@bentoml.service
class EmbeddingService:
    @bentoml.api(batchable=True, max_batch_size=64, max_latency_ms=500)
    def encode(self, texts: list[str]) -> np.ndarray:
        return self.model.encode(texts)
```
Code Reference
Source Location
- Repository: bentoml/BentoML
- File: src/_bentoml_sdk/decorators.py (lines 60--109)
Signature
```python
def api(
    func: Callable[Concatenate[Any, P], R] | None = None,
    *,
    route: str | None = None,
    name: str | None = None,
    input_spec: type[IODescriptor] | None = None,
    output_spec: type[IODescriptor] | None = None,
    batchable: bool = False,
    batch_dim: int | tuple[int, int] = 0,
    max_batch_size: int = 100,
    max_latency_ms: int = 60000,
) -> APIMethod[P, R] | Callable[[...], APIMethod[P, R]]
```
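The dual invocation style (bare vs. parameterized) follows the standard optional-argument decorator pattern. A minimal self-contained sketch of that pattern; the `APIMethod` class here is a toy stand-in, not BentoML's:

```python
import functools

class APIMethod:
    """Toy stand-in for BentoML's APIMethod wrapper."""
    def __init__(self, func, route=None, batchable=False):
        self.func = func
        self.route = route or f"/{func.__name__}"
        self.batchable = batchable
        functools.update_wrapper(self, func)  # preserve name/docstring

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

def api(func=None, *, route=None, batchable=False):
    # Bare use: @api -- func is the decorated method itself.
    if func is not None:
        return APIMethod(func, route=route, batchable=batchable)
    # Parameterized use: @api(...) -- return a decorator to apply later.
    def decorator(f):
        return APIMethod(f, route=route, batchable=batchable)
    return decorator

@api
def ping(x):
    return x

@api(route="/v1/predict", batchable=True)
def predict(x):
    return x * 2
```

Checking `func is None` is what lets one function serve both as the decorator and as the decorator factory.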
Import
```python
import bentoml

# Used as:
@bentoml.api
def predict(self, x: str) -> str: ...

# Or with parameters:
@bentoml.api(route="/v1/predict", batchable=True)
def predict(self, x: str) -> str: ...
```
I/O Contract
Inputs
| Name | Type | Description |
|---|---|---|
| func | Callable[Concatenate[Any, P], R] \| None | The method to decorate. Provided implicitly when the decorator is used without parentheses. |
| route | str \| None | Custom HTTP route path (e.g., "/v1/predict"). Defaults to /{method_name}. |
| name | str \| None | Custom name for the API endpoint. Defaults to the method name. |
| input_spec | type[IODescriptor] \| None | Pydantic model defining the input schema for request validation and OpenAPI documentation. |
| output_spec | type[IODescriptor] \| None | Pydantic model defining the output schema for response serialization and OpenAPI documentation. |
| batchable | bool | Whether to enable adaptive batching. Defaults to False. |
| batch_dim | int \| tuple[int, int] | Axis along which to concatenate inputs (and split outputs) for batching. Defaults to 0. A tuple specifies different dimensions for input and output. |
| max_batch_size | int | Maximum number of requests to batch together. Defaults to 100. |
| max_latency_ms | int | Maximum time in milliseconds to wait for a full batch before dispatching a partial batch. Defaults to 60000. |
Outputs
| Name | Type | Description |
|---|---|---|
| Return value | APIMethod[P, R] | An APIMethod wrapper around the original method. This object is discovered by Service[T] during initialization and registered as an HTTP endpoint. |
Usage Examples
Example 1: Simple API Endpoint
A basic text-in, text-out endpoint.
```python
import bentoml

@bentoml.service
class Summarizer:
    @bentoml.api
    def summarize(self, text: str) -> str:
        return self.model.summarize(text)
```
- Route defaults to /summarize.
- Input and output are serialized as JSON strings.
Example 2: Batched Endpoint with Custom Route
An embedding endpoint with adaptive batching enabled.
```python
import bentoml
import numpy as np

@bentoml.service(resources={"gpu": 1})
class EmbeddingService:
    @bentoml.api(
        route="/v1/embeddings",
        batchable=True,
        max_batch_size=64,
        max_latency_ms=500,
    )
    def encode(self, texts: list[str]) -> np.ndarray:
        return self.model.encode(texts)
```
- batchable=True enables the adaptive batching runtime.
- Individual requests are collected and merged until max_batch_size=64 is reached or max_latency_ms=500 elapses, whichever comes first.
Example 3: Pydantic Input/Output Specs
An endpoint with explicit schema definitions for OpenAPI documentation.
```python
import bentoml
from pydantic import BaseModel

class ClassifyInput(BaseModel):
    text: str
    language: str = "en"

class ClassifyOutput(BaseModel):
    label: str
    score: float

@bentoml.service
class Classifier:
    @bentoml.api(input_spec=ClassifyInput, output_spec=ClassifyOutput)
    def classify(self, input_data: ClassifyInput) -> ClassifyOutput:
        result = self.model.predict(input_data.text)
        return ClassifyOutput(label=result["label"], score=result["score"])
```
- input_spec and output_spec generate OpenAPI schema documentation.
- Incoming requests are validated against ClassifyInput before reaching the method.
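The validation step can be observed with Pydantic directly, outside of any BentoML service. A small sketch using the same ClassifyInput model (requires Pydantic v2 for `model_validate`):

```python
from pydantic import BaseModel, ValidationError

class ClassifyInput(BaseModel):
    text: str
    language: str = "en"

# Valid payload: the "language" default is filled in automatically.
ok = ClassifyInput.model_validate({"text": "hello"})
# ok.language == "en"

# Invalid payload: the required "text" field is missing, so validation
# fails before an endpoint method would ever run. In a served endpoint
# this kind of failure would be returned to the client as an error
# response rather than raising inside the handler.
try:
    ClassifyInput.model_validate({"language": "fr"})
    raised = False
except ValidationError:
    raised = True
```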