

Principle:Bentoml BentoML API Endpoint Definition

From Leeroopedia
Metadata
Last Updated 2026-02-13 15:00 GMT

Overview

A design pattern for declaring HTTP API endpoints on a BentoML service using method decoration. The @bentoml.api decorator converts a Python method into an HTTP endpoint with automatic input/output serialization, type validation, and optional adaptive batching.

Description

API endpoint definition in BentoML transforms regular Python methods on a service class into network-accessible endpoints. The @bentoml.api decorator wraps each method in an APIMethod object that handles:

  • Route registration -- each decorated method becomes an HTTP POST endpoint. The route defaults to the method name (e.g., /predict) but can be overridden via the route parameter.
  • Serialization -- function arguments and return values are automatically serialized/deserialized using type hints. BentoML supports Python primitives, Pydantic models, NumPy arrays, Pandas DataFrames, and file uploads.
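  • Input/output validation -- inputs are validated against a Pydantic schema, derived from the type hints by default or taken from explicit input_spec/output_spec models. The same schema drives the generated OpenAPI documentation.
  • Adaptive batching -- when batchable=True, BentoML collects individual requests and groups them into batches before calling the handler, improving throughput for GPU-bound models. The handler then receives a whole batch (for example, a list, or an array stacked along batch_dim) and must return one output per input.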
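To make the routing and serialization bullets concrete, here is a toy, standard-library-only sketch of the same idea. It is not BentoML's implementation: the `api` decorator, the `ROUTES` table and `handle_post` are hypothetical names, JSON is the only wire format, and validation is a bare `isinstance` check where BentoML uses Pydantic.

```python
import json
import typing
from typing import Any, Callable, Optional

# Hypothetical routing table: path -> handler (stands in for the server's router).
ROUTES: dict[str, Callable[..., Any]] = {}


def api(route: Optional[str] = None) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Toy decorator: register a function as a POST handler."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        # Default route is "/" + the function name; an explicit route overrides it.
        ROUTES[route or f"/{func.__name__}"] = func
        return func
    return decorator


def handle_post(path: str, body: bytes) -> bytes:
    """Deserialize a JSON body via the handler's type hints, call it, serialize the result."""
    func = ROUTES[path]
    hints = typing.get_type_hints(func)
    hints.pop("return", None)  # only parameters come from the request body
    payload = json.loads(body)
    kwargs = {}
    for name, expected in hints.items():
        value = payload[name]
        # Minimal validation: isinstance check against plain classes only.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
        kwargs[name] = value
    return json.dumps(func(**kwargs)).encode("utf-8")


@api()
def add(a: int, b: int) -> int:
    return a + b


@api(route="/v1/echo")
def echo(text: str) -> str:
    return text
```

Here `handle_post("/add", b'{"a": 2, "b": 3}')` returns `b"5"`, and the second handler is reachable only at `/v1/echo` because its route was overridden.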

Both synchronous and asynchronous (async def) methods are supported. Async methods are preferred for I/O-bound workloads because the event loop can serve other requests while a handler is waiting on I/O.
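The practical difference between the two handler styles can be sketched with plain `asyncio`. In this hypothetical dispatcher (`call_handler` is not a BentoML function), coroutine handlers are awaited on the event loop, while blocking sync handlers are pushed to a worker thread with `asyncio.to_thread`. Offloading sync code this way is an assumption about how a server keeps the loop responsive, not a description of BentoML's internals.

```python
import asyncio
import inspect
import time
from typing import Any, Callable


async def call_handler(func: Callable[..., Any], *args: Any) -> Any:
    """Await coroutine functions directly; run plain functions in a worker thread."""
    if inspect.iscoroutinefunction(func):
        return await func(*args)
    # A blocking sync handler is moved off the event loop so other requests keep flowing.
    return await asyncio.to_thread(func, *args)


def sync_handler(x: int) -> int:
    time.sleep(0.05)  # simulate blocking, CPU- or library-bound work
    return x * 2


async def async_handler(x: int) -> int:
    await asyncio.sleep(0.05)  # simulate non-blocking I/O
    return x + 1


async def serve_concurrently() -> list[int]:
    # Ten simulated requests, five per handler, dispatched at the same time.
    tasks = [call_handler(sync_handler, i) for i in range(5)]
    tasks += [call_handler(async_handler, i) for i in range(5)]
    return list(await asyncio.gather(*tasks))
```

Run with `asyncio.run(serve_concurrently())`, all ten requests finish in roughly the time of one, because neither handler style blocks the loop for the others.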

Usage

Use the @bentoml.api decorator when you need to:

  • Expose a model inference function as an HTTP endpoint.
  • Enable adaptive batching to maximize GPU utilization.
  • Define explicit input/output schemas using Pydantic for auto-generated API documentation.
  • Customize the HTTP route path for an endpoint.

A typical usage:

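import bentoml
import joblib
import numpy as np
from pydantic import BaseModel

class InputData(BaseModel):
    features: list[float]

class OutputData(BaseModel):
    prediction: float
    confidence: float

@bentoml.service
class PredictionService:
    def __init__(self) -> None:
        # Placeholder: load a trained scikit-learn-style model ("model.pkl" is illustrative).
        self.model = joblib.load("model.pkl")

    @bentoml.api(
        input_spec=InputData,
        output_spec=OutputData,
        route="/v1/predict",
    )
    def predict(self, features: list[float]) -> OutputData:
        # With input_spec set, the fields of InputData arrive as keyword arguments.
        x = np.array(features).reshape(1, -1)  # a single sample as a 2D array
        result = self.model.predict(x)
        # Placeholder confidence; a real service might derive it from predict_proba.
        return OutputData(prediction=float(result[0]), confidence=0.95)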
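A client calls the endpoint from the example above with a JSON body shaped like InputData. The sketch below assumes the service runs locally on BentoML's default port 3000 (for instance after `bentoml serve`); `build_predict_request` and `send` are hypothetical helpers built on `urllib.request` from the standard library.

```python
import json
import urllib.request


def build_predict_request(features: list[float],
                          base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build (but do not send) a POST request whose body matches the InputData schema."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/predict",  # the route overridden in the decorator
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    # Only call this against a running service; the reply should follow OutputData.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a live server.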

Theoretical Basis

The API endpoint definition pattern applies the method interception paradigm: a decorator intercepts method calls to inject cross-cutting concerns (serialization, validation, batching) without modifying the business logic.

The abstract pattern is as follows:

API_ENDPOINT(method, config):
    ROUTE REGISTRATION:
        HTTP POST {route}   (default: /{method.__name__})
            -> Maps to method on the service instance

    REQUEST LIFECYCLE:
        1. RECEIVE raw HTTP request
        2. DESERIALIZE body -> typed Python arguments
           (using input_spec if given, else a schema built from type hints)
        3. VALIDATE inputs against that schema
        4. IF batchable:
              COLLECT requests until max_batch_size is reached or max_latency_ms elapses
              CALL method(batched_inputs)
              SPLIT outputs back to individual responses
           ELSE:
              CALL method(inputs)
        5. SERIALIZE return value -> HTTP response body
        6. SEND HTTP response

    BATCHING (optional):
        batch_dim      : axis along which to concatenate inputs
        max_batch_size : upper limit on batch size
        max_latency_ms : maximum wait time before dispatching a partial batch
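The batching branch of the lifecycle can be sketched as a micro-batcher in `asyncio`. This is a deliberate simplification: adaptive batching tunes batch sizes to the incoming request rate, whereas the hypothetical `MicroBatcher` below uses a fixed window, flushing when `max_batch_size` requests are queued or `max_latency_ms` has passed since the first one arrived. It does show the three pseudocode steps: collect, call the handler once on the whole batch, and split the outputs back to each caller.

```python
import asyncio
from typing import Any, Callable


class MicroBatcher:
    """Hypothetical fixed-window batcher (not BentoML's adaptive algorithm)."""

    def __init__(self, batch_fn: Callable[[list[Any]], list[Any]],
                 max_batch_size: int, max_latency_ms: float) -> None:
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.max_latency_s = max_latency_ms / 1000.0
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: list[int] = []  # recorded for inspection

    async def submit(self, item: Any) -> Any:
        """Called once per request; resolves when that request's output is ready."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # wait for the first request
            deadline = loop.time() + self.max_latency_s
            # COLLECT until the batch is full or the latency window closes.
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            # CALL the handler once on the whole batch ...
            outputs = self.batch_fn([item for item, _ in batch])
            # ... and SPLIT the outputs back to the waiting requests.
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


async def demo() -> tuple[list[Any], list[int]]:
    batcher = MicroBatcher(lambda xs: [x * 10 for x in xs], max_batch_size=4, max_latency_ms=20)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(10)))
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return list(results), batcher.batch_sizes
```

With ten simultaneous requests, `max_batch_size=4` and a 20 ms window, the batch function is called three times, on batches of 4, 4 and 2; the last partial batch is released when its window expires.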

Key theoretical properties:

  • Transparency -- the method signature defines the API contract; the decorator adds infrastructure without changing the method body.
  • Adaptive batching -- batch sizes adjust dynamically to the incoming request rate within the limits set by max_batch_size and max_latency_ms, so throughput improves without hand-picking a fixed batch size.
  • Schema-driven documentation -- Pydantic-based specs generate OpenAPI/Swagger documentation automatically.
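Schema-driven documentation rests on turning type information into a machine-readable schema. BentoML does this through Pydantic; the standard-library `request_schema` below is a hypothetical stand-in that supports only `str`, `int`, `float`, `bool` and `list[...]`, deriving a JSON-Schema-style object from a handler's parameters and marking parameters without defaults as required.

```python
import inspect
import typing
from typing import Any, Callable

# Subset of Python -> JSON Schema type names; unsupported types raise KeyError.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_json_schema(tp: Any) -> dict[str, Any]:
    if typing.get_origin(tp) is list:
        (item_type,) = typing.get_args(tp)
        return {"type": "array", "items": to_json_schema(item_type)}
    return {"type": _JSON_TYPES[tp]}


def request_schema(func: Callable[..., Any]) -> dict[str, Any]:
    """Derive an object schema for the request body from a handler's parameters."""
    hints = typing.get_type_hints(func)
    params = inspect.signature(func).parameters
    return {
        "type": "object",
        "properties": {name: to_json_schema(hints[name]) for name in params},
        # Parameters without a default value must be present in the request.
        "required": [n for n, p in params.items() if p.default is inspect.Parameter.empty],
    }


def predict(features: list[float], threshold: float = 0.5) -> float:
    return float(sum(features) > threshold)
```

Because the schema is computed from the signature, changing a parameter's type or default updates the documentation without any separate edit.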
