Implementation:Bentoml BentoML GRPC Prometheus Interceptor
| Knowledge Sources | |
|---|---|
| Domains | gRPC, Prometheus, Metrics, Observability |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Implements an async gRPC server interceptor that collects Prometheus metrics for request duration, total request count, and in-progress requests.
Description
The PrometheusServerInterceptor class extends aio.ServerInterceptor (from grpc) to instrument BentoML gRPC services with Prometheus metrics. The metrics are not created in __init__. Instead, the first call to intercept_service runs _setup, which creates three metrics through the injected PrometheusClient:
- request_duration_seconds -- A Histogram of API request duration in seconds, labeled by api_name, service_name, service_version, and http_response_code, with bucket boundaries taken from the injected duration_buckets.
- request_total -- A Counter of the total number of requests, with the same four labels.
- request_in_progress -- A Gauge of requests currently being handled, labeled by api_name, service_name, and service_version. It uses the livesum multiprocess mode, so in a multiprocess deployment the exported value is the sum over live worker processes.
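Prometheus histogram buckets are cumulative: an observation is counted in every bucket whose upper bound ("le") is greater than or equal to it, plus a final +Inf bucket. The real interceptor delegates this to the injected PrometheusClient; the stdlib-only sketch below just illustrates the semantics. SketchHistogram, DURATION_BUCKETS, and the label values are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left
from collections import defaultdict

# Hypothetical boundaries standing in for the injected duration_buckets.
DURATION_BUCKETS: tuple[float, ...] = (0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0)

# Label names used by request_duration_seconds and request_total, in order.
LABEL_NAMES = ("api_name", "service_name", "service_version", "http_response_code")


class SketchHistogram:
    """Hypothetical Prometheus-style histogram: cumulative buckets, sum and count per label set."""

    def __init__(self, buckets: tuple[float, ...]) -> None:
        # Prometheus histograms always end with a +Inf bucket.
        self.bounds = tuple(sorted(buckets)) + (float("inf"),)
        self.series: dict[tuple[str, ...], dict] = defaultdict(
            lambda: {"buckets": [0] * len(self.bounds), "sum": 0.0, "count": 0}
        )

    def observe(self, labels: dict[str, str], value: float) -> None:
        series = self.series[tuple(labels[name] for name in LABEL_NAMES)]
        # Cumulative ("le") semantics: count the value in every bucket whose bound >= value.
        for i in range(bisect_left(self.bounds, value), len(self.bounds)):
            series["buckets"][i] += 1
        series["sum"] += value
        series["count"] += 1


hist = SketchHistogram(DURATION_BUCKETS)
labels = {"api_name": "classify", "service_name": "iris", "service_version": "v1", "http_response_code": "200"}
hist.observe(labels, 0.03)
hist.observe(labels, 0.7)
print(hist.series[("classify", "iris", "v1", "200")]["buckets"])  # [0, 0, 1, 1, 1, 2, 2, 2]
```

Because each distinct combination of label values is its own series, high-cardinality values in api_name or service_version multiply the number of stored buckets.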
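The interceptor uses a contextvars.ContextVar (START_TIME_VAR) to carry each request's start time from intercept_service into the wrapped handler across the async boundary. It reads api_name from the protobuf Request message and fills the http_response_code label by mapping the gRPC status code on the servicer context to the equivalent HTTP status code. Streaming RPCs (request-streaming or response-streaming) are returned unchanged and are not instrumented. The metric namespace defaults to bentoml_api_server, so metric names take the form bentoml_api_server_<metric>.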
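A ContextVar suits this job because asyncio gives each Task its own copy of the context, so concurrent requests cannot overwrite one another's start time. The minimal sketch below assumes a simplified flow in which one coroutine both sets and reads the variable; the handle and main functions are hypothetical, and only the START_TIME_VAR name comes from the source.

```python
from __future__ import annotations

import asyncio
import contextvars
from timeit import default_timer

# Same name and type as the module-level variable in the source; no default is given,
# so .get() raises LookupError in a context where it was never set.
START_TIME_VAR: contextvars.ContextVar[float] = contextvars.ContextVar("START_TIME_VAR")


async def handle(request_id: str, work_s: float, results: dict[str, float]) -> None:
    """Hypothetical request handler that times itself via START_TIME_VAR."""
    START_TIME_VAR.set(default_timer())
    await asyncio.sleep(work_s)  # other requests run while this one is suspended
    # Each asyncio Task runs in its own copy of the context, so a concurrent
    # request's set() cannot overwrite the start time read here.
    results[request_id] = max(default_timer() - START_TIME_VAR.get(), 0.0)


async def main() -> dict[str, float]:
    results: dict[str, float] = {}
    # gather() wraps each coroutine in a Task, and each Task copies the current context.
    await asyncio.gather(handle("fast", 0.01, results), handle("slow", 0.05, results))
    return results


durations = asyncio.run(main())
print(durations)
```

With a plain module-level float instead of a ContextVar, the "fast" request would read the start time written by "slow" and report a wrong duration.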
Usage
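Use this interceptor to expose Prometheus metrics for BentoML gRPC services. It should be placed before the access log interceptor in the interceptor list. grpc.aio gives interceptors control in the order they are listed, so the Prometheus interceptor becomes the outer wrapper.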
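The ordering rule can be modeled with plain asyncio. This is a stdlib-only model, not grpc's implementation: "details" is just a method-name string and a "handler" is a string recording how it was wrapped. Each interceptor awaits the rest of the chain and then wraps the handler it gets back, so the first interceptor in the list produces the outermost wrapper. The functions chain, make_interceptor, and lookup_handler and the method name "/pkg.Service/Predict" are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical simplified types mirroring the shape of intercept_service(continuation, details).
Continuation = Callable[[str], Awaitable[str]]
Interceptor = Callable[[Continuation, str], Awaitable[str]]


def make_interceptor(name: str, trace: List[str]) -> Interceptor:
    async def intercept(continuation: Continuation, details: str) -> str:
        trace.append(f"enter {name}")
        handler = await continuation(details)  # hand off to the next interceptor
        trace.append(f"exit {name}")
        return f"{name}({handler})"  # wrap whatever the inner layers returned

    return intercept


def _link(interceptor: Interceptor, nxt: Continuation) -> Continuation:
    async def step(details: str) -> str:
        return await interceptor(nxt, details)

    return step


def chain(interceptors: List[Interceptor], terminal: Continuation) -> Continuation:
    # Link from the last interceptor inwards so interceptors[0] ends up outermost.
    call = terminal
    for interceptor in reversed(interceptors):
        call = _link(interceptor, call)
    return call


async def lookup_handler(details: str) -> str:
    return f"handler<{details}>"


trace: List[str] = []
pipeline = chain(
    [make_interceptor("prometheus", trace), make_interceptor("access_log", trace)],
    lookup_handler,
)
wrapped = asyncio.run(pipeline("/pkg.Service/Predict"))
print(wrapped)  # prometheus(access_log(handler</pkg.Service/Predict>))
print(trace)
```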
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/grpc/interceptors/prometheus.py
- Lines: 1-150
Signature
```python
START_TIME_VAR: contextvars.ContextVar[float] = contextvars.ContextVar("START_TIME_VAR")


class PrometheusServerInterceptor(aio.ServerInterceptor):
    def __init__(self, *, namespace: str = "bentoml_api_server"): ...

    @inject
    def _setup(
        self,
        metrics_client: PrometheusClient = Provide[BentoMLContainer.metrics_client],
        duration_buckets: tuple[float, ...] = Provide[BentoMLContainer.duration_buckets],
    ): ...

    async def intercept_service(
        self,
        continuation: t.Callable[[HandlerCallDetails], t.Awaitable[RpcMethodHandler]],
        handler_call_details: HandlerCallDetails,
    ) -> RpcMethodHandler: ...
```
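The control flow of intercept_service (lazy setup on first use, pass-through for streaming or unmatched handlers, wrapping of unary handlers) can be sketched without grpc. Everything here is hypothetical: FakeHandler stands in for grpc.RpcMethodHandler, plain Counter objects stand in for the three Prometheus metrics, the request is a dict instead of a protobuf message, and this sketch records all three measurements around the awaited call. The source's exact ordering, its use of START_TIME_VAR, and its http_response_code label are not reproduced.

```python
import asyncio
import functools
from collections import Counter
from dataclasses import dataclass, replace
from timeit import default_timer
from typing import Any, Awaitable, Callable, Optional

# Hypothetical type for a unary-unary behaviour: (request, context) -> response.
Behaviour = Callable[[Any, Any], Awaitable[Any]]


@dataclass(frozen=True)
class FakeHandler:
    """Hypothetical stand-in for grpc.RpcMethodHandler (only the fields used here)."""

    request_streaming: bool
    response_streaming: bool
    unary_unary: Optional[Behaviour] = None


class SketchMetricsInterceptor:
    """Hypothetical re-creation of the control flow, with plain counters as metrics."""

    def __init__(self) -> None:
        self._is_setup = False

    def _setup(self) -> None:
        # Stand-ins for the Histogram, Counter and Gauge that the real _setup creates.
        self.durations: list = []
        self.totals: Counter = Counter()
        self.in_progress: Counter = Counter()
        self._is_setup = True

    async def intercept_service(self, continuation, details):
        if not self._is_setup:
            self._setup()  # lazy: metrics are created on the first RPC, not in __init__
        handler = await continuation(details)
        # None means no handler matched; streaming RPCs pass through uninstrumented.
        if handler is None or handler.request_streaming or handler.response_streaming:
            return handler
        behaviour = handler.unary_unary

        @functools.wraps(behaviour)
        async def instrumented(request, context):
            api_name = request["api_name"]  # the real class reads request.api_name
            start = default_timer()
            self.in_progress[api_name] += 1
            try:
                return await behaviour(request, context)
            finally:
                self.in_progress[api_name] -= 1
                self.totals[api_name] += 1
                self.durations.append((api_name, default_timer() - start))

        return replace(handler, unary_unary=instrumented)


async def predict(request, context):
    await asyncio.sleep(0)
    return {"ok": request["api_name"]}


async def unary_continuation(details):
    return FakeHandler(request_streaming=False, response_streaming=False, unary_unary=predict)


async def streaming_continuation(details):
    return FakeHandler(request_streaming=False, response_streaming=True)


async def demo():
    interceptor = SketchMetricsInterceptor()
    handler = await interceptor.intercept_service(unary_continuation, "/pkg.Service/Predict")
    response = await handler.unary_unary({"api_name": "classify"}, None)
    streamed = await interceptor.intercept_service(streaming_continuation, "/pkg.Service/Stream")
    return interceptor, response, streamed


interceptor, response, streamed = asyncio.run(demo())
```

Deferring metric creation to the first RPC means constructing the interceptor does not require the dependency-injection container to be resolved yet.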
Import
```python
from bentoml.grpc.interceptors.prometheus import PrometheusServerInterceptor
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| namespace | str | No | Prometheus metric namespace; defaults to "bentoml_api_server" |
| metrics_client | PrometheusClient | No (injected) | Prometheus client for creating metrics; injected from BentoMLContainer |
| duration_buckets | tuple[float, ...] | No (injected) | Histogram bucket boundaries; injected from BentoMLContainer |
Outputs
| Name | Type | Description |
|---|---|---|
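| (return value) | RpcMethodHandler | For unary RPCs, the handler from continuation wrapped to record metrics; streaming handlers are returned unchanged |
| (side effect) | Prometheus metrics | Histogram, Counter, and Gauge values recorded through the metrics client and exposed at the Prometheus scrape endpoint |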
Usage Examples
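```python
from grpc import aio

from bentoml.grpc.interceptors.prometheus import PrometheusServerInterceptor

# Create a server with the Prometheus metrics interceptor.
# Metrics are created lazily on the first RPC, using the metrics client and
# duration buckets injected from BentoMLContainer.
prometheus_interceptor = PrometheusServerInterceptor(namespace="my_service")
server = aio.server(interceptors=[prometheus_interceptor])

# With namespace="my_service", the metrics appear at the Prometheus scrape endpoint as:
# - my_service_request_duration_seconds (histogram: _bucket, _sum and _count series)
# - my_service_request_total
# - my_service_request_in_progress
# With the default namespace the prefix is bentoml_api_server_ instead.
```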