Implementation:Bentoml BentoML GRPC Prometheus Interceptor

Knowledge Sources	Bentoml_BentoML
Domains	gRPC, Prometheus, Metrics, Observability
Last Updated	2026-02-13 15:00 GMT

Overview

Implements an async gRPC server interceptor that collects Prometheus metrics for request duration, total request count, and in-progress requests.

Description

The PrometheusServerInterceptor class extends aio.ServerInterceptor to instrument BentoML gRPC services with Prometheus metrics. On first invocation, it lazily initializes three metrics via the injected PrometheusClient:

request_duration_seconds -- A Histogram tracking API request duration with labels for api_name, service_name, service_version, and http_response_code, using configurable duration buckets.
request_total -- A Counter tracking total request count with the same labels.
request_in_progress -- A Gauge tracking currently in-progress requests with labels for api_name, service_name, and service_version, using livesum multiprocess mode.
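
The three metrics above can be approximated directly with the prometheus_client package. This is a minimal sketch, not BentoML's own code: it bypasses the injected PrometheusClient wrapper, and the bucket boundaries, documentation strings, label values, and the private registry are all assumptions made for illustration.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram

NAMESPACE = "bentoml_api_server"
# Hypothetical bucket boundaries; BentoML injects its own configured values.
DURATION_BUCKETS = (0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0)

# A private registry keeps this sketch isolated from the global default one.
registry = CollectorRegistry()

request_duration = Histogram(
    name="request_duration_seconds",
    documentation="API gRPC request duration in seconds",
    namespace=NAMESPACE,
    labelnames=["api_name", "service_name", "service_version", "http_response_code"],
    buckets=DURATION_BUCKETS,
    registry=registry,
)
request_total = Counter(
    name="request_total",
    documentation="Total number of gRPC requests",
    namespace=NAMESPACE,
    labelnames=["api_name", "service_name", "service_version", "http_response_code"],
    registry=registry,
)
request_in_progress = Gauge(
    name="request_in_progress",
    documentation="Number of gRPC requests currently in progress",
    namespace=NAMESPACE,
    labelnames=["api_name", "service_name", "service_version"],
    multiprocess_mode="livesum",  # sum the values of live worker processes
    registry=registry,
)

# Record one simulated request (label values are made up).
base = dict(api_name="predict", service_name="demo", service_version="v1")
with_status = dict(base, http_response_code="200")

request_in_progress.labels(**base).inc()
request_duration.labels(**with_status).observe(0.042)
request_total.labels(**with_status).inc()
request_in_progress.labels(**base).dec()
```

Note that the in-progress Gauge omits http_response_code, since the response status is not known while a request is still running.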

The interceptor stores each request's start time in a contextvars.ContextVar (START_TIME_VAR), so the value stays bound to the request's own context across await points. It extracts the api_name from the protobuf Request message and derives the http_response_code label by converting the gRPC status code to its HTTP equivalent. Streaming RPCs are passed through without instrumentation. The namespace defaults to bentoml_api_server.
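
The timing and status-conversion steps can be sketched with the standard library alone. Everything here other than START_TIME_VAR is hypothetical: fake_handler stands in for the wrapped RPC handler, the recorded list stands in for the Prometheus metrics, and the GRPC_TO_HTTP table is a partial mapping based on the conventional gRPC-to-HTTP correspondence, which may differ from the conversion BentoML applies.

```python
import asyncio
import contextvars
import time

START_TIME_VAR: contextvars.ContextVar[float] = contextvars.ContextVar("START_TIME_VAR")

# Partial, illustrative mapping from gRPC status names to HTTP status codes.
GRPC_TO_HTTP = {
    "OK": 200,
    "INVALID_ARGUMENT": 400,
    "NOT_FOUND": 404,
    "INTERNAL": 500,
    "UNIMPLEMENTED": 501,
    "UNAVAILABLE": 503,
}


def to_http_status(grpc_code: str) -> int:
    """Convert a gRPC status name to an HTTP code, defaulting to 500."""
    return GRPC_TO_HTTP.get(grpc_code, 500)


recorded: list[tuple[str, int, float]] = []  # stand-in for the real metrics


async def fake_handler(api_name: str) -> str:
    """Hypothetical unary handler that returns a gRPC status name."""
    await asyncio.sleep(0.01)
    return "OK" if api_name == "predict" else "NOT_FOUND"


async def instrumented_call(api_name: str) -> None:
    # Each asyncio task runs in its own copy of the context, so concurrent
    # requests do not overwrite each other's start time.
    START_TIME_VAR.set(time.perf_counter())
    grpc_code = await fake_handler(api_name)
    duration = time.perf_counter() - START_TIME_VAR.get()
    recorded.append((api_name, to_http_status(grpc_code), duration))


async def main() -> None:
    await asyncio.gather(instrumented_call("predict"), instrumented_call("missing"))


asyncio.run(main())
```

Because asyncio.gather wraps each coroutine in a task with its own context copy, the two simulated requests each read back the start time they set, even though they interleave at the await.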

Usage

Use this interceptor to expose Prometheus metrics for BentoML gRPC services. It should be added before the access log interceptor in the interceptor chain.

Code Reference

Source Location

Repository: Bentoml_BentoML
File: src/bentoml/grpc/interceptors/prometheus.py
Lines: 1-150

Signature

START_TIME_VAR: contextvars.ContextVar[float] = contextvars.ContextVar("START_TIME_VAR")

class PrometheusServerInterceptor(aio.ServerInterceptor):
    def __init__(self, *, namespace: str = "bentoml_api_server"): ...

    @inject
    def _setup(
        self,
        metrics_client: PrometheusClient = Provide[BentoMLContainer.metrics_client],
        duration_buckets: tuple[float, ...] = Provide[BentoMLContainer.duration_buckets],
    ): ...

    async def intercept_service(
        self,
        continuation: t.Callable[[HandlerCallDetails], t.Awaitable[RpcMethodHandler]],
        handler_call_details: HandlerCallDetails,
    ) -> RpcMethodHandler: ...

Import

from bentoml.grpc.interceptors.prometheus import PrometheusServerInterceptor

I/O Contract

Inputs

Name	Type	Required	Description
namespace	str	No	Prometheus metric namespace; defaults to "bentoml_api_server"
metrics_client	PrometheusClient	No (injected)	Prometheus client for creating metrics; injected from BentoMLContainer
duration_buckets	tuple[float, ...]	No (injected)	Histogram bucket boundaries; injected from BentoMLContainer

Outputs

Name	Type	Description
(return value)	RpcMethodHandler	Wrapped handler that records Prometheus metrics for unary RPCs; streaming handlers are returned unchanged
(side effect)	Prometheus metrics	Histogram, Counter, and Gauge metrics exported to Prometheus

Usage Examples

from bentoml.grpc.interceptors.prometheus import PrometheusServerInterceptor
from grpc import aio

# Create server with Prometheus metrics interceptor
prometheus_interceptor = PrometheusServerInterceptor(namespace="my_service")
server = aio.server(interceptors=[prometheus_interceptor])

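# Metric names are prefixed with the namespace passed above, so with
# namespace="my_service" the Prometheus scrape endpoint exposes:
# - my_service_request_duration_seconds
# - my_service_request_total
# - my_service_request_in_progress
# (with the default namespace, the prefix is bentoml_api_server instead)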
