Implementation:Bentoml BentoML GRPC Prometheus Interceptor

From Leeroopedia
Knowledge Sources
Domains gRPC, Prometheus, Metrics, Observability
Last Updated 2026-02-13 15:00 GMT

Overview

Implements an async gRPC server interceptor that collects Prometheus metrics for request duration, total request count, and in-progress requests.

Description

The PrometheusServerInterceptor class extends aio.ServerInterceptor to instrument BentoML gRPC services with Prometheus metrics. On first invocation, it lazily initializes three metrics via the injected PrometheusClient:

  • request_duration_seconds -- A Histogram tracking API request duration with labels for api_name, service_name, service_version, and http_response_code, using configurable duration buckets.
  • request_total -- A Counter tracking total request count with the same labels.
  • request_in_progress -- A Gauge tracking currently in-progress requests with labels for api_name, service_name, and service_version, using livesum multiprocess mode.
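Prometheus histograms expose cumulative buckets: each bucket counts the observations less than or equal to its upper bound (the le label), and a final +Inf bucket counts every observation. The sketch below models that behavior with the standard library only, to show how a single request duration would land in the buckets. The bucket boundaries and the observe helper are hypothetical; the real boundaries come from BentoMLContainer.duration_buckets, and the real bookkeeping is done by the Prometheus client.

```python
import math

# Hypothetical bucket boundaries in seconds; the real ones are injected
# from BentoMLContainer.duration_buckets and may differ.
DURATION_BUCKETS = (0.05, 0.1, 0.5, 1.0, math.inf)


def observe(counts, value, buckets=DURATION_BUCKETS):
    """Record one duration by incrementing every bucket whose bound covers it."""
    for i, upper_bound in enumerate(buckets):
        if value <= upper_bound:
            counts[i] += 1


counts = [0] * len(DURATION_BUCKETS)
for duration in (0.03, 0.2, 0.2, 3.0):
    observe(counts, duration)

# Print one line per bucket, mirroring the "le" label of a histogram sample.
for upper_bound, count in zip(DURATION_BUCKETS, counts):
    print(f"le={upper_bound}: {count}")
```

Because the buckets are cumulative, choosing boundaries close to the expected latency range gives the most useful quantile estimates; durations above the largest finite boundary are only visible in the +Inf bucket.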

The interceptor stores each request's start time in a contextvars.ContextVar (START_TIME_VAR), so the value survives across await points without leaking between concurrent requests. It reads the api_name from the protobuf Request message and derives the http_response_code label by converting the gRPC status code to its HTTP equivalent. Streaming RPCs are passed through without instrumentation. The namespace defaults to bentoml_api_server.
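The start-time tracking described above can be illustrated with a minimal sketch that uses only asyncio and contextvars. It is not the BentoML implementation: the handle wrapper, the in_progress counter, and the durations dict are hypothetical stand-ins for the wrapped handler, the Gauge, and the Histogram. The point is that each asyncio task runs in its own copy of the context, so a later request calling set() does not overwrite the start time of an earlier one.

```python
import asyncio
import contextvars
import time

# Same name as the variable in the Signature section; everything else is hypothetical.
START_TIME_VAR = contextvars.ContextVar("START_TIME_VAR")

in_progress = 0        # stand-in for the request_in_progress Gauge
peak_in_progress = 0   # highest number of overlapping requests seen
durations = {}         # stand-in for the request_duration_seconds Histogram


async def handle(api_name, delay, work_seconds):
    """Hypothetical wrapper: time one request and track it as in progress."""
    global in_progress, peak_in_progress
    await asyncio.sleep(delay)               # request arrives after `delay`
    START_TIME_VAR.set(time.perf_counter())  # visible only in this task's context
    in_progress += 1
    peak_in_progress = max(peak_in_progress, in_progress)
    try:
        await asyncio.sleep(work_seconds)    # stands in for the real RPC behavior
    finally:
        in_progress -= 1
        # Reads this request's own start time, even if other requests
        # called set() while this one was awaiting.
        durations[api_name] = time.perf_counter() - START_TIME_VAR.get()


async def main():
    # "fast" starts while "slow" is still running; gather wraps each
    # coroutine in a task, and each task gets a copy of the context.
    await asyncio.gather(
        handle("slow", delay=0.0, work_seconds=0.2),
        handle("fast", delay=0.1, work_seconds=0.02),
    )


asyncio.run(main())
print(durations, peak_in_progress)
```

With a plain module-level variable instead of a ContextVar, the "fast" request would overwrite the start time and the "slow" request would report roughly half its real duration.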

Usage

Use this interceptor to expose Prometheus metrics for BentoML gRPC services. It should be added before the access log interceptor in the interceptor chain.

Code Reference

Source Location

Signature

START_TIME_VAR: contextvars.ContextVar[float] = contextvars.ContextVar("START_TIME_VAR")

class PrometheusServerInterceptor(aio.ServerInterceptor):
    def __init__(self, *, namespace: str = "bentoml_api_server"): ...

    @inject
    def _setup(
        self,
        metrics_client: PrometheusClient = Provide[BentoMLContainer.metrics_client],
        duration_buckets: tuple[float, ...] = Provide[BentoMLContainer.duration_buckets],
    ): ...

    async def intercept_service(
        self,
        continuation: t.Callable[[HandlerCallDetails], t.Awaitable[RpcMethodHandler]],
        handler_call_details: HandlerCallDetails,
    ) -> RpcMethodHandler: ...

Import

from bentoml.grpc.interceptors.prometheus import PrometheusServerInterceptor

I/O Contract

Inputs

Name | Type | Required | Description
namespace | str | No | Prometheus metric namespace; defaults to "bentoml_api_server"
metrics_client | PrometheusClient | No (injected) | Prometheus client for creating metrics; injected from BentoMLContainer
duration_buckets | tuple[float, ...] | No (injected) | Histogram bucket boundaries; injected from BentoMLContainer

Outputs

Name | Type | Description
RpcMethodHandler | RpcMethodHandler | Wrapped handler that records Prometheus metrics
(side effect) | Prometheus metrics | Histogram, Counter, and Gauge metrics exported to Prometheus

Usage Examples

from bentoml.grpc.interceptors.prometheus import PrometheusServerInterceptor
from grpc import aio

# Create server with Prometheus metrics interceptor
prometheus_interceptor = PrometheusServerInterceptor(namespace="my_service")
server = aio.server(interceptors=[prometheus_interceptor])

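# Metric names are prefixed with the namespace passed above, so with
# namespace="my_service" the Prometheus scrape endpoint should expose:
# - my_service_request_duration_seconds
# - my_service_request_total
# - my_service_request_in_progress
# With the default namespace the prefix is bentoml_api_server instead.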
