Implementation:Bentoml BentoML ServiceConfig
| Knowledge Sources | |
|---|---|
| Domains | Configuration, Service |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Defines the complete TypedDict-based configuration schema for BentoML services, covering traffic, resources, HTTP/gRPC, SSL, tracing, monitoring, metrics, and logging settings.
Description
The config module defines the configuration schema for BentoML services as Python TypedDicts, with field constraints expressed through annotated-types metadata and checked by Pydantic. The top-level ServiceConfig TypedDict is declared with total=False, so every key is optional. It aggregates a set of sub-schemas, including:
- TrafficSchema -- Timeout, max concurrency, concurrency capacity, and external queue toggle for BentoCloud.
- ResourceSchema -- CPU (cores or millicores), memory (Gi or string with unit), GPU count, GPU type (NVIDIA and AMD literals), and TPU type literals.
- WorkerSchema -- Number of workers as a positive integer or "cpu_count" literal.
- MetricSchema / MetricDuration -- Prometheus metrics configuration with custom histogram buckets.
- AccessLoggingSchema -- Access log filtering, content type/length logging, and trace/span ID format.
- SSLSchema -- TLS certificate, key, CA, and cipher configuration.
- HTTPSchema / HTTPCorsSchema -- HTTP host, port, proxy port, CORS settings (origins, methods, headers, credentials).
- GRPCSchema -- gRPC host, port, concurrent streams, reflection, channelz, and max message length.
- TracingSchema -- OpenTelemetry tracing with Zipkin, Jaeger, and OTLP exporter configurations.
- MonitoringSchema -- Monitoring type and options.
- LoggingSchema -- Wraps access logging configuration.
- EndpointsSchema -- Custom paths for liveness and readiness probes.
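The way ServiceConfig composes its sub-schemas can be illustrated with a small standard-library sketch. The class names below (TrafficSketch, ResourceSketch, ServiceConfigSketch) and their field types are hypothetical, trimmed-down stand-ins chosen for illustration; they are not the definitions in config.py, which carry more fields and stricter constraints.

```python
from typing import Literal, TypedDict, Union, get_type_hints


# Hypothetical stand-ins for a few sub-schemas (not the real definitions).
class TrafficSketch(TypedDict, total=False):
    timeout: float
    max_concurrency: int


class ResourceSketch(TypedDict, total=False):
    cpu: Union[float, str]
    memory: Union[float, str]
    gpu: int


# A worker count is either an integer or the literal string "cpu_count".
WorkerSketch = Union[int, Literal["cpu_count"]]


class ServiceConfigSketch(TypedDict, total=False):
    traffic: TrafficSketch
    resources: ResourceSketch
    workers: WorkerSketch


# total=False makes every key optional, so a partial dict is a valid value.
partial: ServiceConfigSketch = {"workers": "cpu_count"}

# Introspect which keys are optional and what type each key maps to.
optional_keys = sorted(ServiceConfigSketch.__optional_keys__)
required_keys = sorted(ServiceConfigSketch.__required_keys__)
hints = get_type_hints(ServiceConfigSketch)
```

Because TypedDict is a static typing construct, nothing in this sketch rejects bad values at runtime; a static type checker would flag them, and runtime checking needs a separate validation step.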
The module also exposes a validate function, which checks a configuration dictionary at runtime through a Pydantic TypeAdapter built from ServiceConfig (exposed as schema_type).
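A minimal sketch of this validation pattern, assuming Pydantic v2 with its annotated-types and typing_extensions dependencies installed. ConfigSketch, TrafficSketch, and validate_sketch are hypothetical names used only for illustration; the Posint alias is written here as Annotated[int, Gt(0)], which is an assumption about how the module defines it.

```python
from annotated_types import Gt
from pydantic import TypeAdapter, ValidationError
from typing_extensions import Annotated, TypedDict

# Assumed definition of a positive-integer alias (illustrative).
Posint = Annotated[int, Gt(0)]


class TrafficSketch(TypedDict, total=False):
    timeout: float
    max_concurrency: Posint


class ConfigSketch(TypedDict, total=False):
    traffic: TrafficSketch
    workers: Posint


# Build the adapter once at import time and reuse it for every call.
adapter = TypeAdapter(ConfigSketch)


def validate_sketch(data: dict) -> ConfigSketch:
    """Return the validated dictionary or raise pydantic.ValidationError."""
    return adapter.validate_python(data)


# A well-formed configuration passes through validation.
ok = validate_sketch({"traffic": {"timeout": 60.0}, "workers": 2})

# A value that violates the Gt(0) constraint is rejected.
try:
    validate_sketch({"workers": 0})
except ValidationError:
    rejected = True
else:
    rejected = False
```

Building the TypeAdapter once and reusing it avoids reconstructing the validation schema on each call, which matches the module-level schema_type shown in the signature.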
Usage
Use this module to define, validate, and type-check BentoML service configuration, typically passed to the @bentoml.service() decorator or loaded from configuration files.
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/_bentoml_sdk/service/config.py
- Lines: 1-285
Signature
```python
class ServiceConfig(TypedDict, total=False):
    traffic: TrafficSchema
    extra_ports: list[int]
    backlog: Annotated[int, Ge(64)]
    max_runner_connections: Posint
    runner_connection: RunnerConnectionSchema
    resources: ResourceSchema
    workers: WorkerSchema
    replicate_process: bool
    threads: Posint
    metrics: MetricSchema
    logging: LoggingSchema
    ssl: SSLSchema
    http: HTTPSchema
    grpc: GRPCSchema
    runner_probe: RunnerProbeSchema
    tracing: TracingSchema
    monitoring: MonitoringSchema
    endpoints: EndpointsSchema


schema_type = TypeAdapter(ServiceConfig)


def validate(data: ServiceConfig) -> ServiceConfig: ...
```
Import
from _bentoml_sdk.service.config import ServiceConfig, validate
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | ServiceConfig | Yes | A dictionary conforming to the ServiceConfig TypedDict schema |
Outputs
| Name | Type | Description |
|---|---|---|
| (return value) | ServiceConfig | The validated configuration dictionary returned by validate |
Usage Examples
```python
from _bentoml_sdk.service.config import ServiceConfig, validate

# Define a service configuration
config: ServiceConfig = {
    "traffic": {
        "timeout": 60.0,
        "max_concurrency": 100,
    },
    "resources": {
        "cpu": 4,
        "memory": "8Gi",
        "gpu": 1,
        "gpu_type": "nvidia-tesla-a100",
    },
    "workers": 2,
    "http": {
        "port": 3000,
        "cors": {
            "enabled": True,
            "access_control_allow_origins": ["http://localhost:3000"],
        },
    },
}

# Validate configuration at runtime
validated = validate(config)
```