Principle:Bentoml BentoML Composed Pipeline Testing

Overview

Composed Pipeline Testing addresses the challenge of testing multi-service compositions locally before deploying to production. BentoML's local serving mode runs all services in a dependency graph as separate processes under a single Circus supervisor, mirroring the distributed production topology on a single machine.

Detailed Explanation

Testing multi-service ML pipelines is inherently more complex than testing a single model endpoint. Each service in the composition may have different dependencies, resource requirements, and failure modes. Local testing must replicate the distributed production environment as faithfully as possible while remaining practical for development workflows.

The Testing Challenge

In production, a composed BentoML application runs as multiple independent processes (or containers) communicating over the network. This introduces concerns that do not exist in single-service applications:

Inter-service communication -- Serialization, deserialization, and network latency between services.
Process isolation -- Each service runs in its own process with its own memory space.
Dependency resolution -- The framework must discover all services in the dependency graph and start them in the correct order.
Service discovery -- Each service must know how to reach its dependencies.
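
The first concern can be sketched with the standard library alone. The example below uses plain JSON as a stand-in for a wire format (an assumption for illustration, not necessarily what BentoML uses between services) to show that a callee in another process receives a decoded copy rather than the original object.

```python
import json


def cross_process_boundary(payload):
    """Simulate what a callee sees after the payload is serialized and decoded."""
    wire_bytes = json.dumps(payload).encode("utf-8")
    return json.loads(wire_bytes.decode("utf-8"))


original = {"shape": (2, 3), "scores": {1: 0.25, 2: 0.75}}
received = cross_process_boundary(original)

# Tuples arrive as lists and integer keys arrive as strings, so an
# in-process mock can hide bugs that only appear once data crosses the wire.
print(received)
```

Differences like these are one reason to run the composed pipeline end-to-end and not rely on in-process mocks alone.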

BentoML's Local Multi-Service Architecture

BentoML's bentoml serve command addresses these challenges by:

Recursive dependency discovery: The all_services() method on a service class recursively traverses the dependency graph starting from the entry service. This discovers every service that will need to run.

Multi-process orchestration: Each discovered service is spawned as a separate worker process under a Circus process supervisor. This mirrors the process isolation of production.

Automatic service wiring: The framework sets up inter-process communication channels (Unix domain sockets or TCP) and creates a runner_bind_map so each service can discover and connect to its dependencies.

Single entry point: Despite running multiple processes, the developer only needs to specify the top-level entry service. Everything else is automatic.
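
The wiring step can be pictured as building a lookup table from service name to local address. The sketch below is a simplified illustration, not BentoML's implementation: the function name `build_bind_map`, the `tcp://` address format, and the starting port are all assumptions.

```python
import json


def build_bind_map(service_names, entry, host="127.0.0.1", first_port=3001):
    """Assign one local address to every service other than the entry service."""
    dependencies = [name for name in service_names if name != entry]
    return {
        name: f"tcp://{host}:{first_port + offset}"
        for offset, name in enumerate(dependencies)
    }


bind_map = build_bind_map(
    ["Pipeline", "Preprocessor", "ModelService"], entry="Pipeline"
)

# A supervisor could pass this table to each worker process, for example
# as a JSON string, so that every worker can look up its dependencies.
encoded = json.dumps(bind_map)
print(encoded)
```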

Testing Strategies

Strategy	Scope	Tool	Description
Unit testing	Single service	pytest + mocks	Test individual service methods with mocked dependencies
Integration testing	Full pipeline	`bentoml serve`	Run all services locally and test end-to-end
Contract testing	Service boundaries	Type annotations	Verify that service interfaces match between producer and consumer
Load testing	Full pipeline	External tools (locust, k6)	Stress-test the local multi-service deployment
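
As one possible reading of the contract-testing row, the sketch below compares the return annotation of a producer method with the parameter annotation of its consumer. The two classes and the `check_contract` helper are hypothetical; strict equality of annotations is a deliberately simple rule and would reject compatible but non-identical types.

```python
from typing import get_type_hints


class Preprocessor:
    def clean(self, text: str) -> list[str]:
        return text.lower().split()


class ModelService:
    def classify(self, tokens: list[str]) -> dict[str, int]:
        return {"label": len(tokens) % 2}


def check_contract(producer, consumer, consumer_param):
    """Return True when the producer's return type equals the consumer's parameter type."""
    produced = get_type_hints(producer)["return"]
    expected = get_type_hints(consumer)[consumer_param]
    return produced == expected


# The preprocessor's output type matches what the model expects as input.
contract_ok = check_contract(Preprocessor.clean, ModelService.classify, "tokens")
```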

Unit Testing with Mocked Dependencies

Because BentoML uses dependency injection, individual services can be tested in isolation by replacing dependencies with mocks:

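# Test a service with mocked dependencies.
# Pipeline, MockModelService, and MockPreprocessor are assumed to be
# defined or imported elsewhere in the test module.
def test_pipeline_service():
    pipeline = Pipeline()
    # Replace the injected dependencies with test doubles
    pipeline.model = MockModelService()
    pipeline.preprocessor = MockPreprocessor()
    result = pipeline.predict("test input")
    # The pipeline is expected to return a binary label
    assert result["label"] in [0, 1]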
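
A self-contained variant of the same idea, using `unittest.mock` from the standard library. The `Pipeline` class here is a plain-Python stand-in that takes its dependencies through the constructor; a real BentoML service would declare them through the framework, so treat the class shape and the method names `clean` and `classify` as assumptions.

```python
from unittest.mock import MagicMock


class Pipeline:
    """Plain-Python stand-in for a composed service."""

    def __init__(self, preprocessor, model):
        self.preprocessor = preprocessor
        self.model = model

    def predict(self, text: str) -> dict:
        tokens = self.preprocessor.clean(text)
        return self.model.classify(tokens)


def test_pipeline_passes_tokens_to_model():
    # Configure the test doubles with canned return values
    preprocessor = MagicMock()
    preprocessor.clean.return_value = ["test", "input"]
    model = MagicMock()
    model.classify.return_value = {"label": 1}

    pipeline = Pipeline(preprocessor, model)
    result = pipeline.predict("Test input")

    # Check both the result and how the dependencies were called
    assert result == {"label": 1}
    preprocessor.clean.assert_called_once_with("Test input")
    model.classify.assert_called_once_with(["test", "input"])


test_pipeline_passes_tokens_to_model()
```

Asserting on the calls made to each mock checks the wiring between services as well as the final result.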

Integration Testing with Local Serve

The bentoml serve command provides full integration testing:

# Start all services in the composition
bentoml serve service:Pipeline

# In another terminal, test the full pipeline
curl -X POST http://localhost:3000/predict -H "Content-Type: application/json" -d '{"text": "test input"}'

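This starts a separate process for every service in the dependency graph and connects them via local IPC. The result closely approximates the production topology on a single machine, although it does not reproduce real network latency or container boundaries.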
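
The same request can be issued from a Python test. The sketch below uses only the standard library and reuses the port, path, and payload from the curl example. The helper names are hypothetical, and `call_pipeline` assumes the endpoint replies with JSON.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:3000"):
    """Build a JSON POST request for the pipeline's /predict endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_pipeline(text, timeout=10.0):
    """Send the request to a running server and decode the JSON reply."""
    request = build_predict_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server; call_pipeline("test input")
# requires `bentoml serve` to be running in another terminal.
request = build_predict_request("test input")
```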

The `all_services()` Discovery Mechanism

The all_services() method is the key enabler for local multi-service testing. It performs a depth-first traversal of the dependency graph:

Start with the entry service.
For each Dependency attribute, resolve the dependent service class.
Recursively call all_services() on each dependency.
Return a deduplicated mapping from service name to service that covers the entry service and every transitive dependency.

This ensures that even deeply nested dependency graphs are fully discovered and all services are started.
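
The traversal described above can be sketched in a few lines. `ServiceNode` is a hypothetical stand-in, not BentoML's service class. The sketch returns a name-keyed mapping in discovery order and assumes the graph has no cycles, which simplifies whatever the real method does.

```python
class ServiceNode:
    """Minimal stand-in for a service definition with named dependencies."""

    def __init__(self, name, dependencies=()):
        self.name = name
        self.dependencies = list(dependencies)

    def all_services(self):
        """Return this service plus every transitive dependency, keyed by name."""
        found = {self.name: self}
        for dependency in self.dependencies:
            # Recurse depth-first; setdefault keeps the first occurrence,
            # so a service shared by several parents is recorded once.
            for name, service in dependency.all_services().items():
                found.setdefault(name, service)
        return found


tokenizer = ServiceNode("Tokenizer")
preprocessor = ServiceNode("Preprocessor", [tokenizer])
model = ServiceNode("ModelService", [tokenizer])
pipeline = ServiceNode("Pipeline", [preprocessor, model])

discovered = list(pipeline.all_services())
print(discovered)
```

Here `Tokenizer` is reachable through two paths but appears once in the result, which is the deduplication the steps above refer to.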

Relationship to Implementation

This principle is implemented by the bentoml serve command in multi-service mode, which uses all_services() for dependency discovery and Circus for multi-process orchestration.

Implementation:Bentoml_BentoML_Serve_Multi_Service

Metadata

Knowledge Sources

2026-02-13 15:00 GMT
