Principle:Bentoml BentoML Composed Pipeline Testing
Overview
Composed Pipeline Testing addresses the challenge of testing multi-service compositions locally before deploying to production. BentoML's local serving mode runs all services in a dependency graph as separate processes under a single Circus supervisor, mirroring the distributed production topology on a single machine.
Detailed Explanation
Testing multi-service ML pipelines is inherently more complex than testing a single model endpoint. Each service in the composition may have different dependencies, resource requirements, and failure modes. Local testing must replicate the distributed production environment as faithfully as possible while remaining practical for development workflows.
The Testing Challenge
In production, a composed BentoML application runs as multiple independent processes (or containers) communicating over the network. This introduces concerns that do not exist in single-service applications:
- Inter-service communication -- Serialization, deserialization, and network latency between services.
- Process isolation -- Each service runs in its own process with its own memory space.
- Dependency resolution -- The framework must discover all services in the dependency graph and start them in the correct order.
- Service discovery -- Each service must know how to reach its dependencies.
BentoML's Local Multi-Service Architecture
BentoML's bentoml serve command addresses these challenges by:
- Recursive dependency discovery: The all_services() method on a service class recursively traverses the dependency graph starting from the entry service. This discovers every service that will need to run.
- Multi-process orchestration: Each discovered service is spawned as a separate worker process under a Circus process supervisor. This mirrors the process isolation of production.
- Automatic service wiring: The framework sets up inter-process communication channels (Unix domain sockets or TCP) and creates a runner_bind_map so each service can discover and connect to its dependencies.
- Single entry point: Despite running multiple processes, the developer only needs to specify the top-level entry service. Everything else is automatic.
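The wiring step can be pictured with a toy model. The sketch below is not BentoML's code: build_bind_map, resolve, and the socket path layout are hypothetical names chosen for illustration. It only shows the idea of giving every discovered service an address that its dependents can look up by name.

```python
import os
import tempfile


def build_bind_map(service_names, base_dir=None):
    """Assign one Unix-socket style address per service name (toy model)."""
    base_dir = base_dir or tempfile.gettempdir()
    bind_map = {}
    for name in service_names:
        # Hypothetical address layout; the real framework chooses its own.
        bind_map[name] = "unix://" + os.path.join(base_dir, f"{name}.sock")
    return bind_map


def resolve(bind_map, dependency_name):
    """Look up where a dependency listens; fail loudly if it was never registered."""
    try:
        return bind_map[dependency_name]
    except KeyError:
        raise LookupError(f"no address registered for {dependency_name!r}") from None


bind_map = build_bind_map(
    ["Pipeline", "Preprocessor", "ModelService"], base_dir="/tmp/demo"
)
print(resolve(bind_map, "ModelService"))
```

A lookup table keyed by service name keeps callers independent of the transport: a dependent service asks for a name and receives an address, whether that address is a socket path or a TCP endpoint.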
Testing Strategies
| Strategy | Scope | Tool | Description |
|---|---|---|---|
| Unit testing | Single service | pytest + mocks | Test individual service methods with mocked dependencies |
| Integration testing | Full pipeline | bentoml serve | Run all services locally and test end-to-end |
| Contract testing | Service boundaries | Type annotations | Verify that service interfaces match between producer and consumer |
| Load testing | Full pipeline | External tools (locust, k6) | Stress-test the local multi-service deployment |
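The contract-testing row can be made concrete with plain Python type annotations. The following is a minimal sketch, assuming two hypothetical service classes (Preprocessor and ModelService) and a hypothetical helper contract_matches. It compares annotations only and is not a BentoML feature.

```python
from typing import get_type_hints


class Preprocessor:  # hypothetical producer
    def clean(self, text: str) -> list[str]:
        return text.lower().split()


class ModelService:  # hypothetical consumer
    def classify(self, tokens: list[str]) -> dict[str, int]:
        return {"label": len(tokens) % 2}


def contract_matches(producer, consumer, param):
    """True when the producer's return annotation equals the consumer's parameter annotation."""
    produced = get_type_hints(producer).get("return")
    consumed = get_type_hints(consumer).get(param)
    return produced is not None and produced == consumed


# The preprocessor's output type matches what the model expects.
print(contract_matches(Preprocessor.clean, ModelService.classify, "tokens"))  # True
# Wiring them in the wrong direction is flagged.
print(contract_matches(ModelService.classify, Preprocessor.clean, "text"))  # False
```

A check like this catches mismatched interfaces without starting any process, but it only compares declared types, so it complements integration tests and does not replace them.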
Unit Testing with Mocked Dependencies
Because BentoML uses dependency injection, individual services can be tested in isolation by replacing dependencies with mocks:
# Test a service with mocked dependencies
def test_pipeline_service():
    pipeline = Pipeline()
    pipeline.model = MockModelService()
    pipeline.preprocessor = MockPreprocessor()
    result = pipeline.predict("test input")
    assert result["label"] in [0, 1]
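The same idea can be shown as a self-contained test using the standard library's unittest.mock. In this sketch the Pipeline class is a toy stand-in with plain attributes, not a real BentoML service, and the method names clean and classify are assumptions made for illustration.

```python
from unittest.mock import Mock


class Pipeline:
    """Toy stand-in for a composed service; dependencies are plain attributes."""

    def __init__(self, preprocessor, model):
        self.preprocessor = preprocessor
        self.model = model

    def predict(self, text):
        tokens = self.preprocessor.clean(text)
        return {"label": self.model.classify(tokens)}


def test_predict_passes_cleaned_tokens_to_model():
    # Replace both dependencies with mocks that return canned values.
    preprocessor = Mock()
    preprocessor.clean.return_value = ["test", "input"]
    model = Mock()
    model.classify.return_value = 1

    result = Pipeline(preprocessor, model).predict("Test Input")

    assert result == {"label": 1}
    # Verify the orchestration logic, not the dependencies themselves.
    preprocessor.clean.assert_called_once_with("Test Input")
    model.classify.assert_called_once_with(["test", "input"])


test_predict_passes_cleaned_tokens_to_model()
```

Asserting on the calls made to each mock checks how the pipeline hands data from one dependency to the next, which is the part of the behavior that belongs to the composing service.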
Integration Testing with Local Serve
The bentoml serve command runs the whole composition locally so that it can be exercised end to end:
# Start all services in the composition
bentoml serve service:Pipeline
# In another terminal, test the full pipeline
curl -X POST http://localhost:3000/predict -H "Content-Type: application/json" -d '{"text": "test input"}'
This starts separate processes for every service in the dependency graph, connected via local IPC, providing a faithful replica of the production topology.
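The manual curl check can also be scripted so that it runs as part of a test suite. The sketch below uses only the standard library. The helper names post_json and check_pipeline are hypothetical, and the assumption that the response contains a "label" key is carried over from the unit-test example, not confirmed for any particular service.

```python
import json
import urllib.request


def post_json(url, payload, timeout=10.0):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_pipeline(base_url="http://localhost:3000"):
    """Call the pipeline's /predict route and check the response shape."""
    result = post_json(f"{base_url}/predict", {"text": "test input"})
    # Assumed response shape; adjust to the service under test.
    assert "label" in result, f"unexpected response: {result!r}"
    return result
```

With the composition running in another terminal, calling check_pipeline() sends the same request as the curl command and raises if the server is unreachable or the response lacks the expected key.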
The all_services() Discovery Mechanism
The all_services() method is the key enabler for local multi-service testing. It performs a depth-first traversal of the dependency graph:
- Start with the entry service.
- For each Dependency attribute, resolve the dependent service class.
- Recursively call all_services() on each dependency.
- Return a deduplicated list of all services in topological order.
This ensures that even deeply nested dependency graphs are fully discovered and all services are started.
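The traversal described above can be sketched in a few lines. This is an illustration of depth-first discovery on a plain dictionary, not BentoML's implementation: discover_services and the example graph are hypothetical, and the graph is assumed to be acyclic.

```python
def discover_services(entry, dependencies):
    """Return every service reachable from `entry`, dependencies first, without duplicates."""
    ordered = []
    visited = set()

    def visit(name):
        if name in visited:
            return  # already discovered through another path
        visited.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        ordered.append(name)  # post-order: appended after all of its dependencies

    visit(entry)
    return ordered


# Hypothetical graph: two services share the same Tokenizer dependency.
graph = {
    "Pipeline": ["Preprocessor", "ModelService"],
    "Preprocessor": ["Tokenizer"],
    "ModelService": ["Tokenizer"],
    "Tokenizer": [],
}
print(discover_services("Pipeline", graph))
# ['Tokenizer', 'Preprocessor', 'ModelService', 'Pipeline']
```

The shared Tokenizer appears once even though two services depend on it, and every service is listed after the services it depends on, which is the order in which they could safely be started.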
Relationship to Implementation
This principle is implemented by the bentoml serve command in multi-service mode, which uses all_services() for dependency discovery and Circus for multi-process orchestration.
Implementation:Bentoml_BentoML_Serve_Multi_Service