Implementation:Huggingface Datatrove DummyInferenceServer
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning Inference, Testing, Server Management |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
DummyServer is a lightweight, in-process inference server implementation that returns fixed dummy responses, designed for debugging and testing inference pipelines without requiring a real model or GPU.
Description
DummyServer extends InferenceServer to provide a fully functional but fake inference server for development and testing purposes. Instead of launching an external process, it starts a Python HTTPServer in a daemon thread within the same process. The server handles two endpoints: a POST endpoint at /v1/chat/completions that returns a fixed text response with estimated token counts, and a GET endpoint at /v1/models that returns a model list for readiness checks.
The DummyHandler class (extending BaseHTTPRequestHandler) implements the request handling. For chat completion requests, it parses the request body to estimate prompt token count (using a rough characters-divided-by-4 approximation), generates a fixed completion with 100 tokens, and returns a response in the OpenAI API format. The handler suppresses default HTTP server logging by overriding log_message to do nothing.
The start_server method creates the HTTPServer and starts it in a daemon thread, making it non-blocking. The monitor_health method simply polls in a loop (every 0.5 seconds) to check that the server still exists. The kill_server and server_cleanup methods handle graceful shutdown: server_cleanup fires off the shutdown in a separate daemon thread to avoid blocking asyncio's event loop cleanup, which could otherwise cause the process to hang.
Usage
Use DummyServer when developing, debugging, or testing inference pipeline logic without needing a real model backend. It allows end-to-end pipeline testing with predictable, deterministic responses and no GPU requirement.
Code Reference
Source Location
- Repository: Huggingface_Datatrove
- File: src/datatrove/pipeline/inference/servers/dummy_server.py
- Lines: 1-133
Signature
class DummyHandler(BaseHTTPRequestHandler):
def do_POST(self):
...
def do_GET(self):
...
def log_message(self, format, *args):
...
class DummyServer(InferenceServer):
def __init__(self, config: "InferenceConfig", rank: int):
...
async def monitor_health(self):
...
async def start_server(self) -> None:
...
def kill_server(self):
...
async def server_cleanup(self):
...
Import
from datatrove.pipeline.inference.servers.dummy_server import DummyServer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | InferenceConfig | Yes | Configuration object for the inference server |
| rank | int | Yes | Rank identifier for this server instance |
Outputs
| Name | Type | Description |
|---|---|---|
| HTTP response | JSON | Returns OpenAI-compatible chat completion responses with fixed dummy text and estimated token counts |
Usage Examples
Basic Usage
from datatrove.pipeline.inference.run_inference import InferenceConfig
# Configure with dummy server for testing
config = InferenceConfig(
model="dummy-model",
# DummyServer is typically selected via server_type in config
)
# The DummyServer is instantiated by the inference runner
# It starts an HTTP server on localhost with a dynamically assigned port
# All chat completion requests return: "This is dummy text content
# for debugging purposes. Page contains sample text to simulate OCR output."