Implementation:Huggingface Datatrove DummyInferenceServer

Knowledge Sources	Huggingface_Datatrove
Domains	Machine Learning Inference, Testing, Server Management
Last Updated	2026-02-14 17:00 GMT

Overview

DummyServer is a lightweight, in-process inference server implementation that returns fixed dummy responses, designed for debugging and testing inference pipelines without requiring a real model or GPU.

Description

DummyServer extends InferenceServer to provide a fully functional but fake inference server for development and testing purposes. Instead of launching an external process, it starts a Python HTTPServer in a daemon thread within the same process. The server handles two endpoints: a POST endpoint at /v1/chat/completions that returns a fixed text response with estimated token counts, and a GET endpoint at /v1/models that returns a model list for readiness checks.

The DummyHandler class (extending BaseHTTPRequestHandler) implements the request handling. For chat completion requests, it parses the request body to estimate prompt token count (using a rough characters-divided-by-4 approximation), generates a fixed completion with 100 tokens, and returns a response in the OpenAI API format. The handler suppresses default HTTP server logging by overriding log_message to do nothing.

The start_server method creates the HTTPServer and starts it in a daemon thread, making it non-blocking. The monitor_health method simply polls in a loop (every 0.5 seconds) to check that the server still exists. The kill_server and server_cleanup methods handle graceful shutdown: server_cleanup fires off the shutdown in a separate daemon thread to avoid blocking asyncio's event loop cleanup, which could otherwise cause the process to hang.

Usage

Use DummyServer when developing, debugging, or testing inference pipeline logic without needing a real model backend. It allows end-to-end pipeline testing with predictable, deterministic responses and no GPU requirement.

Code Reference

Source Location

Repository: Huggingface_Datatrove
File: src/datatrove/pipeline/inference/servers/dummy_server.py
Lines: 1-133

Signature

class DummyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        ...

    def do_GET(self):
        ...

    def log_message(self, format, *args):
        ...


class DummyServer(InferenceServer):
    def __init__(self, config: "InferenceConfig", rank: int):
        ...

    async def monitor_health(self):
        ...

    async def start_server(self) -> None:
        ...

    def kill_server(self):
        ...

    async def server_cleanup(self):
        ...

Import

from datatrove.pipeline.inference.servers.dummy_server import DummyServer

I/O Contract

Inputs

Name	Type	Required	Description
config	InferenceConfig	Yes	Configuration object for the inference server
rank	int	Yes	Rank identifier for this server instance

Outputs

Name	Type	Description
HTTP response	JSON	Returns OpenAI-compatible chat completion responses with fixed dummy text and estimated token counts

Usage Examples

Basic Usage

from datatrove.pipeline.inference.run_inference import InferenceConfig

# Configure with dummy server for testing
config = InferenceConfig(
    model="dummy-model",
    # DummyServer is typically selected via server_type in config
)

# The DummyServer is instantiated by the inference runner
# It starts an HTTP server on localhost with a dynamically assigned port
# All chat completion requests return: "This is dummy text content
# for debugging purposes. Page contains sample text to simulate OCR output."

Related Pages

Principle:Huggingface_Datatrove_Inference_Server_Management

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment