Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Huggingface Datatrove DummyInferenceServer

From Leeroopedia
Revision as of 13:01, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Huggingface_Datatrove_DummyInferenceServer.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Knowledge Sources
Domains Machine Learning Inference, Testing, Server Management
Last Updated 2026-02-14 17:00 GMT

Overview

DummyServer is a lightweight, in-process inference server implementation that returns fixed dummy responses, designed for debugging and testing inference pipelines without requiring a real model or GPU.

Description

DummyServer extends InferenceServer to provide a fully functional but fake inference server for development and testing purposes. Instead of launching an external process, it starts a Python HTTPServer in a daemon thread within the same process. The server handles two endpoints: a POST endpoint at /v1/chat/completions that returns a fixed text response with estimated token counts, and a GET endpoint at /v1/models that returns a model list for readiness checks.

The DummyHandler class (extending BaseHTTPRequestHandler) implements the request handling. For chat completion requests, it parses the request body to estimate prompt token count (using a rough characters-divided-by-4 approximation), generates a fixed completion with 100 tokens, and returns a response in the OpenAI API format. The handler suppresses default HTTP server logging by overriding log_message to do nothing.

The start_server method creates the HTTPServer and starts it in a daemon thread, making it non-blocking. The monitor_health method simply polls in a loop (every 0.5 seconds) to check that the server still exists. The kill_server and server_cleanup methods handle graceful shutdown: server_cleanup fires off the shutdown in a separate daemon thread to avoid blocking asyncio's event loop cleanup, which could otherwise cause the process to hang.

Usage

Use DummyServer when developing, debugging, or testing inference pipeline logic without needing a real model backend. It allows end-to-end pipeline testing with predictable, deterministic responses and no GPU requirement.

Code Reference

Source Location

  • Repository: Huggingface_Datatrove
  • File: src/datatrove/pipeline/inference/servers/dummy_server.py
  • Lines: 1-133

Signature

class DummyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        ...

    def do_GET(self):
        ...

    def log_message(self, format, *args):
        ...


class DummyServer(InferenceServer):
    def __init__(self, config: "InferenceConfig", rank: int):
        ...

    async def monitor_health(self):
        ...

    async def start_server(self) -> None:
        ...

    def kill_server(self):
        ...

    async def server_cleanup(self):
        ...

Import

from datatrove.pipeline.inference.servers.dummy_server import DummyServer

I/O Contract

Inputs

Name Type Required Description
config InferenceConfig Yes Configuration object for the inference server
rank int Yes Rank identifier for this server instance

Outputs

Name Type Description
HTTP response JSON Returns OpenAI-compatible chat completion responses with fixed dummy text and estimated token counts

Usage Examples

Basic Usage

from datatrove.pipeline.inference.run_inference import InferenceConfig

# Configure with dummy server for testing
config = InferenceConfig(
    model="dummy-model",
    # DummyServer is typically selected via server_type in config
)

# The DummyServer is instantiated by the inference runner
# It starts an HTTP server on localhost with a dynamically assigned port
# All chat completion requests return: "This is dummy text content
# for debugging purposes. Page contains sample text to simulate OCR output."

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment