Implementation:Huggingface Datatrove CustomInferenceServer

Knowledge Sources	Huggingface_Datatrove
Domains	Machine Learning Inference, Server Management, Data Processing
Last Updated	2026-02-14 17:00 GMT

Overview

CustomServer is an inference server implementation that launches a user-provided Python script as a subprocess, enabling integration of arbitrary inference backends into the Datatrove inference pipeline.

Description

CustomServer extends InferenceServer to support custom inference backends by launching an external Python script as a subprocess. The script path is provided via the server_script key in the model_kwargs dictionary of the InferenceConfig. At initialization, the server script path is extracted from model_kwargs and removed from the dictionary so that remaining kwargs can be passed as command-line arguments to the script.
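The extraction step described above can be sketched as follows. This is a minimal illustration, not the library's code: the helper name `split_server_script` is hypothetical, raising `ValueError` on a missing key is an assumption, and the sketch copies the dictionary rather than mutating it in place.

```python
from typing import Any


def split_server_script(model_kwargs: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Separate the script path from the kwargs that are forwarded to the script."""
    remaining = dict(model_kwargs)  # copy so the caller's dict is left untouched
    script = remaining.pop("server_script", None)
    if script is None:
        # Assumption: a missing script path is treated as a configuration error.
        raise ValueError('model_kwargs must contain a "server_script" entry')
    return script, remaining
```

Separating the path first keeps `server_script` from being forwarded to the script as one of its own command-line arguments.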

The start_server method constructs a command that invokes the server script using the current Python interpreter (sys.executable) with a --port argument set to the assigned port number. Additional model_kwargs are appended as command-line arguments: boolean True values become flag arguments (e.g., --enforce-eager), boolean False values become negated flags (e.g., --no-enforce-eager), and other values become key=value arguments (e.g., --max-tokens=1024). The subprocess is created asynchronously with piped stdout and stderr.
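The argument mapping described above might look roughly like the sketch below. The function name `build_command` is hypothetical, and the underscore-to-hyphen conversion is an assumption inferred from the examples (`enforce_eager` appearing as `--enforce-eager`). The real method also spawns the subprocess, which is omitted here.

```python
import sys
from typing import Any


def build_command(server_script: str, port: int, model_kwargs: dict[str, Any]) -> list[str]:
    """Build the argv list used to launch a custom server script."""
    cmd = [sys.executable, server_script, "--port", str(port)]
    for key, value in model_kwargs.items():
        flag = key.replace("_", "-")  # assumption: underscores map to hyphens
        if isinstance(value, bool):
            # True -> --flag, False -> --no-flag
            cmd.append(f"--{flag}" if value else f"--no-{flag}")
        else:
            # Any other value -> --key=value
            cmd.append(f"--{flag}={value}")
    return cmd
```

For instance, `build_command("/path/to/my_server.py", 8000, {"max_tokens": 1024, "enforce_eager": True})` produces the interpreter path followed by `/path/to/my_server.py --port 8000 --max-tokens=1024 --enforce-eager`. The port value 8000 is only an illustration.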

The monitor_health method reads the subprocess output streams asynchronously and logs every line. It watches for startup completion indicators ("Application startup complete" or "Uvicorn running on") and for error conditions (CUDA out of memory, CUDA runtime errors, import errors). An error detected in the server output raises a RuntimeError, which is caught by the base class monitoring loop. Three asyncio tasks run concurrently: one reads stdout, one reads stderr, and one waits for the process to exit. When any task completes or raises an exception, the method cancels all tasks as cleanup.
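The monitoring pattern can be illustrated with the sketch below. It is written under assumptions and is not the actual implementation: the function names are hypothetical, the error substrings in `ERROR_MARKERS` are guesses at what the real checks match, and logging is omitted for brevity.

```python
import asyncio
from typing import Optional

READY_MARKERS = ("Application startup complete", "Uvicorn running on")
# Assumed substrings; the real implementation may match different text.
ERROR_MARKERS = ("CUDA out of memory", "CUDA error", "ImportError", "ModuleNotFoundError")


def classify_line(line: str) -> Optional[str]:
    """Return "error", "ready", or None for a single line of server output."""
    if any(marker in line for marker in ERROR_MARKERS):
        return "error"
    if any(marker in line for marker in READY_MARKERS):
        return "ready"
    return None


async def _read_stream(stream: asyncio.StreamReader, state: dict) -> None:
    """Consume one output stream until EOF, raising on the first error line."""
    while True:
        raw = await stream.readline()
        if not raw:  # EOF: the child closed this stream
            break
        line = raw.decode(errors="replace").rstrip()
        kind = classify_line(line)
        if kind == "error":
            raise RuntimeError(f"Server reported an error: {line}")
        if kind == "ready":
            state["ready"] = True


async def monitor_process(process: asyncio.subprocess.Process) -> bool:
    """Watch stdout, stderr, and process exit; stop when any one of them finishes."""
    state = {"ready": False}
    tasks = [
        asyncio.create_task(_read_stream(process.stdout, state)),
        asyncio.create_task(_read_stream(process.stderr, state)),
        asyncio.create_task(process.wait()),
    ]
    try:
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            task.result()  # re-raises a RuntimeError from a reader task
    finally:
        for task in tasks:
            task.cancel()
        # Wait for the cancellations to settle so no task is left pending.
        await asyncio.gather(*tasks, return_exceptions=True)
    return state["ready"]
```

Waiting with `FIRST_COMPLETED` means an error line, a closed stream, or a process exit all end the monitoring round, after which every task is cancelled regardless of which one finished first.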

Usage

Use CustomServer when you need to integrate a custom inference backend that is packaged as a standalone Python HTTP server script (e.g., a FastAPI application served by Uvicorn). The script must accept a --port argument and expose an OpenAI-compatible API.
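On the script side, the arguments generated by CustomServer could be parsed along the lines of the sketch below. Only `--port` is required by the contract above. The `--max-tokens` and `--enforce-eager` options and their defaults are hypothetical and simply mirror the examples on this page. The HTTP application itself is left out.

```python
import argparse


def parse_server_args(argv=None) -> argparse.Namespace:
    """Parse the command-line arguments a custom server script might receive."""
    parser = argparse.ArgumentParser(description="Hypothetical custom inference server")
    parser.add_argument("--port", type=int, required=True, help="Port to listen on")
    # Hypothetical option; arrives as --max-tokens=1024
    parser.add_argument("--max-tokens", type=int, default=512)
    # BooleanOptionalAction (Python 3.9+) accepts both --enforce-eager and --no-enforce-eager
    parser.add_argument("--enforce-eager", action=argparse.BooleanOptionalAction, default=False)
    return parser.parse_args(argv)
```

Using `argparse.BooleanOptionalAction` lets one option definition handle both the flag form and its negated `--no-` form, which matches how boolean kwargs are forwarded.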

Code Reference

Source Location

Repository: Huggingface_Datatrove
File: src/datatrove/pipeline/inference/servers/custom_server.py
Lines: 1-116

Signature

class CustomServer(InferenceServer):
    def __init__(self, config: "InferenceConfig", rank: int):
        ...

    async def start_server(self) -> asyncio.subprocess.Process | None:
        ...

    async def monitor_health(self):
        ...

Import

from datatrove.pipeline.inference.servers.custom_server import CustomServer

I/O Contract

Inputs

Name	Type	Required	Description
config	InferenceConfig	Yes	Configuration object; must have "server_script" in model_kwargs
rank	int	Yes	Rank identifier for this server instance

Outputs

Name	Type	Description
process	asyncio.subprocess.Process or None	Return value of start_server: the launched subprocess running the custom server script

Usage Examples

Basic Usage

from datatrove.pipeline.inference.run_inference import InferenceConfig

# Configure with a custom server script
config = InferenceConfig(
    model="my-model",
    model_kwargs={
        "server_script": "/path/to/my_server.py",
        "max_tokens": 1024,
        "enforce_eager": True,
    },
)

# The CustomServer is typically instantiated by the inference runner
# based on the config's server_type setting

Related Pages

Principle:Huggingface_Datatrove_Inference_Server_Management
