Implementation:Huggingface Datatrove CustomInferenceServer
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning Inference, Server Management, Data Processing |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
CustomServer is an inference server implementation that launches a user-provided Python script as a subprocess, enabling integration of arbitrary inference backends into the Datatrove inference pipeline.
Description
CustomServer extends InferenceServer to support custom inference backends by launching an external Python script as a subprocess. The script path is provided via the server_script key in the model_kwargs dictionary of the InferenceConfig. At initialization, the server script path is extracted from model_kwargs and removed from the dictionary so that remaining kwargs can be passed as command-line arguments to the script.
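The extraction step described above can be sketched as follows. This is a minimal illustration, not the library's code: the helper name `split_server_script` is hypothetical, the sketch copies the dictionary rather than mutating it in place, and raising `ValueError` when the key is missing is an assumption about how a missing script might be handled.

```python
def split_server_script(model_kwargs: dict) -> tuple[str, dict]:
    """Return (script_path, remaining_kwargs) without mutating the input."""
    remaining = dict(model_kwargs)  # shallow copy so the caller's dict is untouched
    # Assumption: a missing or empty "server_script" is a configuration error.
    script = remaining.pop("server_script", None)
    if not script:
        raise ValueError("model_kwargs must contain 'server_script'")
    return script, remaining


script, rest = split_server_script(
    {"server_script": "/path/to/my_server.py", "max_tokens": 1024}
)
print(script)  # /path/to/my_server.py
print(rest)    # {'max_tokens': 1024}
```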
The start_server method constructs a command that invokes the server script using the current Python interpreter (sys.executable) with a --port argument set to the assigned port number. Additional model_kwargs are appended as command-line arguments: boolean True values become flag arguments (e.g., --enforce-eager), boolean False values become negated flags (e.g., --no-enforce-eager), and other values become key=value arguments (e.g., --max-tokens=1024). The subprocess is created asynchronously with piped stdout and stderr.
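The argument conversion rules above can be illustrated with a short sketch. The function name `build_command` and the port value are hypothetical, and replacing underscores with hyphens is inferred from the examples in the text (`enforce_eager` becoming `--enforce-eager`) rather than confirmed against the source file.

```python
import sys


def build_command(server_script: str, port: int, model_kwargs: dict) -> list[str]:
    """Build the argv list for launching the server script."""
    cmd = [sys.executable, server_script, "--port", str(port)]
    for key, value in model_kwargs.items():
        flag = key.replace("_", "-")  # assumption: snake_case keys map to kebab-case flags
        if value is True:
            cmd.append(f"--{flag}")          # e.g. --enforce-eager
        elif value is False:
            cmd.append(f"--no-{flag}")       # e.g. --no-enforce-eager
        else:
            cmd.append(f"--{flag}={value}")  # e.g. --max-tokens=1024
    return cmd


cmd = build_command(
    "/path/to/my_server.py", 8000, {"max_tokens": 1024, "enforce_eager": True}
)
print(cmd[1:])
```

A list like this could then be passed to `asyncio.create_subprocess_exec(*cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)` to obtain a subprocess with piped output streams.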
The monitor_health method reads the subprocess's stdout and stderr asynchronously and logs every line. It watches for startup-completion indicators ("Application startup complete" or "Uvicorn running on") and for error conditions (CUDA out of memory, CUDA runtime errors, import errors). A detected error raises a RuntimeError, which the base class monitoring loop catches. Internally, the method runs three asyncio tasks concurrently: one per output stream, plus one that waits for the process to exit. When any task completes or raises an exception, the remaining tasks are cancelled.
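The monitoring pattern can be sketched roughly as below. This is an illustration under stated assumptions, not the actual method: the names `check_line`, `read_stream`, and `monitor` are hypothetical, the error substrings are illustrative stand-ins for whatever patterns the source matches, and `print` stands in for the library's logger.

```python
import asyncio

STARTUP_MARKERS = ("Application startup complete", "Uvicorn running on")
# Assumption: illustrative substrings only; the real patterns may differ.
ERROR_MARKERS = ("CUDA out of memory", "CUDA error", "ImportError", "ModuleNotFoundError")


def check_line(line: str) -> bool:
    """Return True if the line signals startup; raise RuntimeError on a known error."""
    for marker in ERROR_MARKERS:
        if marker in line:
            raise RuntimeError(f"server reported an error: {line.strip()}")
    return any(marker in line for marker in STARTUP_MARKERS)


async def read_stream(stream: asyncio.StreamReader, name: str, ready: asyncio.Event) -> None:
    """Log every line from one stream until EOF, flagging startup completion."""
    while True:
        raw = await stream.readline()
        if not raw:  # EOF
            break
        line = raw.decode(errors="replace").rstrip()
        print(f"[{name}] {line}")
        if check_line(line):
            ready.set()


async def monitor(process: asyncio.subprocess.Process, ready: asyncio.Event) -> None:
    """Watch stdout, stderr, and process exit; stop as soon as any one finishes."""
    tasks = [
        asyncio.create_task(read_stream(process.stdout, "stdout", ready)),
        asyncio.create_task(read_stream(process.stderr, "stderr", ready)),
        asyncio.create_task(process.wait()),
    ]
    try:
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            task.result()  # re-raises a RuntimeError from a reader, if any
    finally:
        for task in tasks:
            if not task.done():
                task.cancel()
        # Let cancelled tasks finish unwinding before returning.
        await asyncio.gather(*tasks, return_exceptions=True)
```

Using `asyncio.wait` with `FIRST_COMPLETED` mirrors the behaviour described above: whichever task ends first, whether by EOF, process exit, or a raised error, triggers cleanup of the other two.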
Usage
Use CustomServer when you need to integrate a custom inference backend that is packaged as a standalone Python HTTP server script (e.g., a FastAPI application served by Uvicorn). The script must accept a --port argument and expose an OpenAI-compatible API.
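On the script side, the command-line contract could be handled with `argparse`, as in the sketch below. Only `--port` is required by the description above; the `--max-tokens` and `--enforce-eager` options, their defaults, and the function name `parse_args` are hypothetical examples chosen to match the argument formats CustomServer emits. `argparse.BooleanOptionalAction` requires Python 3.9 or later and accepts both `--enforce-eager` and `--no-enforce-eager`.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the arguments a custom server script might receive."""
    parser = argparse.ArgumentParser(description="Hypothetical custom inference server")
    parser.add_argument("--port", type=int, required=True, help="port assigned by the launcher")
    # Hypothetical extra options, matching the --key=value and --flag / --no-flag forms.
    parser.add_argument("--max-tokens", type=int, default=512)
    parser.add_argument("--enforce-eager", action=argparse.BooleanOptionalAction, default=False)
    return parser.parse_args(argv)


args = parse_args(["--port", "8000", "--max-tokens=1024", "--enforce-eager"])
print(args.port, args.max_tokens, args.enforce_eager)  # 8000 1024 True
```

The parsed port would then be handed to whatever HTTP server the script starts, for example `uvicorn.run(app, port=args.port)` for a FastAPI app.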
Code Reference
Source Location
- Repository: Huggingface_Datatrove
- File: src/datatrove/pipeline/inference/servers/custom_server.py
- Lines: 1-116
Signature
class CustomServer(InferenceServer):
    def __init__(self, config: "InferenceConfig", rank: int):
        ...

    async def start_server(self) -> asyncio.subprocess.Process | None:
        ...

    async def monitor_health(self):
        ...
Import
from datatrove.pipeline.inference.servers.custom_server import CustomServer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | InferenceConfig | Yes | Configuration object; must have "server_script" in model_kwargs |
| rank | int | Yes | Rank identifier for this server instance |
Outputs
| Name | Type | Description |
|---|---|---|
| process | asyncio.subprocess.Process or None | The launched subprocess running the custom server script |
Usage Examples
Basic Usage
from datatrove.pipeline.inference.run_inference import InferenceConfig

# Configure with a custom server script
config = InferenceConfig(
    model="my-model",
    model_kwargs={
        "server_script": "/path/to/my_server.py",
        "max_tokens": 1024,
        "enforce_eager": True,
    },
)

# The CustomServer is typically instantiated by the inference runner
# based on the config's server_type setting