
Implementation:Huggingface Datatrove CustomInferenceServer

From Leeroopedia
Domains: Machine Learning Inference, Server Management, Data Processing
Last Updated: 2026-02-14 17:00 GMT

Overview

CustomServer is an inference server implementation that launches a user-provided Python script as a subprocess, enabling integration of arbitrary inference backends into the Datatrove inference pipeline.

Description

CustomServer extends InferenceServer to support custom inference backends by launching an external Python script as a subprocess. The script path is provided via the server_script key in the model_kwargs dictionary of the InferenceConfig. At initialization, the server script path is extracted from model_kwargs and removed from the dictionary so that remaining kwargs can be passed as command-line arguments to the script.
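A minimal sketch of this initialization step, using a hypothetical helper named split_server_script (not part of Datatrove). It works on a copy of model_kwargs so the caller's config is left untouched, which differs slightly from removing the key from the dictionary itself. Raising ValueError when the key is missing is also an assumption; the description above does not say how that case is handled.

```python
from typing import Any, Dict, Optional, Tuple


def split_server_script(
    model_kwargs: Optional[Dict[str, Any]],
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: separate the server script path from the CLI-bound kwargs."""
    # Work on a copy so the original config dictionary is not mutated.
    remaining = dict(model_kwargs or {})
    # Assumption: a missing script path is treated as a configuration error.
    if "server_script" not in remaining:
        raise ValueError('model_kwargs must contain a "server_script" entry')
    server_script = remaining.pop("server_script")
    # Everything left over is later forwarded to the script as command-line arguments.
    return server_script, remaining
```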

The start_server method constructs a command that invokes the server script using the current Python interpreter (sys.executable) with a --port argument set to the assigned port number. Additional model_kwargs are appended as command-line arguments: boolean True values become flag arguments (e.g., --enforce-eager), boolean False values become negated flags (e.g., --no-enforce-eager), and other values become key=value arguments (e.g., --max-tokens=1024). The subprocess is created asynchronously with piped stdout and stderr.
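The command construction and launch can be sketched as follows. The helper names build_command and launch are hypothetical. Passing the port as two separate argv items ("--port", "8000") and converting underscores to hyphens are assumptions inferred from the flag examples in the paragraph above (--enforce-eager, --max-tokens=1024), not confirmed details of custom_server.py.

```python
import asyncio
import sys
from typing import Any, Dict, List


def build_command(server_script: str, port: int, model_kwargs: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: turn the remaining model_kwargs into an argv list."""
    # Run the script with the same interpreter as the pipeline itself.
    cmd = [sys.executable, server_script, "--port", str(port)]
    for key, value in model_kwargs.items():
        # Assumption: snake_case keys become kebab-case flags (max_tokens -> --max-tokens).
        name = key.replace("_", "-")
        if isinstance(value, bool):
            # True -> --flag, False -> --no-flag
            cmd.append(f"--{name}" if value else f"--no-{name}")
        else:
            # Any other value -> --key=value
            cmd.append(f"--{name}={value}")
    return cmd


async def launch(cmd: List[str]) -> asyncio.subprocess.Process:
    """Start the command without a shell, piping stdout and stderr for later monitoring."""
    return await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
```

Passing an argument list to asyncio.create_subprocess_exec (rather than a shell string) means script paths containing spaces or shell metacharacters reach the child process unchanged, and using sys.executable keeps the script in the same interpreter environment as the pipeline.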

The monitor_health method reads the subprocess output streams asynchronously, looking for startup completion indicators ("Application startup complete" or "Uvicorn running on") and for error conditions (CUDA out of memory, CUDA runtime errors, import errors). All output is logged; an error detected in the server output raises a RuntimeError, which the base class monitoring loop catches.

Internally, the method runs three asyncio tasks concurrently: one reading stdout, one reading stderr, and one waiting for the process to exit. As soon as any of these tasks completes or raises an exception, all tasks are cancelled as part of cleanup.
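The monitoring pattern can be sketched with plain asyncio: two reader tasks and a wait-for-exit task raced with asyncio.wait(..., return_when=FIRST_COMPLETED), after which every task is cancelled. The helper names (classify_line, read_stream, monitor) and the exact error substrings in ERROR_MARKERS are assumptions chosen to illustrate the three error categories; the base-class loop that catches the RuntimeError is not reproduced here.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("custom_server_sketch")

# Startup markers named in the description (both are lines Uvicorn prints on startup).
READY_MARKERS = ("Application startup complete", "Uvicorn running on")

# Hypothetical substrings for the listed error categories; real patterns may differ.
ERROR_MARKERS = {
    "CUDA out of memory": "CUDA out of memory",
    "CUDA error": "CUDA runtime error",
    "ImportError": "import error",
    "ModuleNotFoundError": "import error",
}


def classify_line(line: str) -> Optional[str]:
    """Return "ready" for a startup marker, raise RuntimeError for an error marker, else None."""
    for marker, label in ERROR_MARKERS.items():
        if marker in line:
            raise RuntimeError(f"server reported {label}: {line}")
    if any(marker in line for marker in READY_MARKERS):
        return "ready"
    return None


async def read_stream(stream: asyncio.StreamReader, name: str, ready: asyncio.Event) -> None:
    """Log every line of one output stream and react to markers until EOF."""
    while True:
        raw = await stream.readline()
        if not raw:  # EOF: the stream was closed
            return
        line = raw.decode(errors="replace").rstrip()
        logger.info("[%s] %s", name, line)
        if classify_line(line) == "ready":
            ready.set()


async def monitor(proc: asyncio.subprocess.Process, ready: asyncio.Event) -> None:
    """Read stdout/stderr alongside a wait-for-exit task; stop when any one finishes."""
    tasks = [
        asyncio.create_task(read_stream(proc.stdout, "stdout", ready)),
        asyncio.create_task(read_stream(proc.stderr, "stderr", ready)),
        asyncio.create_task(proc.wait()),
    ]
    try:
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            task.result()  # re-raises a RuntimeError raised by a reader
    finally:
        # Cancel whatever is still running and let the cancellations settle.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Stopping at the first completed task keeps shutdown prompt, but in this sketch an error line still buffered on one stream can go unread if the other stream reaches EOF first; draining both streams before returning would close that gap at the cost of waiting for both pipes to close.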

Usage

Use CustomServer when you need to integrate a custom inference backend that is packaged as a standalone Python HTTP server script (e.g., a FastAPI application served by Uvicorn). The script must accept a --port argument, accept any extra flags forwarded from model_kwargs, and expose an OpenAI-compatible API.
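On the script side, a minimal sketch of argument parsing that satisfies this contract, using only argparse. The --max-tokens and --enforce-eager options are hypothetical and simply mirror the Basic Usage example. argparse.BooleanOptionalAction (Python 3.9+) accepts both the --flag and --no-flag forms generated for booleans, and parse_known_args keeps the script from exiting on flags it does not declare.

```python
import argparse
from typing import List, Optional, Tuple


def parse_server_args(argv: Optional[List[str]] = None) -> Tuple[argparse.Namespace, List[str]]:
    """Parse the arguments a user-provided server script receives from CustomServer."""
    parser = argparse.ArgumentParser(description="Hypothetical custom inference server")
    # Required by the contract: the port the server must listen on.
    parser.add_argument("--port", type=int, required=True)
    # Hypothetical options mirroring forwarded model_kwargs (max_tokens, enforce_eager).
    parser.add_argument("--max-tokens", type=int, default=512)
    parser.add_argument(
        "--enforce-eager",
        action=argparse.BooleanOptionalAction,  # accepts --enforce-eager and --no-enforce-eager
        default=False,
    )
    # parse_known_args returns undeclared flags instead of exiting with an error.
    return parser.parse_known_args(argv)
```

After parsing, the script would hand args.port to its HTTP server, for example uvicorn.run(app, port=args.port) for a FastAPI app that serves OpenAI-compatible routes such as /v1/chat/completions.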

Code Reference

Source Location

  • Repository: Huggingface_Datatrove
  • File: src/datatrove/pipeline/inference/servers/custom_server.py
  • Lines: 1-116

Signature

class CustomServer(InferenceServer):
    def __init__(self, config: "InferenceConfig", rank: int):
        ...

    async def start_server(self) -> asyncio.subprocess.Process | None:
        ...

    async def monitor_health(self):
        ...

Import

from datatrove.pipeline.inference.servers.custom_server import CustomServer

I/O Contract

Inputs

  • config (InferenceConfig, required): Configuration object; its model_kwargs must contain a "server_script" entry.
  • rank (int, required): Rank identifier for this server instance.

Outputs

  • process (asyncio.subprocess.Process or None): The launched subprocess running the custom server script, as returned by start_server.

Usage Examples

Basic Usage

from datatrove.pipeline.inference.run_inference import InferenceConfig

# Configure with a custom server script
config = InferenceConfig(
    model="my-model",
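    model_kwargs={
        "server_script": "/path/to/my_server.py",  # removed at init; not forwarded as a flag
        "max_tokens": 1024,  # forwarded as --max-tokens=1024
        "enforce_eager": True,  # forwarded as --enforce-eager
    },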
)

# The CustomServer is typically instantiated by the inference runner
# based on the config's server_type setting
