Implementation: SGLang init_distributed_environment
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, GPU_Parallelism |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for initializing PyTorch distributed process groups for multi-GPU model serving and quantization.
Description
The init_distributed_environment function initializes torch.distributed with the specified backend, world size, and rank. It creates the global process group and sets the local rank for GPU assignment. It supports NCCL (GPU), Gloo (CPU), and Mooncake backends. If torch.distributed is already initialized, it validates the existing configuration.
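For orientation, here is a minimal sketch of that flow, assuming the function delegates to torch.distributed.init_process_group; the actual validation and local-rank handling live in parallel_state.py and differ in detail:
import datetime
from typing import Optional
import torch.distributed as dist

def init_distributed_environment_sketch(
    world_size: int = -1,
    rank: int = -1,
    distributed_init_method: str = "env://",
    local_rank: int = -1,
    backend: str = "nccl",
    timeout: Optional[int] = None,
) -> None:
    if dist.is_initialized():
        # Already initialized: validate the existing group instead of re-creating it.
        if world_size != -1:
            assert dist.get_world_size() == world_size, "world_size mismatch"
        return
    dist.init_process_group(
        backend=backend,
        init_method=distributed_init_method,
        world_size=world_size,
        rank=rank,
        timeout=datetime.timedelta(seconds=timeout) if timeout is not None else None,
    )
    # GPU binding via local_rank is omitted in this sketch.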
Usage
Call init_distributed_environment at the beginning of standalone multi-GPU scripts (e.g., ModelOpt quantization). For standard Engine/Server usage, this is called automatically.
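For reference, the automatic path looks like this from the user's side; a sketch assuming the offline Engine API (sgl.Engine), which performs distributed initialization internally when tp_size > 1:
import sglang as sgl

# No manual init_distributed_environment call needed here;
# the Engine sets up the process groups itself.
llm = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct", tp_size=2)
outputs = llm.generate(["Hello"], {"temperature": 0.8, "max_new_tokens": 32})
llm.shutdown()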
Code Reference
Source Location
- Repository: sglang
- File: python/sglang/srt/distributed/parallel_state.py
- Lines: L1491-L1555
Signature
def init_distributed_environment(
world_size: int = -1,
rank: int = -1,
distributed_init_method: str = "env://",
local_rank: int = -1,
backend: str = "nccl",
timeout: Optional[int] = None,
) -> None:
"""Initialize distributed environment for multi-GPU execution."""
Import
from sglang.srt.distributed.parallel_state import (
init_distributed_environment,
initialize_model_parallel,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| world_size | int | No | Total process count (-1 = auto from env) |
| rank | int | No | Process rank (-1 = auto from env) |
| distributed_init_method | str | No | Init method (default: "env://") |
| local_rank | int | No | Local GPU rank (-1 = auto) |
| backend | str | No | Communication backend (default: "nccl") |
| timeout | Optional[int] | No | Timeout in seconds |
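With the -1 defaults and the env:// init method, configuration is resolved from the standard torch.distributed environment variables; a sketch of a single-process run driven entirely by the environment (the exact variables follow the torch env:// rendezvous convention):
import os
from sglang.srt.distributed.parallel_state import init_distributed_environment

# torchrun (or a manual export) normally provides these.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("LOCAL_RANK", "0")

# All -1 defaults are then filled from the environment.
init_distributed_environment()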
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | Initialized torch.distributed process group; global _WORLD set |
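Because the function returns None, success is observed through torch.distributed state; a quick post-init check using standard torch.distributed calls (not SGLang-specific):
import torch.distributed as dist

assert dist.is_initialized()
print("world size:", dist.get_world_size())
print("rank:", dist.get_rank())
print("backend:", dist.get_backend())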
Usage Examples
ModelOpt Quantization Script
import torch
from sglang.srt.distributed.parallel_state import (
init_distributed_environment,
initialize_model_parallel,
)
# Initialize for single-GPU quantization
init_distributed_environment(
world_size=1,
rank=0,
distributed_init_method="tcp://127.0.0.1:12345",
local_rank=0,
backend="nccl",
)
initialize_model_parallel(tensor_model_parallel_size=1)
# Now proceed with model loading and quantization...
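If the script runs multiple passes, tear down between runs. This assumes the companion helpers destroy_model_parallel and destroy_distributed_environment are exported from the same module (present in the vLLM-derived parallel_state; verify against your SGLang version):
from sglang.srt.distributed.parallel_state import (
    destroy_distributed_environment,
    destroy_model_parallel,
)

# Release model-parallel groups first, then the global process group.
destroy_model_parallel()
destroy_distributed_environment()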
Multi-GPU Setup
# Typically launched via torchrun:
#   torchrun --nproc_per_node=4 script.py
import os

from sglang.srt.distributed.parallel_state import (
    init_distributed_environment,
    initialize_model_parallel,
)

init_distributed_environment(
    world_size=4,
    rank=int(os.environ["RANK"]),
    local_rank=int(os.environ["LOCAL_RANK"]),
    backend="nccl",
)
initialize_model_parallel(tensor_model_parallel_size=4)
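On multi-GPU hosts, each rank should also be bound to its assigned GPU before any CUDA work; this is the standard PyTorch pattern. Depending on the SGLang version, init_distributed_environment may already do this via local_rank, and setting it explicitly is harmless:
import os
import torch

# Pin this process to the GPU torchrun assigned it.
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))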