Implementation:Sgl_project_Sglang_ServerArgs_Init
| Knowledge Sources | |
|---|---|
| Domains | LLM_Serving, Configuration |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
The configuration dataclass provided by the SGLang runtime that holds every parameter of the SGLang inference server.
Description
The ServerArgs class is a Python dataclass that holds ~200 configuration parameters for the SGLang inference server. It is the single source of truth for model path, tensor parallelism, memory allocation, quantization, scheduling, logging, and all other server settings. It supports construction via keyword arguments (programmatic) or CLI argument parsing.
Usage
Import and instantiate ServerArgs when you need fine-grained control over server configuration before passing it to Engine or launch_server. For simple cases, Engine accepts kwargs directly (which internally constructs a ServerArgs).
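The two construction paths (keyword arguments vs. CLI parsing) can be illustrated with a minimal stand-in dataclass. `MiniArgs`, `add_cli_args`, and `from_cli_args` below are hypothetical names used for illustration, not SGLang APIs; only the pattern mirrors ServerArgs.

```python
import argparse
import dataclasses

# Hypothetical stand-in mirroring the ServerArgs pattern: one dataclass
# that supports both keyword construction and CLI argument parsing.
@dataclasses.dataclass
class MiniArgs:
    model_path: str
    tp_size: int = 1
    dtype: str = "auto"
    port: int = 30000

    @staticmethod
    def add_cli_args(parser: argparse.ArgumentParser) -> None:
        # One CLI flag per field; dashes replace underscores in flag names.
        parser.add_argument("--model-path", type=str, required=True)
        parser.add_argument("--tp-size", type=int, default=1)
        parser.add_argument("--dtype", type=str, default="auto")
        parser.add_argument("--port", type=int, default=30000)

    @classmethod
    def from_cli_args(cls, args: argparse.Namespace) -> "MiniArgs":
        # Rebuild the dataclass from the parsed namespace, field by field.
        return cls(**{f.name: getattr(args, f.name) for f in dataclasses.fields(cls)})

# Path 1: programmatic keyword construction.
kw_args = MiniArgs(model_path="my-model", tp_size=2)

# Path 2: CLI-style construction from an argv list.
parser = argparse.ArgumentParser()
MiniArgs.add_cli_args(parser)
cli_args = MiniArgs.from_cli_args(
    parser.parse_args(["--model-path", "my-model", "--tp-size", "2"])
)

# Both paths yield the same configuration object.
assert kw_args == cli_args
```

Because both paths converge on the same dataclass, downstream code (Engine, launch_server) never needs to know which construction route was used.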
Code Reference
Source Location
- Repository: sglang
- File: python/sglang/srt/server_args.py
- Lines: L273-698 (class definition)
Signature
@dataclasses.dataclass
class ServerArgs:
    # Model and tokenizer
    model_path: str
    tokenizer_path: Optional[str] = None
    tokenizer_mode: str = "auto"
    load_format: str = "auto"
    trust_remote_code: bool = False
    context_length: Optional[int] = None
    # HTTP server
    host: str = "127.0.0.1"
    port: int = 30000
    # Quantization and data type
    dtype: str = "auto"
    quantization: Optional[str] = None
    kv_cache_dtype: str = "auto"
    # Memory and scheduling
    mem_fraction_static: Optional[float] = None
    max_running_requests: Optional[int] = None
    chunked_prefill_size: Optional[int] = None
    schedule_policy: str = "fcfs"
    # Runtime options
    tp_size: int = 1
    pp_size: int = 1
    stream_interval: int = 1
    random_seed: Optional[int] = None
    # Logging
    log_level: str = "info"
    log_requests: bool = False
    # ... (~200 fields total)
Import
from sglang.srt.server_args import ServerArgs
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | str | Yes | HuggingFace model ID or local path |
| tp_size | int | No | Tensor parallelism degree (default: 1) |
| dtype | str | No | Weight data type — "auto", "float16", "bfloat16", "float32" (default: "auto") |
| quantization | Optional[str] | No | Quantization method — "awq", "gptq", "fp8", "modelopt", etc. |
| mem_fraction_static | Optional[float] | No | GPU memory fraction for KV cache (default: auto-calculated) |
| context_length | Optional[int] | No | Override model's default context length |
Outputs
| Name | Type | Description |
|---|---|---|
| ServerArgs instance | ServerArgs | Validated dataclass with all server configuration parameters |
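The "validated" part happens at construction time: a dataclass can derive defaults and reject bad values in `__post_init__` before the server ever starts. The sketch below uses a hypothetical `MiniArgs` with illustrative threshold values, not SGLang's actual checks.

```python
import dataclasses
from typing import Optional

# Hypothetical stand-in showing construction-time validation and
# default derivation, the pattern a config dataclass typically uses.
@dataclasses.dataclass
class MiniArgs:
    model_path: str
    tp_size: int = 1
    mem_fraction_static: Optional[float] = None

    def __post_init__(self):
        # Derive a default instead of hard-coding one (illustrative values).
        if self.mem_fraction_static is None:
            self.mem_fraction_static = 0.88 if self.tp_size == 1 else 0.85
        # Reject out-of-range values early, at construction time.
        if not 0.0 < self.mem_fraction_static <= 1.0:
            raise ValueError("mem_fraction_static must be in (0, 1]")

args = MiniArgs(model_path="my-model", tp_size=2)
print(args.mem_fraction_static)  # 0.85 (derived default for tp_size > 1)
```

Failing fast here is deliberate: a bad memory fraction surfaces as a `ValueError` at configuration time rather than as an out-of-memory crash mid-serving.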
Usage Examples
Programmatic Construction
from sglang.srt.server_args import ServerArgs

# Create server args for a multi-GPU setup
server_args = ServerArgs(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    tp_size=2,
    dtype="bfloat16",
    mem_fraction_static=0.85,
    context_length=4096,
    port=30000,
)
CLI Parsing
import sys

from sglang.srt.server_args import ServerArgs, prepare_server_args

# Parse from command-line arguments
server_args = prepare_server_args(sys.argv[1:])
# Equivalent to: python -m sglang.launch_server --model-path meta-llama/... --tp-size 2
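Going the other direction, serializing a configuration back into CLI flags, is straightforward for a dataclass and handy for logging a reproducible launch command. The `to_argv` helper and `MiniArgs` below are hypothetical, not SGLang APIs.

```python
import dataclasses

# Hypothetical stand-in for a small configuration dataclass.
@dataclasses.dataclass
class MiniArgs:
    model_path: str
    tp_size: int = 1
    dtype: str = "auto"

def to_argv(args) -> list:
    # Emit one "--flag value" pair per field, mirroring the CLI spelling
    # (underscores in field names become dashes in flag names).
    argv = []
    for f in dataclasses.fields(args):
        argv += [f"--{f.name.replace('_', '-')}", str(getattr(args, f.name))]
    return argv

argv = to_argv(MiniArgs(model_path="my-model", tp_size=2))
print(" ".join(argv))  # --model-path my-model --tp-size 2 --dtype auto
```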
Related Pages
Implements Principle
Requires Environment
- Environment:Sgl_project_Sglang_CUDA_GPU_Runtime
- Environment:Sgl_project_Sglang_Python_Dependencies
- Environment:Sgl_project_Sglang_Multi_Platform_Accelerators