Implementation:Sgl_project_Sglang_ServerArgs_Init
| Knowledge Sources | |
|---|---|
| Domains | LLM_Serving, Configuration |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
The configuration dataclass provided by the SGLang runtime that holds every parameter of the SGLang inference server.
Description
The ServerArgs class is a Python dataclass that holds ~200 configuration parameters for the SGLang inference server. It is the single source of truth for model path, tensor parallelism, memory allocation, quantization, scheduling, logging, and all other server settings. It supports construction via keyword arguments (programmatic) or CLI argument parsing.
Usage
Import and instantiate ServerArgs when you need fine-grained control over server configuration before passing it to Engine or launch_server. For simple cases, Engine accepts kwargs directly (which internally constructs a ServerArgs).
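The two construction paths (keyword arguments vs. CLI parsing) can be illustrated with a minimal stand-in dataclass. `MiniArgs`, `add_cli_args`, and `from_cli_args` below are hypothetical names used for illustration, not SGLang APIs; only the pattern mirrors ServerArgs.

```python
import argparse
import dataclasses

# Hypothetical stand-in mirroring the ServerArgs pattern: one dataclass
# that supports both keyword construction and CLI argument parsing.
@dataclasses.dataclass
class MiniArgs:
    model_path: str
    tp_size: int = 1
    dtype: str = "auto"
    port: int = 30000

    @staticmethod
    def add_cli_args(parser: argparse.ArgumentParser) -> None:
        # One CLI flag per field; dashes replace underscores in flag names.
        parser.add_argument("--model-path", type=str, required=True)
        parser.add_argument("--tp-size", type=int, default=1)
        parser.add_argument("--dtype", type=str, default="auto")
        parser.add_argument("--port", type=int, default=30000)

    @classmethod
    def from_cli_args(cls, args: argparse.Namespace) -> "MiniArgs":
        # Rebuild the dataclass from the parsed namespace, field by field.
        return cls(**{f.name: getattr(args, f.name) for f in dataclasses.fields(cls)})

# Path 1: programmatic keyword construction.
kw_args = MiniArgs(model_path="my-model", tp_size=2)

# Path 2: CLI-style construction from an argv list.
parser = argparse.ArgumentParser()
MiniArgs.add_cli_args(parser)
cli_args = MiniArgs.from_cli_args(
    parser.parse_args(["--model-path", "my-model", "--tp-size", "2"])
)

# Both paths yield the same configuration object.
assert kw_args == cli_args
```

Because both paths converge on the same dataclass, downstream code (Engine, launch_server) never needs to know which construction route was used.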
Code Reference
Source Location
- Repository: sglang
- File: python/sglang/srt/server_args.py
- Lines: L273-698 (class definition)
Signature
@dataclasses.dataclass
class ServerArgs:
    # Model and tokenizer
    model_path: str
    tokenizer_path: Optional[str] = None
    tokenizer_mode: str = "auto"
    load_format: str = "auto"
    trust_remote_code: bool = False
    context_length: Optional[int] = None
    # HTTP server
    host: str = "127.0.0.1"
    port: int = 30000
    # Quantization and data type
    dtype: str = "auto"
    quantization: Optional[str] = None
    kv_cache_dtype: str = "auto"
    # Memory and scheduling
    mem_fraction_static: Optional[float] = None
    max_running_requests: Optional[int] = None
    chunked_prefill_size: Optional[int] = None
    schedule_policy: str = "fcfs"
    # Runtime options
    tp_size: int = 1
    pp_size: int = 1
    stream_interval: int = 1
    random_seed: Optional[int] = None
    # Logging
    log_level: str = "info"
    log_requests: bool = False
    # ... (~200 fields total)
Import
from sglang.srt.server_args import ServerArgs
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | str | Yes | HuggingFace model ID or local path |
| tp_size | int | No | Tensor parallelism degree (default: 1) |
| dtype | str | No | Weight data type — "auto", "float16", "bfloat16", "float32" (default: "auto") |
| quantization | Optional[str] | No | Quantization method — "awq", "gptq", "fp8", "modelopt", etc. |
| mem_fraction_static | Optional[float] | No | GPU memory fraction for KV cache (default: auto-calculated) |
| context_length | Optional[int] | No | Override model's default context length |
Outputs
| Name | Type | Description |
|---|---|---|
| ServerArgs instance | ServerArgs | Validated dataclass with all server configuration parameters |
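The "validated" part happens at construction time: a dataclass can derive defaults and reject bad values in `__post_init__` before the server ever starts. The sketch below uses a hypothetical `MiniArgs` with illustrative threshold values, not SGLang's actual checks.

```python
import dataclasses
from typing import Optional

# Hypothetical stand-in showing construction-time validation and
# default derivation, the pattern a config dataclass typically uses.
@dataclasses.dataclass
class MiniArgs:
    model_path: str
    tp_size: int = 1
    mem_fraction_static: Optional[float] = None

    def __post_init__(self):
        # Derive a default instead of hard-coding one (illustrative values).
        if self.mem_fraction_static is None:
            self.mem_fraction_static = 0.88 if self.tp_size == 1 else 0.85
        # Reject out-of-range values early, at construction time.
        if not 0.0 < self.mem_fraction_static <= 1.0:
            raise ValueError("mem_fraction_static must be in (0, 1]")

args = MiniArgs(model_path="my-model", tp_size=2)
print(args.mem_fraction_static)  # 0.85 (derived default for tp_size > 1)
```

Failing fast here is deliberate: a bad memory fraction surfaces as a `ValueError` at configuration time rather than as an out-of-memory crash mid-serving.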
Usage Examples
Programmatic Construction
from sglang.srt.server_args import ServerArgs

# Create server args for a multi-GPU setup
server_args = ServerArgs(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    tp_size=2,
    dtype="bfloat16",
    mem_fraction_static=0.85,
    context_length=4096,
    port=30000,
)
CLI Parsing
import sys

from sglang.srt.server_args import ServerArgs, prepare_server_args

# Parse from command-line arguments
server_args = prepare_server_args(sys.argv[1:])
# Equivalent to: python -m sglang.launch_server --model-path meta-llama/... --tp-size 2
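Going the other direction, serializing a configuration back into CLI flags, is straightforward for a dataclass and handy for logging a reproducible launch command. The `to_argv` helper and `MiniArgs` below are hypothetical, not SGLang APIs.

```python
import dataclasses

# Hypothetical stand-in for a small configuration dataclass.
@dataclasses.dataclass
class MiniArgs:
    model_path: str
    tp_size: int = 1
    dtype: str = "auto"

def to_argv(args) -> list:
    # Emit one "--flag value" pair per field, mirroring the CLI spelling
    # (underscores in field names become dashes in flag names).
    argv = []
    for f in dataclasses.fields(args):
        argv += [f"--{f.name.replace('_', '-')}", str(getattr(args, f.name))]
    return argv

argv = to_argv(MiniArgs(model_path="my-model", tp_size=2))
print(" ".join(argv))  # --model-path my-model --tp-size 2 --dtype auto
```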
Related Pages
Implements Principle
Requires Environment
- Environment:Sgl_project_Sglang_CUDA_GPU_Runtime
- Environment:Sgl_project_Sglang_Python_Dependencies
- Environment:Sgl_project_Sglang_Multi_Platform_Accelerators