
Implementation:Sgl project Sglang ServerArgs Init

From Leeroopedia


Knowledge Sources
Domains LLM_Serving, Configuration
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete configuration class, provided by the SGLang runtime, for setting all SGLang inference server parameters.

Description

The ServerArgs class is a Python dataclass that holds ~200 configuration parameters for the SGLang inference server. It is the single source of truth for model path, tensor parallelism, memory allocation, quantization, scheduling, logging, and all other server settings. It supports construction via keyword arguments (programmatic) or CLI argument parsing.

Usage

Import and instantiate ServerArgs when you need fine-grained control over server configuration before passing it to Engine or launch_server. For simple cases, Engine accepts kwargs directly (which internally constructs a ServerArgs).
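The kwargs-to-dataclass pattern described above can be sketched with a toy stand-in (ToyServerArgs and toy_engine are hypothetical names for illustration only, not part of SGLang; the real class has ~200 fields):

```python
import dataclasses

# Illustrative sketch only: a toy stand-in for ServerArgs showing how
# Engine-style keyword arguments collect into one configuration dataclass.
@dataclasses.dataclass
class ToyServerArgs:
    model_path: str          # required, like ServerArgs.model_path
    tp_size: int = 1         # defaults mirror the real class
    dtype: str = "auto"
    port: int = 30000

def toy_engine(**kwargs):
    # The pattern described above: kwargs are forwarded into the
    # dataclass, which then acts as the single source of truth.
    return ToyServerArgs(**kwargs)

args = toy_engine(model_path="meta-llama/Llama-3.1-8B-Instruct", tp_size=2)
```

Either route (explicit ServerArgs or Engine kwargs) ends with the same kind of object, so downstream code reads configuration from one place.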

Code Reference

Source Location

  • Repository: sglang
  • File: python/sglang/srt/server_args.py
  • Lines: L273-698 (class definition)

Signature

@dataclasses.dataclass
class ServerArgs:
    # Model and tokenizer
    model_path: str
    tokenizer_path: Optional[str] = None
    tokenizer_mode: str = "auto"
    load_format: str = "auto"
    trust_remote_code: bool = False
    context_length: Optional[int] = None

    # HTTP server
    host: str = "127.0.0.1"
    port: int = 30000

    # Quantization and data type
    dtype: str = "auto"
    quantization: Optional[str] = None
    kv_cache_dtype: str = "auto"

    # Memory and scheduling
    mem_fraction_static: Optional[float] = None
    max_running_requests: Optional[int] = None
    chunked_prefill_size: Optional[int] = None
    schedule_policy: str = "fcfs"

    # Runtime options
    tp_size: int = 1
    pp_size: int = 1
    stream_interval: int = 1
    random_seed: Optional[int] = None

    # Logging
    log_level: str = "info"
    log_requests: bool = False
    # ... (~200 fields total)

Import

from sglang.srt.server_args import ServerArgs

I/O Contract

Inputs

  • model_path (str, required): HuggingFace model ID or local path
  • tp_size (int, optional): Tensor parallelism degree (default: 1)
  • dtype (str, optional): Weight data type; "auto", "float16", "bfloat16", "float32" (default: "auto")
  • quantization (Optional[str], optional): Quantization method; "awq", "gptq", "fp8", "modelopt", etc.
  • mem_fraction_static (Optional[float], optional): GPU memory fraction for KV cache (default: auto-calculated)
  • context_length (Optional[int], optional): Override the model's default context length

Outputs

  • ServerArgs instance (ServerArgs): Validated dataclass holding all server configuration parameters

Usage Examples

Programmatic Construction

from sglang.srt.server_args import ServerArgs

# Create server args for a multi-GPU setup
server_args = ServerArgs(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    tp_size=2,
    dtype="bfloat16",
    mem_fraction_static=0.85,
    context_length=4096,
    port=30000,
)

CLI Parsing

import sys

from sglang.srt.server_args import ServerArgs, prepare_server_args

# Parse from command-line arguments
server_args = prepare_server_args(sys.argv[1:])
# Equivalent to: python -m sglang.launch_server --model-path meta-llama/... --tp-size 2
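The flag-to-field mapping behind this kind of CLI parsing can be sketched with argparse (toy_prepare_server_args and ToyServerArgs are hypothetical illustrations of the pattern; the real parser registers the full set of flags):

```python
import argparse
import dataclasses

# Illustrative sketch only: hyphenated CLI flags map onto dataclass
# field names, as in "--tp-size" -> tp_size.
@dataclasses.dataclass
class ToyServerArgs:
    model_path: str
    tp_size: int = 1

def toy_prepare_server_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--tp-size", type=int, default=1)
    ns = parser.parse_args(argv)
    # argparse derives the attribute name "model_path" from "--model-path"
    return ToyServerArgs(model_path=ns.model_path, tp_size=ns.tp_size)

args = toy_prepare_server_args(
    ["--model-path", "meta-llama/Llama-3.1-8B-Instruct", "--tp-size", "2"]
)
```

The same configuration object comes out of both the programmatic and CLI routes, which is why both examples above produce interchangeable results.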
