Implementation: SGLang get_model_loader
| Knowledge Sources | |
|---|---|
| Domains | Quantization, Model_Loading, Model_Optimization |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Factory function that selects and instantiates the appropriate model loader for standard or ModelOpt-quantized model loading in SGLang.
Description
The get_model_loader factory function inspects LoadConfig and ModelConfig to determine the appropriate loader class. For ModelOpt quantization (modelopt_fp8, modelopt_fp4), it returns a ModelOptModelLoader that handles the complete quantize-export pipeline. The ModelOptModelLoader.load_model method loads the base model, applies NVIDIA ModelOpt quantization, and optionally exports the quantized result.
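The dispatch described above can be sketched as a plain factory function. The stub config and loader classes below are illustrative stand-ins for SGLang's real classes, not the actual implementation; only the `modelopt_fp8`/`modelopt_fp4` routing rule comes from the description:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-ins for SGLang's real config and loader classes.
@dataclass
class LoadConfig:
    modelopt_export_path: Optional[str] = None

@dataclass
class ModelConfig:
    quantization: Optional[str] = None

class BaseModelLoader:
    def __init__(self, load_config: LoadConfig):
        self.load_config = load_config

class DefaultModelLoader(BaseModelLoader):
    pass

class ModelOptModelLoader(DefaultModelLoader):
    pass

def get_model_loader(
    load_config: LoadConfig,
    model_config: Optional[ModelConfig] = None,
) -> BaseModelLoader:
    # ModelOpt quantization routes to the loader that runs the
    # quantize-export pipeline; everything else gets the default loader.
    if model_config is not None and model_config.quantization in (
        "modelopt_fp8",
        "modelopt_fp4",
    ):
        return ModelOptModelLoader(load_config)
    return DefaultModelLoader(load_config)
```

The key design point is that the factory inspects configuration only; the expensive work (loading, quantizing, exporting) is deferred to the returned loader's `load_model` call.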
Usage
Call get_model_loader to obtain the correct loader, then call loader.load_model to execute loading and optional quantization. This is primarily used in standalone quantization scripts.
Code Reference
Source Location
- Repository: sglang
- File: python/sglang/srt/model_loader/loader.py
- Lines: L2713-2742 (get_model_loader), L2614-2637 (ModelOptModelLoader.load_model)
Signature
```python
def get_model_loader(
    load_config: LoadConfig,
    model_config: Optional[ModelConfig] = None,
) -> BaseModelLoader:
    """Get a model loader based on the load format and quantization config."""

class ModelOptModelLoader(DefaultModelLoader):
    def load_model(
        self,
        *,
        model_config: ModelConfig,
        device_config: DeviceConfig,
    ) -> nn.Module:
        """Load and optionally quantize model using NVIDIA ModelOpt."""
```
Import
```python
from sglang.srt.model_loader.loader import get_model_loader
from sglang.srt.configs.load_config import LoadConfig
from sglang.srt.configs.model_config import ModelConfig
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| load_config | LoadConfig | Yes | Loading format and export path configuration |
| model_config | Optional[ModelConfig] | No | Model configuration with quantization settings |
| device_config | DeviceConfig | Yes (for load_model) | Target device for model placement |
Outputs
| Name | Type | Description |
|---|---|---|
| model_loader | BaseModelLoader | Selected model loader instance |
| model | nn.Module | Loaded (and optionally quantized) model (from load_model) |
Usage Examples
Quantize and Export
```python
from sglang.srt.model_loader.loader import get_model_loader
from sglang.srt.configs.model_config import ModelConfig
from sglang.srt.configs.load_config import LoadConfig
# DeviceConfig import path assumed; the original example used
# device_config without defining it.
from sglang.srt.configs.device_config import DeviceConfig

model_config = ModelConfig(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    quantization="modelopt_fp8",
)
load_config = LoadConfig(
    modelopt_export_path="/tmp/quantized_model",
)
device_config = DeviceConfig(device="cuda")

# Factory selects ModelOptModelLoader
loader = get_model_loader(load_config, model_config)

# Load, quantize, and export
model = loader.load_model(
    model_config=model_config,
    device_config=device_config,
)
```