Implementation: SGLang Engine Init
| Knowledge Sources | |
|---|---|
| Domains | LLM_Serving, Inference_Engine |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for initializing the SGLang inference engine, which runs on the multi-process architecture provided by the SGLang runtime.
Description
The Engine class is the main entry point for programmatic LLM inference in SGLang. On initialization, it spawns TokenizerManager, Scheduler, and DetokenizerManager subprocesses, sets up ZMQ IPC communication, and registers automatic shutdown via atexit. It accepts either a ServerArgs object directly or keyword arguments that mirror ServerArgs fields.
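The lifecycle described above, spawning worker subprocesses at construction time and registering cleanup with atexit, can be sketched in miniature. This is an illustrative pattern only, not SGLang's actual implementation; `MiniEngine` and `_worker` are hypothetical names standing in for the Engine and its TokenizerManager/Scheduler/DetokenizerManager subprocesses.

```python
# Illustrative sketch (not SGLang code): spawn worker subprocesses on
# __init__ and register automatic shutdown via atexit, mirroring the
# Engine lifecycle described above.
import atexit
import multiprocessing as mp


def _worker(conn):
    # Stand-in for TokenizerManager / Scheduler / DetokenizerManager work.
    conn.send("ready")
    conn.recv()  # block until the parent tells us to stop


class MiniEngine:
    def __init__(self, num_workers=3):
        self.procs, self.conns = [], []
        for _ in range(num_workers):
            parent, child = mp.Pipe()
            p = mp.Process(target=_worker, args=(child,), daemon=True)
            p.start()
            # Wait for the subprocess to signal readiness, as the real
            # Engine waits for its subprocesses to come up.
            assert parent.recv() == "ready"
            self.procs.append(p)
            self.conns.append(parent)
        # Mirror the Engine's automatic cleanup on interpreter exit.
        atexit.register(self.shutdown)

    def shutdown(self):
        for conn, p in zip(self.conns, self.procs):
            if p.is_alive():
                conn.send("stop")
                p.join(timeout=5)
```

The real Engine additionally wires the subprocesses together over ZMQ IPC sockets rather than pipes; the pattern of construct-then-register-cleanup is the same.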
Usage
Import Engine (or use sgl.Engine) when performing offline batch inference, embedding computation, or any programmatic model interaction without an HTTP server.
Code Reference
Source Location
- Repository: sglang
- File: python/sglang/srt/entrypoints/engine.py
- Lines: L118-204
Signature
```python
class Engine(EngineBase):
    def __init__(self, **kwargs):
        """
        Args mirror ServerArgs fields. Key parameters:
            model_path (str): HuggingFace model ID or local path.
            log_level (str): Logging level (default: "error" for Engine).
            server_args (ServerArgs): Direct ServerArgs object (alternative to kwargs).
            tp_size (int): Tensor parallelism degree.
            dtype (str): Weight data type.
            quantization (Optional[str]): Quantization method.
            mem_fraction_static (Optional[float]): GPU memory fraction for KV cache.
        """
```
Import
```python
import sglang as sgl
# Or directly:
from sglang.srt.entrypoints.engine import Engine
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | str | Yes (via kwargs or server_args) | HuggingFace model ID or local path |
| server_args | ServerArgs | No | Pre-constructed ServerArgs (alternative to kwargs) |
| log_level | str | No | Logging level (default: "error") |
| tp_size | int | No | Tensor parallelism degree (default: 1) |
| dtype | str | No | Weight data type (default: "auto") |
Outputs
| Name | Type | Description |
|---|---|---|
| Engine instance | Engine | Initialized engine with running subprocesses (TokenizerManager, Scheduler, DetokenizerManager) |
Usage Examples
Basic Initialization
```python
import sglang as sgl

# Initialize with kwargs (simplest form)
engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

# Use the engine for generation...
output = engine.generate("What is AI?", {"max_new_tokens": 64})

# Shut down when done
engine.shutdown()
```
Context Manager
```python
import sglang as sgl

# Engine supports the context manager protocol for automatic shutdown
with sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct", tp_size=2) as engine:
    output = engine.generate("Explain quantum computing.", {"max_new_tokens": 128})
    print(output["text"])
# Engine is automatically shut down here
```
With Pre-Built ServerArgs
```python
from sglang.srt.server_args import ServerArgs
from sglang.srt.entrypoints.engine import Engine

server_args = ServerArgs(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    tp_size=4,
    dtype="bfloat16",
    mem_fraction_static=0.9,
)
engine = Engine(server_args=server_args)
```