Principle:Sgl project Sglang Generation Program Execution
| Knowledge Sources | |
|---|---|
| Domains | Frontend_DSL, Execution, LLM_Programming |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
An execution mechanism that runs SGLang generation programs against a backend, supporting single execution, batch execution, and streaming modes.
Description
Generation program execution takes a function decorated with @sgl.function and runs it against the configured backend. The .run() method executes a single program instance with the given arguments and returns a ProgramState containing all generated variables and the full conversation text. The .run_batch() method takes a list of argument sets, executes one instance per set in parallel on a thread pool to increase throughput, and returns one ProgramState per input. Both methods accept sampling parameters (such as temperature) that serve as defaults for every sgl.gen() call in the program unless a call sets its own value.
Usage
Use .run() to execute a program once and .run_batch() to process many inputs efficiently. Passing stream=True to .run() makes generated output available incrementally as it is produced, which suits interactive applications.
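The three modes can be sketched as follows. This is a minimal illustration, not a reference implementation: the program name short_answer, the prompts, and the SGLANG_ENDPOINT environment variable are hypothetical, and the sketch assumes an SGLang server is already reachable at that URL. The calls used (sgl.set_default_backend, sgl.RuntimeEndpoint, state.text(), state.text_iter()) reflect the SGLang frontend API as commonly documented; check them against the installed version.

```python
import os


def main(endpoint: str) -> None:
    # Imported lazily so this file can be loaded without sglang installed.
    import sglang as sgl

    @sgl.function
    def short_answer(s, question):
        # Hypothetical program: one prompt, one generated variable.
        s += "Q: " + question + "\n"
        s += "A: " + sgl.gen("answer", max_tokens=32, stop="\n")

    sgl.set_default_backend(sgl.RuntimeEndpoint(endpoint))

    # Single execution: returns a ProgramState.
    state = short_answer.run(question="What is the capital of France?", temperature=0)
    print(state["answer"])  # the variable produced by sgl.gen("answer")
    print(state.text())     # the full prompt plus generated text

    # Batch execution: one dict of arguments per program instance.
    states = short_answer.run_batch(
        [{"question": "What is 2 + 2?"}, {"question": "Name a prime number."}],
        temperature=0,
        num_threads="auto",
    )
    for st in states:
        print(st["answer"])

    # Streaming: consume text incrementally while generation is in progress.
    state = short_answer.run(question="Name three colors.", stream=True)
    for chunk in state.text_iter():
        print(chunk, end="", flush=True)


# SGLANG_ENDPOINT is a hypothetical variable used only by this sketch,
# e.g. SGLANG_ENDPOINT=http://localhost:30000
if os.environ.get("SGLANG_ENDPOINT"):
    main(os.environ["SGLANG_ENDPOINT"])
```

Sampling parameters passed to .run() or .run_batch() (here temperature=0) act as program-wide defaults, while max_tokens and stop are set on the individual sgl.gen() call.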
Theoretical Basis
Execution follows a compile, dispatch, and collect pattern:
- Compile: The @sgl.function body is traced to build an execution graph
- Dispatch: The graph is sent to the backend for execution
- Collect: Results (generated text, variables) are collected into ProgramState
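The three stages can be illustrated with a toy model in plain Python. This is a sketch of the pattern only and does not reproduce SGLang's internals: Gen, Tracer, Result, compile_program, dispatch, and echo_backend are all hypothetical names invented for the example, and the "backend" is a local function rather than a model server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class Gen:
    """Placeholder for a generation call that fills a named variable."""
    name: str


class Tracer:
    """Records the operations a program appends instead of executing them."""

    def __init__(self) -> None:
        self.ops: List[Union[str, Gen]] = []

    def __iadd__(self, other: Union[str, Gen]) -> "Tracer":
        self.ops.append(other)
        return self


@dataclass
class Result:
    """Toy stand-in for a program state: full text plus named variables."""
    text: str = ""
    variables: Dict[str, str] = field(default_factory=dict)


def compile_program(func: Callable, **kwargs) -> List[Union[str, Gen]]:
    # Compile: run the body once against a tracer to obtain an op list.
    tracer = Tracer()
    func(tracer, **kwargs)
    return tracer.ops


def dispatch(ops: List[Union[str, Gen]], backend: Callable[[str], str]) -> Result:
    # Dispatch each op, then collect text and variables into the result.
    result = Result()
    for op in ops:
        if isinstance(op, Gen):
            out = backend(result.text)  # the backend sees the prompt so far
            result.variables[op.name] = out
            result.text += out
        else:
            result.text += op
    return result


def echo_backend(prompt: str) -> str:
    # Hypothetical backend: reports the prompt length instead of calling a model.
    return f"<{len(prompt)} chars seen>"


def qa(s, question):
    s += "Q: " + question + "\n"
    s += "A: "
    s += Gen("answer")


ops = compile_program(qa, question="Hi?")
result = dispatch(ops, echo_backend)
print(result.variables["answer"])
```

Separating the recorded operations from their execution is what lets the same program definition be sent to different backends.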
Batch execution uses thread pools to maximize throughput:
- Each program instance runs in its own thread
- The backend handles request batching at the server level
- Thread count is auto-tuned based on batch size (num_threads="auto")
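The thread-per-instance scheme above can be sketched with the standard library. This is an illustration under stated assumptions, not SGLang's code: run_batch, pick_num_threads, and fake_program are hypothetical, and the "auto" heuristic shown here (a multiple of the CPU count, capped at the batch size) is an assumed stand-in for whatever rule the library actually applies.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Union


def pick_num_threads(num_threads: Union[int, str], batch_size: int) -> int:
    # Assumed heuristic: "auto" scales with CPU count; never exceed the batch size.
    if num_threads == "auto":
        num_threads = (os.cpu_count() or 1) * 4
    return max(1, min(num_threads, batch_size))


def run_batch(
    run_one: Callable[..., Any],
    batch_kwargs: List[Dict[str, Any]],
    num_threads: Union[int, str] = "auto",
) -> List[Any]:
    # Run one program instance per argument dict; results keep input order.
    if not batch_kwargs:
        return []
    workers = pick_num_threads(num_threads, len(batch_kwargs))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_one, **kw) for kw in batch_kwargs]
        return [f.result() for f in futures]


def fake_program(question: str) -> Dict[str, str]:
    # Hypothetical stand-in for a blocking request to the backend.
    return {"answer": question.upper()}


results = run_batch(
    fake_program,
    [{"question": "a"}, {"question": "b"}, {"question": "c"}],
)
print(results)
```

Threads fit this workload because each instance spends most of its time waiting on the backend, and keeping many requests in flight is what allows the server to batch them.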