Workflow:Sgl project Sglang Frontend Language Multi Turn Chat
| Knowledge Sources | |
|---|---|
| Domains | LLM_Inference, Prompt_Engineering, Frontend_DSL |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
End-to-end process for building multi-turn conversational AI programs using SGLang's frontend domain-specific language (DSL) with features like branching, constrained generation, and batching.
Description
This workflow covers using SGLang's Python-native frontend language to construct complex generation programs that go beyond simple prompt-in, text-out patterns. The frontend DSL provides primitives for multi-turn conversations (sgl.user, sgl.assistant), parallel generation branches (fork/join), constrained decoding (choices, regex), streaming output, and batch execution. Programs are defined as decorated Python functions that compose these primitives, enabling sophisticated generation logic with automatic prompt caching and efficient execution.
Usage
Execute this workflow when you need to build generation programs with control flow, multi-turn dialogues, branching logic, or constrained outputs that go beyond simple API calls. Common use cases include multi-step reasoning agents, structured information extraction pipelines, tool-use decision making, and parallel hypothesis generation.
Execution Steps
Step 1: Initialize the SGLang Backend
Set up either a local Runtime (in-process model) or connect to a remote RuntimeEndpoint (running SGLang server). The backend handles all model execution while the frontend DSL defines the generation logic.
Key considerations:
- sgl.Runtime(model_path=...) for local in-process execution
- sgl.RuntimeEndpoint("http://host:port") for connecting to a running server
- Set as default backend with sgl.set_default_backend()
- Local runtime manages its own GPU memory and model lifecycle
Step 2: Define Generation Functions
Write Python functions decorated with @sgl.function that define the generation program. Within these functions, use the SGLang primitives to build multi-turn conversations, branch into parallel generation paths, and apply constraints to outputs.
Key considerations:
- @sgl.function decorator enables SGLang state tracking
- Append text with s += "text" to build the prompt
- Use sgl.user() and sgl.assistant() for chat-formatted conversations
- Use sgl.gen("name") to generate and capture output in a named variable
- Use sgl.gen("name", choices=[...]) for constrained selection
- Use sgl.gen("name", regex=pattern) for regex-constrained generation
Step 3: Implement Branching Logic
For programs requiring parallel exploration, use the fork primitive to create multiple generation branches that share the same prefix. Each branch generates independently and results can be joined back into the main execution flow.
Key considerations:
- s.fork(n) creates n parallel branches sharing the prefix cache
- Each fork can generate with different continuations
- Branches execute in parallel leveraging RadixAttention prefix sharing
- Results from forks are accessed by indexing the fork variable
Step 4: Execute the Generation Program
Run the defined function with input arguments using .run() for single execution, .run_batch() for batch execution, or with stream=True for streaming output. The runtime handles prompt caching, batching, and efficient execution.
Key considerations:
- .run(arg1=val1, ...) for single execution returning a ProgramState
- .run_batch([dict1, dict2, ...]) for batch execution returning list of states
- stream=True enables token-by-token streaming via text_iter()
- progress_bar=True shows batch execution progress
Step 5: Extract Results
Access generated outputs from the returned ProgramState object. Named generation variables are accessible by indexing the state, full text is available via state.text(), and chat messages via state.messages().
Key considerations:
- state["variable_name"] retrieves a specific generated output
- state.text() returns the full concatenated program text
- state.messages() returns the conversation in message format
- Multiple named generations can be accessed independently