Principle:Sgl project Sglang Parallel Branching Logic
| Knowledge Sources | |
|---|---|
| Domains | Frontend_DSL, Parallel_Computing, LLM_Programming |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A program branching primitive that creates multiple parallel execution paths sharing a common prefix for efficient multi-sample generation.
Description
Parallel branching (fork/join) allows an SGLang program to split execution into multiple parallel branches that share the same KV cache prefix. Each branch can independently generate different content, and the results are collected back via a join operation. This is efficient because the shared prefix is computed once and reused across all branches via RadixAttention's prefix caching. Common use cases include generating multiple candidate answers, tree-of-thought reasoning, and best-of-N sampling.
Usage
Use fork/join when you need multiple independent generations from the same context: parallel sampling, best-of-N selection, or exploring different reasoning paths.
Theoretical Basis
Fork/join leverages prefix sharing for efficiency:
- Fork: Create N branches sharing the KV cache of the current prefix
- Independent execution: Each branch generates independently
- Join: Collect variables from all branches back to the parent
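The three steps above can be sketched with a toy model in plain Python. This is not the SGLang API: `ToyState`, its `gen` method, and the free function `join` are hypothetical names used only for illustration, and "generation" is replaced by a hard-coded string so the example runs without a model.

```python
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Hypothetical stand-in for a program state: shared prefix plus private suffix."""
    prefix: tuple                                   # immutable, shared by reference
    suffix: list = field(default_factory=list)      # text produced by this state only
    variables: dict = field(default_factory=dict)   # named outputs of this state

    def fork(self, n):
        # Build the shared prefix once; every branch holds the same tuple object.
        shared = self.prefix + tuple(self.suffix)
        return [ToyState(prefix=shared) for _ in range(n)]

    def gen(self, name, text):
        # Stand-in for model generation: record the text under a variable name.
        self.suffix.append(text)
        self.variables[name] = text


def join(branches, name):
    # Collect one named variable from every branch, in branch order.
    return [b.variables[name] for b in branches]


parent = ToyState(prefix=("Question: what is 2 + 2?",))
branches = parent.fork(3)
for i, b in enumerate(branches):
    b.gen("answer", f"candidate {i}")   # each branch writes only to its own suffix

answers = join(branches, "answer")
```

In this sketch the branches never copy the prefix; they only hold a reference to it, which mirrors the idea that forked branches reuse the parent's KV cache rather than rebuilding it.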
Efficiency gain: Without forking, each of the N samples would recompute the full prefix, so the prefix cost is paid N times. With forking, the prefix is computed once and every branch reuses it through the radix tree KV cache.
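The saving can be made concrete with a small counting experiment. The sketch below is a simplified assumption, not SGLang's implementation: `ToyPrefixCache` is a hypothetical plain token trie (a real radix tree compresses runs of tokens into single edges), and a "cache miss" stands for one token position whose KV entries would have to be computed.

```python
class ToyPrefixCache:
    """Toy token trie: each node stands for one cached token position."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        """Walk the trie along `tokens`; return how many tokens were cache misses."""
        node, misses = self.root, 0
        for tok in tokens:
            if tok not in node:
                node[tok] = {}      # first time this position is seen: compute it
                misses += 1
            node = node[tok]        # otherwise reuse the cached position
        return misses


prefix = ["p0", "p1", "p2", "p3", "p4"]                          # 5 shared prompt tokens
branches = [prefix + [f"b{i}-0", f"b{i}-1"] for i in range(3)]   # 2 private tokens each

cache = ToyPrefixCache()
with_sharing = sum(cache.insert(seq) for seq in branches)   # prefix counted once
without_sharing = sum(len(seq) for seq in branches)         # prefix counted per branch
```

With 5 prefix tokens and 3 branches of 2 tokens each, the shared trie computes 5 + 3 × 2 = 11 positions, while independent runs compute 3 × (5 + 2) = 21. The gap widens as the prefix gets longer or the number of branches grows.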
Join modes:
- gather_variable: Collect named variables from all branches into lists
- concate_and_append: Concatenate branch KV caches (for continued generation)
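The difference between the two join modes can be illustrated with a toy model. The functions below borrow the mode names but are hypothetical plain-Python stand-ins, not SGLang's implementation: variables are modeled as dictionaries and KV caches as lists of token labels.

```python
def gather_variable(branch_vars):
    """Toy model: merge per-branch variable dicts into one dict of lists."""
    gathered = {}
    for variables in branch_vars:
        for name, value in variables.items():
            gathered.setdefault(name, []).append(value)
    return gathered


def concate_and_append(parent_tokens, branch_tokens):
    """Toy model: append every branch's private tokens to the parent, in branch order."""
    merged = list(parent_tokens)    # copy so the parent's list is not mutated
    for tokens in branch_tokens:
        merged.extend(tokens)
    return merged


# gather_variable: the parent receives values, not branch context.
branch_vars = [{"answer": "A"}, {"answer": "B"}, {"answer": "C"}]
gathered = gather_variable(branch_vars)

# concate_and_append: the parent's context grows by the branches' content.
merged = concate_and_append(["ctx"], [["a1", "a2"], ["b1"]])
```

Under these assumptions, the first mode suits selection tasks such as best-of-N, where only the collected values matter, while the second suits cases where the parent should keep generating with all branch outputs in its context.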