Principle:Sgl project Sglang Generation Output Processing
| Knowledge Sources | |
|---|---|
| Domains | LLM_Serving, Data_Processing |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A data extraction pattern for accessing generated text and metadata from inference engine output dictionaries.
Description
After text generation completes, results are returned as structured dictionaries containing the generated text, token counts, finish reason, and optional metadata like log probabilities. Processing these outputs involves extracting the relevant fields for downstream use — whether that is displaying text to users, feeding into evaluation pipelines, or storing in datasets. The output format is consistent across single and batch generation modes.
Usage
Process generation outputs after every call to Engine.generate or the OpenAI-compatible API. The output dict pattern is the standard way to access results in SGLang offline inference.
Theoretical Basis
The output follows a structured dictionary pattern:
Pseudo-code:
# Abstract output structure
output = {
"text": str, # Generated text (single) or List[str] (batch)
"meta_info": dict, # Metadata: finish_reason, token counts
"input_token_num": int,
"output_token_num": int,
}
For batch generation, the output is either a list of dicts (one per prompt) or a single dict with list values, depending on the API used.