Implementation:Sgl project Sglang Generation Output Dict

From Leeroopedia


Knowledge Sources
Domains LLM_Serving, Data_Processing
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete pattern for extracting generated text and metadata from SGLang Engine output dictionaries.

Description

Engine.generate returns a Python dict for a single prompt and a list of dicts, one per prompt, for a batch. The primary field is "text", which holds the generated completion. Additional metadata, such as token counts and the finish reason, sits under "meta_info". This is a pattern rather than a class: users read fields with standard dict key indexing.
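Because the return type depends on whether one prompt or a batch was passed, downstream code may find it convenient to normalize the result first. The following is a minimal sketch that assumes only the dict-or-list behavior described above. `as_output_list` is a hypothetical helper name, not part of SGLang, and the sample dicts are hand-written stand-ins for real engine output.

```python
from typing import Any, Dict, List, Union

Output = Dict[str, Any]


def as_output_list(result: Union[Output, List[Output]]) -> List[Output]:
    """Return outputs as a list, wrapping a single-prompt dict in a one-element list."""
    if isinstance(result, dict):
        return [result]
    return list(result)


# Hand-written stand-ins for engine output (not real generations).
single = {"text": "4", "meta_info": {}}
batch = [{"text": "a", "meta_info": {}}, {"text": "b", "meta_info": {}}]

texts_single = [out["text"] for out in as_output_list(single)]
texts_batch = [out["text"] for out in as_output_list(batch)]
```

With this in place, the same loop can handle both call styles without branching on the input type.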

Usage

Access the "text" key from the output dict to retrieve generated content. Use "meta_info" for debugging or monitoring token usage and finish reasons.
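For token-usage monitoring, a small defensive reader can be useful, since the exact set of keys may differ between SGLang versions. This sketch assumes the "prompt_tokens" and "completion_tokens" keys under "meta_info" and, as a fallback, the top-level "input_token_num" and "output_token_num" keys listed in this page's interface specification. `token_usage` is a hypothetical helper, and the sample dict is illustrative rather than real output.

```python
from typing import Any, Dict, Optional, Tuple


def token_usage(output: Dict[str, Any]) -> Tuple[Optional[int], Optional[int]]:
    """Return (prompt_tokens, completion_tokens); a count is None when absent."""
    meta = output.get("meta_info") or {}
    # Prefer meta_info; fall back to the top-level keys if they are present.
    prompt = meta.get("prompt_tokens", output.get("input_token_num"))
    completion = meta.get("completion_tokens", output.get("output_token_num"))
    return prompt, completion


# Illustrative stand-in for an engine output dict.
sample = {"text": "4", "meta_info": {"prompt_tokens": 8, "completion_tokens": 1}}
usage = token_usage(sample)
```

Using `.get` instead of direct indexing means a missing key yields `None` rather than raising `KeyError`, which is usually preferable in logging or monitoring paths.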

Code Reference

Source Location

  • Repository: sglang
  • File: python/sglang/srt/managers/io_struct.py (output format definition)

Interface Specification

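from typing import Dict, Optional

# Output dict structure from Engine.generate (schema sketch: values are types)
result: Dict = {
    "text": str,                    # Generated text
    "meta_info": {
        "finish_reason": {
            "type": str,            # "stop" or "length"
            "matched": Optional[int],  # Matched stop condition, if any
        },
        "completion_tokens": int,   # Tokens generated
        "prompt_tokens": int,       # Tokens in the prompt
    },
    "input_token_num": int,         # Number of input tokens processed
    "output_token_num": int,        # Number of output tokens generated
}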

# Access pattern
generated_text = result["text"]
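For projects that use a static type checker, the structure above could be mirrored with `typing.TypedDict`. This is a sketch under the assumption that the schema shown on this page is accurate. The class names are hypothetical and not defined by SGLang, and every key is marked optional (`total=False`) because real output may omit some of them.

```python
from typing import Optional, TypedDict


class FinishReason(TypedDict, total=False):
    type: str                 # "stop" or "length"
    matched: Optional[int]


class MetaInfo(TypedDict, total=False):
    finish_reason: FinishReason
    completion_tokens: int
    prompt_tokens: int


class GenerationOutput(TypedDict, total=False):
    text: str
    meta_info: MetaInfo
    input_token_num: int
    output_token_num: int


def get_text(result: GenerationOutput) -> str:
    """Same access pattern as above, but checkable by a static type checker."""
    return result["text"]


# Illustrative value only, not real engine output.
example: GenerationOutput = {
    "text": "4",
    "meta_info": {"finish_reason": {"type": "stop", "matched": None}},
}
```

At runtime a `TypedDict` is an ordinary dict, so this adds no overhead and does not change how the engine output is accessed.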

I/O Contract

Inputs

Name | Type | Required | Description
result | Dict | Yes | Output dict from Engine.generate

Outputs

Name | Type | Description
text | str | Generated text completion
meta_info | Dict | Metadata (finish_reason, token counts)
input_token_num | int | Number of input tokens processed
output_token_num | int | Number of output tokens generated

Usage Examples

Basic Text Extraction

# Assumes `engine` is an already-initialized SGLang Engine instance
output = engine.generate("What is 2+2?", {"temperature": 0, "max_new_tokens": 16})

# Extract generated text
answer = output["text"]
print(f"Answer: {answer}")

# Check token usage (counts are reported under meta_info)
print(f"Input tokens: {output['meta_info']['prompt_tokens']}")
print(f"Output tokens: {output['meta_info']['completion_tokens']}")
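With a small `max_new_tokens` such as 16, an answer may be cut off before the model finishes. Going by the "stop" or "length" values listed in the interface specification, a finish type of "length" would indicate that the token limit ended generation. The sketch below assumes that meaning. `was_truncated` is a hypothetical helper, and the sample dicts are hand-written.

```python
from typing import Any, Dict


def was_truncated(output: Dict[str, Any]) -> bool:
    """True when the finish reason reports that the token limit was reached."""
    meta = output.get("meta_info") or {}
    finish = meta.get("finish_reason") or {}
    return finish.get("type") == "length"


# Illustrative stand-ins for engine output.
complete = {"text": "4", "meta_info": {"finish_reason": {"type": "stop"}}}
cut_off = {"text": "The answer is", "meta_info": {"finish_reason": {"type": "length"}}}

flags = [was_truncated(complete), was_truncated(cut_off)]
```

A caller could use this flag to retry with a larger `max_new_tokens` or to mark the answer as incomplete.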

Batch Output Processing

prompts = ["Hello", "World", "Test"]
outputs = engine.generate(prompts, {"max_new_tokens": 64})

for i, out in enumerate(outputs):
    print(f"Prompt {i}: {out['text'][:50]}...")
    print(f"  Finish reason: {out['meta_info']['finish_reason']['type']}")
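For monitoring a whole batch rather than printing each item, the per-output metadata can be aggregated. This is a minimal sketch assuming the "meta_info" layout from the interface specification. `summarize_batch` is a hypothetical helper, and the sample outputs are hand-written rather than produced by an engine.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_batch(outputs: List[Dict[str, Any]]) -> Tuple[Counter, int]:
    """Count finish-reason types and sum completion tokens across a batch."""
    reasons: Counter = Counter()
    total_completion = 0
    for out in outputs:
        meta = out.get("meta_info") or {}
        finish = meta.get("finish_reason") or {}
        # Outputs without a finish type are grouped under "unknown".
        reasons[finish.get("type", "unknown")] += 1
        total_completion += meta.get("completion_tokens", 0)
    return reasons, total_completion


# Illustrative stand-ins for a three-prompt batch.
sample_outputs = [
    {"text": "Hi", "meta_info": {"finish_reason": {"type": "stop"}, "completion_tokens": 5}},
    {"text": "...", "meta_info": {"finish_reason": {"type": "length"}, "completion_tokens": 64}},
    {"text": "Ok", "meta_info": {"finish_reason": {"type": "stop"}, "completion_tokens": 3}},
]

reasons, total_completion = summarize_batch(sample_outputs)
```

A high share of "length" finishes in such a summary may suggest that `max_new_tokens` is too low for the workload.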

Related Pages

Implements Principle
