Implementation: SGLang Engine Generate
| Knowledge Sources | Details |
|---|---|
| Domains | LLM_Serving, Text_Generation, Inference |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
The SGLang Engine's interface for executing synchronous and asynchronous text generation requests.
Description
The Engine.generate method accepts single or batched prompts (as text or token IDs) along with sampling parameters, and returns generated text with metadata. It wraps the internal TokenizerManager.generate_request coroutine in a synchronous interface. For async workflows, Engine.async_generate provides the native async version. Both support streaming via the stream=True parameter.
Usage
Call Engine.generate for synchronous batch inference. Use Engine.async_generate when integrating with async frameworks or when you need concurrent request processing within a single process.
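As a hedged sketch of the async path (the model path and prompts are illustrative, and this assumes async_generate mirrors generate's signature, as the Description suggests):

import asyncio
import sglang as sgl

engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

async def main():
    # Issue two requests concurrently within a single process.
    results = await asyncio.gather(
        engine.async_generate("What is SGLang?", {"max_new_tokens": 64}),
        engine.async_generate("Explain KV caching.", {"max_new_tokens": 64}),
    )
    for result in results:
        print(result["text"])

asyncio.run(main())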
Code Reference
Source Location
- Repository: sglang
- File: python/sglang/srt/entrypoints/engine.py
- Lines: L205-293 (generate), L295-373 (async_generate)
Signature
def generate(
self,
prompt: Optional[Union[List[str], str]] = None,
sampling_params: Optional[Union[List[Dict], Dict]] = None,
input_ids: Optional[Union[List[List[int]], List[int]]] = None,
image_data: Optional[MultimodalDataInputFormat] = None,
audio_data: Optional[MultimodalDataInputFormat] = None,
video_data: Optional[MultimodalDataInputFormat] = None,
return_logprob: Optional[Union[List[bool], bool]] = False,
logprob_start_len: Optional[Union[List[int], int]] = None,
top_logprobs_num: Optional[Union[List[int], int]] = None,
lora_path: Optional[List[Optional[str]]] = None,
stream: bool = False,
rid: Optional[Union[List[str], str]] = None,
) -> Union[Dict, Iterator[Dict]]:
"""Execute text generation. Returns dict or streaming iterator."""
Import
import sglang as sgl
engine = sgl.Engine(model_path="...")
result = engine.generate(prompt, sampling_params)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | Optional[Union[List[str], str]] | Yes (or input_ids) | Text prompt(s) for generation |
| sampling_params | Optional[Union[List[Dict], Dict]] | No | Sampling configuration dictionary |
| input_ids | Optional[Union[List[List[int]], List[int]]] | No | Token IDs (alternative to text prompt) |
| image_data | Optional[MultimodalDataInputFormat] | No | Image inputs for VLMs |
| stream | bool | No | Enable streaming output (default: False) |
| return_logprob | Optional[Union[List[bool], bool]] | No | Return log probabilities (default: False) |
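For the input_ids path, a minimal sketch that pre-tokenizes the prompt; using a Hugging Face tokenizer here is an assumption for illustration, not part of this API:

import sglang as sgl
from transformers import AutoTokenizer

engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")
# Assumption: the model ships with a compatible Hugging Face tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

ids = tokenizer.encode("The capital of France is")
# Pass token IDs as an alternative to a text prompt.
output = engine.generate(
    input_ids=ids,
    sampling_params={"temperature": 0, "max_new_tokens": 8},
)
print(output["text"])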
Outputs
| Name | Type | Description |
|---|---|---|
| result | Dict or List[Dict] | One dict per prompt with keys "text" (generated text), "meta_info" (metadata), "input_token_num", "output_token_num"; batched prompts return a list of such dicts |
| stream result | Iterator[Dict] | When stream=True, yields partial result dicts |
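A short sketch of consuming the non-streaming result; beyond the keys listed above, the exact contents of "meta_info" vary by SGLang version, so it is printed rather than assumed:

import sglang as sgl

engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")
result = engine.generate(
    prompt="Hello",
    sampling_params={"max_new_tokens": 16},
)
print(result["text"])       # the generated completion
print(result["meta_info"])  # metadata; exact fields vary by version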
Usage Examples
Single Prompt
import sglang as sgl
engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")
output = engine.generate(
prompt="The capital of France is",
sampling_params={"temperature": 0, "max_new_tokens": 32},
)
print(output["text"])
Batch Prompts
prompts = [
"What is machine learning?",
"Explain the theory of relativity.",
"Write a haiku about programming.",
]
sampling_params = {"temperature": 0.7, "max_new_tokens": 128}
outputs = engine.generate(prompts, sampling_params)
for i, out in enumerate(outputs):
print(f"Prompt {i}: {out['text']}")
Streaming
for chunk in engine.generate(
"Tell me a story",
{"max_new_tokens": 256, "temperature": 0.8},
stream=True,
):
print(chunk["text"], end="", flush=True)
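For completeness, a hedged async streaming counterpart. This assumes that awaiting async_generate with stream=True resolves to an async iterator (mirroring the synchronous iterator above) and that Engine.shutdown() releases the engine's resources; both are sketches, not guarantees.

import asyncio
import sglang as sgl

engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

async def stream_story():
    # Assumption: awaiting the call with stream=True yields an async iterator.
    stream = await engine.async_generate(
        "Tell me a story",
        {"max_new_tokens": 256, "temperature": 0.8},
        stream=True,
    )
    async for chunk in stream:
        print(chunk["text"], end="", flush=True)

asyncio.run(stream_story())
engine.shutdown()  # assumption: explicit shutdown releases engine workers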