Implementation: SGLang Engine Generate
| Knowledge Sources | Details |
|---|---|
| Domains | LLM_Serving, Text_Generation, Inference |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
The SGLang Engine's interface for executing synchronous and asynchronous text generation requests.
Description
The Engine.generate method accepts single or batched prompts (as text or token IDs) along with sampling parameters, and returns generated text with metadata. It wraps the internal TokenizerManager.generate_request coroutine in a synchronous interface. For async workflows, Engine.async_generate provides the native async version. Both support streaming via the stream=True parameter.
Usage
Call Engine.generate for synchronous batch inference. Use Engine.async_generate when integrating with async frameworks or when you need concurrent request processing within a single process.
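As a hedged sketch of the async path (the model path and prompts are illustrative, and this assumes async_generate mirrors generate's signature, as the Description suggests):

import asyncio
import sglang as sgl

engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

async def main():
    # Issue two requests concurrently within a single process.
    results = await asyncio.gather(
        engine.async_generate("What is SGLang?", {"max_new_tokens": 64}),
        engine.async_generate("Explain KV caching.", {"max_new_tokens": 64}),
    )
    for result in results:
        print(result["text"])

asyncio.run(main())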
Code Reference
Source Location
- Repository: sglang
- File: python/sglang/srt/entrypoints/engine.py
- Lines: L205-293 (generate), L295-373 (async_generate)
Signature
def generate(
self,
prompt: Optional[Union[List[str], str]] = None,
sampling_params: Optional[Union[List[Dict], Dict]] = None,
input_ids: Optional[Union[List[List[int]], List[int]]] = None,
image_data: Optional[MultimodalDataInputFormat] = None,
audio_data: Optional[MultimodalDataInputFormat] = None,
video_data: Optional[MultimodalDataInputFormat] = None,
return_logprob: Optional[Union[List[bool], bool]] = False,
logprob_start_len: Optional[Union[List[int], int]] = None,
top_logprobs_num: Optional[Union[List[int], int]] = None,
lora_path: Optional[List[Optional[str]]] = None,
stream: bool = False,
rid: Optional[Union[List[str], str]] = None,
) -> Union[Dict, Iterator[Dict]]:
"""Execute text generation. Returns dict or streaming iterator."""
Import
import sglang as sgl
engine = sgl.Engine(model_path="...")
result = engine.generate(prompt, sampling_params)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | Optional[Union[List[str], str]] | Yes (or input_ids) | Text prompt(s) for generation |
| sampling_params | Optional[Union[List[Dict], Dict]] | No | Sampling configuration dictionary |
| input_ids | Optional[Union[List[List[int]], List[int]]] | No | Token IDs (alternative to text prompt) |
| image_data | Optional[MultimodalDataInputFormat] | No | Image inputs for VLMs |
| stream | bool | No | Enable streaming output (default: False) |
| return_logprob | Optional[Union[List[bool], bool]] | No | Return log probabilities (default: False) |
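For the input_ids path, a minimal sketch that pre-tokenizes the prompt; using a Hugging Face tokenizer here is an assumption for illustration, not part of this API:

import sglang as sgl
from transformers import AutoTokenizer

engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")
# Assumption: the model ships with a compatible Hugging Face tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

ids = tokenizer.encode("The capital of France is")
# Pass token IDs as an alternative to a text prompt.
output = engine.generate(
    input_ids=ids,
    sampling_params={"temperature": 0, "max_new_tokens": 8},
)
print(output["text"])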
Outputs
| Name | Type | Description |
|---|---|---|
| result | Dict or List[Dict] | One dict per prompt with keys "text" (generated text), "meta_info" (metadata), "input_token_num", "output_token_num"; batched prompts return a list of such dicts |
| stream result | Iterator[Dict] | When stream=True, yields partial result dicts |
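A short sketch of consuming the non-streaming result; beyond the keys listed above, the exact contents of "meta_info" vary by SGLang version, so it is printed rather than assumed:

import sglang as sgl

engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")
result = engine.generate(
    prompt="Hello",
    sampling_params={"max_new_tokens": 16},
)
print(result["text"])       # the generated completion
print(result["meta_info"])  # metadata; exact fields vary by version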
Usage Examples
Single Prompt
import sglang as sgl
engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")
output = engine.generate(
prompt="The capital of France is",
sampling_params={"temperature": 0, "max_new_tokens": 32},
)
print(output["text"])
Batch Prompts
prompts = [
"What is machine learning?",
"Explain the theory of relativity.",
"Write a haiku about programming.",
]
sampling_params = {"temperature": 0.7, "max_new_tokens": 128}
outputs = engine.generate(prompts, sampling_params)
for i, out in enumerate(outputs):
print(f"Prompt {i}: {out['text']}")
Streaming
for chunk in engine.generate(
"Tell me a story",
{"max_new_tokens": 256, "temperature": 0.8},
stream=True,
):
print(chunk["text"], end="", flush=True)
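For completeness, a hedged async streaming counterpart. This assumes that awaiting async_generate with stream=True resolves to an async iterator (mirroring the synchronous iterator above) and that Engine.shutdown() releases the engine's resources; both are sketches, not guarantees.

import asyncio
import sglang as sgl

engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

async def stream_story():
    # Assumption: awaiting the call with stream=True yields an async iterator.
    stream = await engine.async_generate(
        "Tell me a story",
        {"max_new_tokens": 256, "temperature": 0.8},
        stream=True,
    )
    async for chunk in stream:
        print(chunk["text"], end="", flush=True)

asyncio.run(stream_story())
engine.shutdown()  # assumption: explicit shutdown releases engine workers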