
Implementation:Sgl project Sglang Engine Generate

From Leeroopedia


Knowledge Sources
Domains LLM_Serving, Text_Generation, Inference
Last Updated 2026-02-10 00:00 GMT

Overview

The SGLang Engine's concrete interface for executing synchronous and asynchronous text generation requests.

Description

The Engine.generate method accepts single or batched prompts (as text or token IDs) along with sampling parameters, and returns generated text with metadata. It wraps the internal TokenizerManager.generate_request coroutine in a synchronous interface. For async workflows, Engine.async_generate provides the native async version. Both support streaming via the stream=True parameter.

Usage

Call Engine.generate for synchronous batch inference. Use Engine.async_generate when integrating with async frameworks or when you need concurrent request processing within a single process.
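The concurrent-request pattern above can be sketched with asyncio.gather: one async_generate coroutine per prompt, awaited together. This is a minimal sketch assuming a loaded Engine instance; the generate_concurrently helper name is illustrative, not part of the SGLang API.

```python
import asyncio

async def generate_concurrently(engine, prompts, sampling_params):
    # One coroutine per prompt; the engine batches/schedules them together.
    tasks = [engine.async_generate(p, sampling_params) for p in prompts]
    # gather preserves input order, so results[i] corresponds to prompts[i].
    return await asyncio.gather(*tasks)

# Usage (assumes `engine = sgl.Engine(model_path=...)` is already loaded):
# results = asyncio.run(
#     generate_concurrently(engine, ["Hi", "Hello"], {"max_new_tokens": 16})
# )
```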

Code Reference

Source Location

  • Repository: sglang
  • File: python/sglang/srt/entrypoints/engine.py
  • Lines: L205-293 (generate), L295-373 (async_generate)

Signature

def generate(
    self,
    prompt: Optional[Union[List[str], str]] = None,
    sampling_params: Optional[Union[List[Dict], Dict]] = None,
    input_ids: Optional[Union[List[List[int]], List[int]]] = None,
    image_data: Optional[MultimodalDataInputFormat] = None,
    audio_data: Optional[MultimodalDataInputFormat] = None,
    video_data: Optional[MultimodalDataInputFormat] = None,
    return_logprob: Optional[Union[List[bool], bool]] = False,
    logprob_start_len: Optional[Union[List[int], int]] = None,
    top_logprobs_num: Optional[Union[List[int], int]] = None,
    lora_path: Optional[List[Optional[str]]] = None,
    stream: bool = False,
    rid: Optional[Union[List[str], str]] = None,
) -> Union[Dict, Iterator[Dict]]:
    """Execute text generation. Returns dict or streaming iterator."""

Import

import sglang as sgl

engine = sgl.Engine(model_path="...")
result = engine.generate(prompt, sampling_params)

I/O Contract

Inputs

  • prompt: Optional[Union[List[str], str]]. Required unless input_ids is given. Text prompt(s) for generation.
  • sampling_params: Optional[Union[List[Dict], Dict]]. Optional. Sampling configuration dictionary.
  • input_ids: Optional[Union[List[List[int]], List[int]]]. Optional. Token IDs (alternative to text prompt).
  • image_data: Optional[MultimodalDataInputFormat]. Optional. Image inputs for VLMs.
  • stream: bool. Optional. Enable streaming output (default: False).
  • return_logprob: Optional[Union[List[bool], bool]]. Optional. Return log probabilities (default: False).

Outputs

  • result: Dict. Keys: "text" (generated text), "meta_info" (metadata), "input_token_num", "output_token_num". For batched prompts, a list of such dicts is returned, one per prompt.
  • stream result: Iterator[Dict]. When stream=True, yields partial result dicts.
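Because a single prompt yields one result dict while a batch yields a list of them (as the Usage Examples show), downstream code often wants a uniform shape. A minimal normalizer sketch; these helper names are illustrative, not part of the SGLang API.

```python
def as_result_list(result):
    # Wrap a lone result dict so callers can always iterate uniformly.
    return result if isinstance(result, list) else [result]

def extract_texts(result):
    # Pull the generated text out of each result dict.
    return [r["text"] for r in as_result_list(result)]
```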

Usage Examples

Single Prompt

import sglang as sgl

engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

output = engine.generate(
    prompt="The capital of France is",
    sampling_params={"temperature": 0, "max_new_tokens": 32},
)
print(output["text"])

Batch Prompts

prompts = [
    "What is machine learning?",
    "Explain the theory of relativity.",
    "Write a haiku about programming.",
]
sampling_params = {"temperature": 0.7, "max_new_tokens": 128}

outputs = engine.generate(prompts, sampling_params)
for i, out in enumerate(outputs):
    print(f"Prompt {i}: {out['text']}")

Streaming

for chunk in engine.generate(
    "Tell me a story",
    {"max_new_tokens": 256, "temperature": 0.8},
    stream=True,
):
    print(chunk["text"], end="", flush=True)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
