Implementation:Sgl project Sglang Sgl Function Run

Knowledge Sources	SGLang
Domains	Frontend_DSL, Execution, LLM_Programming
Last Updated	2026-02-10 00:00 GMT

Overview

Concrete tool for executing SGLang generation programs using the .run() and .run_batch() methods on SglFunction objects.

Description

SglFunction.run() executes a single program instance with the provided arguments and returns a ProgramState containing the generated variables. SglFunction.run_batch() executes one program instance per argument dict, in parallel on a thread pool, and returns a list of ProgramState objects. Both methods accept sampling parameters (temperature, max_new_tokens, etc.) as keyword arguments; any parameter that is not passed keeps its default value.
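The thread-pooled batch pattern described above can be sketched in plain Python. This is a minimal illustration, not SGLang's actual implementation: run_batch_sketch and fake_program are hypothetical names, and fake_program stands in for a decorated program so the snippet runs without a backend.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_batch_sketch(
    fn: Callable[..., Any],
    batch_kwargs: List[Dict[str, Any]],
    num_threads: int = 4,
    **sampling_params: Any,
) -> List[Any]:
    """Call fn once per argument dict on a thread pool, preserving input order."""
    if not batch_kwargs:
        return []
    # Never start more workers than there are items to process.
    workers = max(1, min(num_threads, len(batch_kwargs)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Shared sampling parameters are passed to every call.
        futures = [pool.submit(fn, **kw, **sampling_params) for kw in batch_kwargs]
        # Collect in submission order so results line up with batch_kwargs.
        return [f.result() for f in futures]


def fake_program(text: str, temperature: float = 1.0) -> Dict[str, Any]:
    # Hypothetical stand-in for a decorated SGLang function.
    return {"output": text.upper(), "temperature": temperature}


results = run_batch_sketch(
    fake_program,
    [{"text": "hello"}, {"text": "world"}],
    temperature=0.7,
)
```

Collecting futures in submission order keeps each result aligned with the argument dict that produced it, which is the property a caller of a batch API typically relies on.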

Usage

Call .run(**kwargs) for a single execution, or .run_batch(list_of_kwargs) for batch processing, where each list element is a dict of arguments for one program instance. Pass sampling parameters as keyword arguments to override the defaults.
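The override behavior can be pictured as replacing selected fields of a defaults record. The sketch below is an assumption-laden illustration rather than SGLang code: SamplingDefaults and resolve_sampling are hypothetical names, and the default values are copied from the signature listed under Code Reference.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SamplingDefaults:
    # Hypothetical container; values mirror the defaults in the run() signature.
    max_new_tokens: int = 128
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = -1


def resolve_sampling(defaults: SamplingDefaults, **overrides) -> SamplingDefaults:
    """Return a copy of defaults with only the supplied fields replaced."""
    return replace(defaults, **overrides)


# Only temperature is overridden; every other field keeps its default.
params = resolve_sampling(SamplingDefaults(), temperature=0.7)
```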

Code Reference

Source Location

Repository: sglang
File: python/sglang/lang/ir.py
Lines: L160-221 (run), L223-293 (run_batch)

Signature

class SglFunction:
    def run(
        self,
        *args,
        max_new_tokens: int = 128,
        temperature: float = 1.0,
        top_p: float = 1.0,
        top_k: int = -1,
        stream: bool = False,
        backend: Optional[BaseBackend] = None,
        **kwargs,
    ) -> ProgramState:
        """Execute a single program instance."""

    def run_batch(
        self,
        batch_kwargs: List[Dict],
        *,
        max_new_tokens: int = 128,
        temperature: float = 1.0,
        num_threads: Union[str, int] = "auto",
        progress_bar: bool = False,
        **kwargs,
    ) -> List[ProgramState]:
        """Execute multiple program instances in parallel."""

Import

import sglang as sgl

@sgl.function
def my_func(s, text):
    s += text  # append the input argument to the prompt
    s += sgl.gen("output")  # generate and store the result under "output"

# Single execution
state = my_func.run(text="hello", temperature=0.7)

# Batch execution
states = my_func.run_batch(
    [{"text": "hello"}, {"text": "world"}],
    temperature=0.7,
)

I/O Contract

Inputs

Name	Type	Required	Description
*args / **kwargs	Any	Yes	Arguments matching the decorated function's signature
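max_new_tokens	int	No	Max tokens per generation (default: 128)
temperature	float	No	Sampling temperature (default: 1.0)
top_p	float	No	Top-p sampling value (default: 1.0)
top_k	int	No	Top-k sampling value (default: -1)
stream	bool	No	Enable streaming (default: False)
backend	Optional[BaseBackend]	No	Backend to execute on (default: None)
batch_kwargs	List[Dict]	Yes (run_batch)	List of argument dicts for batch execution
num_threads	Union[str, int]	No	Thread count for batch, "auto" or an integer (default: "auto")
progress_bar	bool	No	Show a progress bar during batch execution (default: False)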
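Because num_threads accepts either the string "auto" or an integer, a caller-side helper has to normalize it before creating a pool. This page does not state how SGLang resolves "auto", so the sketch below assumes a CPU-count heuristic purely for illustration; resolve_num_threads is a hypothetical helper, not part of the library.

```python
import os
from typing import Union


def resolve_num_threads(num_threads: Union[str, int], batch_size: int) -> int:
    """Turn "auto" or an integer into a concrete worker count."""
    if num_threads == "auto":
        # Assumption: fall back to the CPU count; the real heuristic may differ.
        num_threads = os.cpu_count() or 1
    if not isinstance(num_threads, int) or num_threads < 1:
        raise ValueError(f"invalid num_threads: {num_threads!r}")
    # No point in running more workers than there are batch items.
    return max(1, min(num_threads, batch_size))
```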

Outputs

Name	Type	Description
ProgramState	ProgramState	Contains all generated variables (via .run())
List[ProgramState]	List[ProgramState]	List of states (via .run_batch())

Usage Examples

Single Run

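import sglang as sgl

# Assumes a default backend has already been configured,
# e.g. with sgl.set_default_backend(...).
@sgl.function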
def summarize(s, text):
    s += sgl.user(f"Summarize: {text}")
    s += sgl.assistant(sgl.gen("summary", max_tokens=100))

state = summarize.run(
    text="Long article text...",
    temperature=0.3,
    max_new_tokens=200,
)
print(state["summary"])

Batch Run

texts = ["Article 1...", "Article 2...", "Article 3..."]

states = summarize.run_batch(
    [{"text": t} for t in texts],
    temperature=0.3,
    max_new_tokens=200,
    num_threads="auto",
    progress_bar=True,
)

for i, state in enumerate(states):
    print(f"Summary {i}: {state['summary'][:80]}...")

Streaming Run

state = summarize.run(
    text="Long article...",
    stream=True,
)
for chunk in state.text_iter():
    print(chunk, end="", flush=True)
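When the full text is also needed after streaming, the chunks can be accumulated while they are printed. The sketch below shows that consumption pattern with a hypothetical fake_text_iter generator standing in for the streaming state's iterator, so it runs without a backend; it makes no claim about how SGLang produces chunks internally.

```python
from typing import Iterable, Iterator, List


def fake_text_iter(text: str, chunk_size: int = 4) -> Iterator[str]:
    """Hypothetical stand-in for a streaming state's chunk iterator."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def consume_stream(chunks: Iterable[str]) -> str:
    """Print chunks as they arrive and return the concatenated text."""
    parts: List[str] = []
    for chunk in chunks:
        print(chunk, end="", flush=True)  # show partial output immediately
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = consume_stream(fake_text_iter("A short summary."))
```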

Related Pages

Implements Principle

Principle:Sgl_project_Sglang_Generation_Program_Execution
