Implementation:Sgl project Sglang Sgl Function Run
| Knowledge Sources | |
|---|---|
| Domains | Frontend_DSL, Execution, LLM_Programming |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete tool for executing SGLang generation programs using the .run() and .run_batch() methods on SglFunction objects.
Description
SglFunction.run() executes a single program instance with provided arguments and default sampling parameters. It returns a ProgramState containing generated variables. SglFunction.run_batch() processes multiple sets of arguments in parallel using thread pooling. Both methods accept sampling parameters (temperature, max_new_tokens, etc.) as keyword arguments.
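The thread-pooled fan-out described for run_batch() can be illustrated with a minimal standard-library sketch. This is not SGLang's implementation: `run_batch_sketch` and `fake_program` are hypothetical names, and resolving `"auto"` to one worker per batch item is an assumption made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Union


def run_batch_sketch(
    run_one: Callable[..., Any],
    batch_kwargs: List[Dict[str, Any]],
    num_threads: Union[str, int] = "auto",
) -> List[Any]:
    """Call run_one(**kwargs) for each dict in batch_kwargs, keeping input order."""
    if not batch_kwargs:
        return []
    # Assumption for this sketch: "auto" means one worker per batch item.
    if num_threads == "auto":
        num_threads = len(batch_kwargs)
    # Never start more workers than there are items, and always at least one.
    num_threads = max(1, min(int(num_threads), len(batch_kwargs)))
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(lambda kw: run_one(**kw), batch_kwargs))


def fake_program(text: str) -> Dict[str, str]:
    # Hypothetical stand-in for one program execution.
    return {"output": text.upper()}


results = run_batch_sketch(fake_program, [{"text": "hello"}, {"text": "world"}])
```

Because the results come back in input order, the i-th result corresponds to the i-th argument dict, which is the property the batch examples on this page rely on when they enumerate the returned states.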
Usage
Call .run(**kwargs) for a single execution or .run_batch(list_of_kwargs) for batch processing, where list_of_kwargs is a list of argument dicts. Pass sampling parameters as keyword arguments to override the defaults.
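The override behaviour can be pictured as a merge of caller-supplied values over a set of defaults. The following is only a sketch of that idea, not how SGLang stores its parameters: `DEFAULT_SAMPLING` and `resolve_sampling` are hypothetical names, and the default values are copied from the signature documented on this page.

```python
from typing import Any, Dict

# Hypothetical container; the values mirror the defaults in the documented signature.
DEFAULT_SAMPLING: Dict[str, Any] = {
    "max_new_tokens": 128,
    "temperature": 1.0,
    "top_p": 1.0,
    "top_k": -1,
}


def resolve_sampling(**overrides: Any) -> Dict[str, Any]:
    """Return a new dict of defaults with caller-supplied overrides applied."""
    unknown = set(overrides) - set(DEFAULT_SAMPLING)
    if unknown:
        raise TypeError(f"unknown sampling parameters: {sorted(unknown)}")
    # Later keys win, so overrides replace defaults without mutating them.
    return {**DEFAULT_SAMPLING, **overrides}


params = resolve_sampling(temperature=0.7)
```

Only the parameters that are explicitly passed change; everything else keeps its default value.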
Code Reference
Source Location
- Repository: sglang
- File: python/sglang/lang/ir.py
- Lines: L160-221 (run), L223-293 (run_batch)
Signature
class SglFunction:
    def run(
        self,
        *args,
        max_new_tokens: int = 128,
        temperature: float = 1.0,
        top_p: float = 1.0,
        top_k: int = -1,
        stream: bool = False,
        backend: Optional[BaseBackend] = None,
        **kwargs,
    ) -> ProgramState:
        """Execute a single program instance."""

    def run_batch(
        self,
        batch_kwargs: List[Dict],
        *,
        max_new_tokens: int = 128,
        temperature: float = 1.0,
        num_threads: Union[str, int] = "auto",
        progress_bar: bool = False,
        **kwargs,
    ) -> List[ProgramState]:
        """Execute multiple program instances in parallel."""
Import
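import sglang as sgl

# Assumes a default backend has already been configured,
# e.g. with sgl.set_default_backend(...).

@sgl.function
def my_func(s, text):
    s += text               # append the input to the prompt
    s += sgl.gen("output")  # generate and store the result as "output"

# Single execution
state = my_func.run(text="hello", temperature=0.7)

# Batch execution: one argument dict per program instance
states = my_func.run_batch(
    [{"text": "hello"}, {"text": "world"}],
    temperature=0.7,
)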
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| *args / **kwargs | Any | Yes | Arguments matching the decorated function's signature |
| max_new_tokens | int | No | Max tokens per generation (default: 128) |
| temperature | float | No | Sampling temperature (default: 1.0) |
| stream | bool | No | Enable streaming (default: False) |
| batch_kwargs | List[Dict] | Yes (run_batch) | List of argument dicts for batch execution |
| num_threads | Union[str, int] | No | Thread count for batch ("auto" or integer) |
Outputs
| Name | Type | Description |
|---|---|---|
| state | ProgramState | Returned by .run(); contains all generated variables, accessible by name (e.g. state["summary"]) |
| states | List[ProgramState] | Returned by .run_batch(); one state per argument dict |
Usage Examples
Single Run
import sglang as sgl

@sgl.function
def summarize(s, text):
    s += sgl.user(f"Summarize: {text}")
    s += sgl.assistant(sgl.gen("summary", max_tokens=100))

state = summarize.run(
    text="Long article text...",
    temperature=0.3,
    max_new_tokens=200,
)
# Generated variables are read from the state by name
print(state["summary"])
Batch Run
texts = ["Article 1...", "Article 2...", "Article 3..."]
states = summarize.run_batch(
    [{"text": t} for t in texts],
    temperature=0.3,
    max_new_tokens=200,
    num_threads="auto",
    progress_bar=True,
)
for i, state in enumerate(states):
    print(f"Summary {i}: {state['summary'][:80]}...")
Streaming Run
state = summarize.run(
    text="Long article...",
    stream=True,
)
for chunk in state.text_iter():
    print(chunk, end="", flush=True)
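When the streamed text is needed both incrementally and as a whole, the chunks can be collected while they are printed. The sketch below runs without a backend by using `fake_text_iter`, a hypothetical stand-in for state.text_iter(); that the chunks concatenate to the complete text is an assumption of the sketch, and `consume_stream` is likewise an illustrative name.

```python
from typing import Iterable, Iterator, List


def fake_text_iter() -> Iterator[str]:
    # Hypothetical stand-in for state.text_iter(): yields text piece by piece.
    yield from ["The article ", "argues that ", "caching helps."]


def consume_stream(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts: List[str] = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # final newline once the stream is exhausted
    return "".join(parts)


full_text = consume_stream(fake_text_iter())
```

With a real program state, the same helper could be called as consume_stream(state.text_iter()).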