Implementation:Sgl project Sglang Triton Character Generation
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, Structured Generation |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Triton Inference Server Python backend model that uses SGLang for constrained JSON character generation based on a Pydantic schema.
Description
model.py implements a TritonPythonModel class that wraps SGLang's structured generation capabilities for use with NVIDIA's Triton Inference Server. It defines a Character Pydantic model with name, eye_color, and house fields, then uses build_regex_from_object(Character) to auto-generate a regex constraint that ensures the LLM output conforms to valid JSON matching the schema.
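To make the idea of a schema-derived regex concrete, the following is a minimal, standard-library-only sketch. The pattern is hand-written for illustration and is an assumption, not the output of build_regex_from_object, which is likely to handle whitespace and string escapes differently. The sample values are made up.

```python
import json
import re

# Simplified, hand-written stand-in for a schema-derived regex (illustrative only).
# A JSON string here means: no embedded quotes, backslashes, or newlines.
STRING = r'"[^"\\\n]*"'

# The three Character fields must appear in this order, each with a string value.
CHARACTER_REGEX = (
    r'\{\s*"name"\s*:\s*' + STRING
    + r'\s*,\s*"eye_color"\s*:\s*' + STRING
    + r'\s*,\s*"house"\s*:\s*' + STRING
    + r'\s*\}'
)


def matches_schema(text: str) -> bool:
    """Return True if the whole text matches the simplified Character pattern."""
    return re.fullmatch(CHARACTER_REGEX, text) is not None


# Made-up sample strings, used only to exercise the pattern.
ok = '{"name": "Hermione Granger", "eye_color": "brown", "house": "Gryffindor"}'
missing_field = '{"name": "Hermione Granger", "house": "Gryffindor"}'

print(matches_schema(ok))
print(matches_schema(missing_field))
print(json.loads(ok)["house"])
```

During constrained decoding the regex restricts which tokens may be sampled, so text that does not fit the pattern (such as the second sample) cannot be produced in the first place.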
The character_gen function, decorated with @function, builds a Harry Potter character prompt and generates JSON output constrained by the regex. The TritonPythonModel class implements Triton's standard initialize and execute methods. For each request, execute reads the character names from the INPUT_TEXT tensor via pb_utils.get_input_tensor_by_name, passes them to character_gen.run_batch() so that all names in the request are generated in a single batched call, and returns the generated text as an OUTPUT_TEXT tensor.
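The request flow described above can be sketched without Triton or SGLang installed. In this sketch, decode_names, build_batch_arguments, handle_request, and fake_run_batch are hypothetical helper names, and fake_run_batch is a stand-in for character_gen.run_batch followed by state.text(). It assumes that string tensor elements may arrive as bytes and need decoding.

```python
from typing import Callable, Iterable, List, Union


def decode_names(raw_items: Iterable[Union[bytes, str]]) -> List[str]:
    """Normalize tensor elements to str (assumes string tensors may hold bytes)."""
    return [
        item.decode("utf-8") if isinstance(item, bytes) else item
        for item in raw_items
    ]


def build_batch_arguments(names: List[str]) -> List[dict]:
    """Build one keyword-argument dict per prompt, matching character_gen(s, name)."""
    return [{"name": name} for name in names]


def handle_request(
    raw_items: Iterable[Union[bytes, str]],
    run_batch: Callable[[List[dict]], List[str]],
) -> List[str]:
    """Decode the inputs, issue one batched call, return one string per input."""
    return run_batch(build_batch_arguments(decode_names(raw_items)))


def fake_run_batch(batch: List[dict]) -> List[str]:
    """Hypothetical stand-in for character_gen.run_batch plus state.text()."""
    return [f"<generated JSON for {args['name']}>" for args in batch]


results = handle_request([b"Hermione Granger", "Ron Weasley"], fake_run_batch)
for line in results:
    print(line)
```

In the real model, the returned strings would be wrapped in a numpy object array and placed in an OUTPUT_TEXT tensor, with one inference response appended per request.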
The model connects to a local SGLang runtime at http://localhost:30000.
Usage
Use this example as a template for deploying SGLang-based structured generation behind Triton Inference Server, enabling standardized inference API access with Pydantic-based schema constraints and batch processing.
Code Reference
Source Location
- Repository: Sgl_project_Sglang
- File: examples/frontend_language/usage/triton/models/character_generation/1/model.py
- Lines: 1-55
Signature
class Character(BaseModel):
    name: str
    eye_color: str
    house: str

@function
def character_gen(s, name): ...

class TritonPythonModel:
    def initialize(self, args): ...
    def execute(self, requests): ...
Import
import numpy
import triton_python_backend_utils as pb_utils
from pydantic import BaseModel
import sglang as sgl
from sglang import function
from sglang.srt.constrained.outlines_backend import build_regex_from_object
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| INPUT_TEXT | numpy array of strings | Yes | Array of character names to generate information for (Triton tensor) |
Outputs
| Name | Type | Description |
|---|---|---|
| OUTPUT_TEXT | numpy array of strings | Array of generated JSON character descriptions conforming to the Character schema |
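As an illustration of this contract from the client side, the sketch below builds a request body in the KServe v2 inference protocol format that Triton's HTTP endpoint accepts. It rests on several assumptions: the model name is taken from the model repository path, port 8000 is Triton's default HTTP port, and the shape assumes a one-dimensional input with no extra batch dimension, which depends on a config.pbtxt that is not shown here. build_infer_request is a hypothetical helper name.

```python
import json

MODEL_NAME = "character_generation"  # assumed from the model repository path
# Assumes Triton is listening on its default HTTP port on the local machine.
INFER_URL = f"http://localhost:8000/v2/models/{MODEL_NAME}/infer"


def build_infer_request(names):
    """Build a KServe v2 style inference request for a list of character names."""
    names = list(names)
    return {
        "inputs": [
            {
                "name": "INPUT_TEXT",
                "shape": [len(names)],  # assumption: 1-D input, no batch dimension
                "datatype": "BYTES",    # v2 protocol datatype used for strings
                "data": names,
            }
        ],
        "outputs": [{"name": "OUTPUT_TEXT"}],
    }


body = json.dumps(build_infer_request(["Hermione Granger", "Ron Weasley"]))
print(INFER_URL)
print(body)
```

The body can then be sent as an HTTP POST with a JSON content type using any HTTP client. If the deployed configuration declares a batch dimension, the shape would need to be adjusted accordingly.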
Usage Examples
# The model is deployed as a Triton backend.
# Triton handles request routing; the execute method processes batches.
# Internal flow:
import sglang as sgl
from sglang import function
from sglang.srt.constrained.outlines_backend import build_regex_from_object
from pydantic import BaseModel

class Character(BaseModel):
    name: str
    eye_color: str
    house: str

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@function
def character_gen(s, name):
    s += (
        name
        + " is a character in Harry Potter. Please fill in the following "
        + "information about this character.\n"
    )
    s += sgl.gen("json_output", max_tokens=256, regex=build_regex_from_object(Character))

states = character_gen.run_batch([{"name": "Hermione Granger"}, {"name": "Ron Weasley"}])
for state in states:
    print(state.text())
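Because state.text() is expected to return the prompt followed by the generated text, a consumer usually needs to isolate the JSON object before parsing it. The following is a minimal sketch using only the standard library. parse_character is a hypothetical helper, and the sample string is made up to mimic the prompt-plus-JSON shape. It assumes the prompt itself contains no opening brace and that generation was not cut off by max_tokens.

```python
import json

REQUIRED_FIELDS = ("name", "eye_color", "house")


def parse_character(full_text: str) -> dict:
    """Extract the JSON object that follows the prompt and check its fields."""
    start = full_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in generated text")
    # Raises json.JSONDecodeError if the object is truncated or malformed.
    character = json.loads(full_text[start:])
    missing = [field for field in REQUIRED_FIELDS if field not in character]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return character


# Made-up sample that mimics prompt text followed by constrained JSON output.
sample_text = (
    "Hermione Granger is a character in Harry Potter. Please fill in the "
    "following information about this character.\n"
    '{"name": "Hermione Granger", "eye_color": "brown", "house": "Gryffindor"}'
)

character = parse_character(sample_text)
print(character["house"])
```

With SGLang available, indexing the state by the variable name, as in state["json_output"], should return only the generated portion and avoid the need to search for the opening brace.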