Implementation:Sgl project Sglang Triton Character Generation

Knowledge Sources	Sgl_project_Sglang
Domains	Model Serving, Structured Generation
Last Updated	2026-02-10 00:00 GMT

Overview

Triton Inference Server Python backend model that uses SGLang for constrained JSON character generation based on a Pydantic schema.

Description

model.py implements a TritonPythonModel class that wraps SGLang's structured generation capabilities for use with NVIDIA's Triton Inference Server. It defines a Character Pydantic model with name, eye_color, and house fields, then calls build_regex_from_object(Character) to derive a regular expression from the model's schema. Passing that regex to sgl.gen constrains decoding so the LLM output is valid JSON matching the schema.
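The following sketch makes the regex constraint concrete. It hand-writes a simplified pattern for the three string fields of Character and checks candidate outputs with Python's re module. CHARACTER_REGEX and matches_character_schema are hypothetical names, and the pattern is an illustrative approximation, not the exact expression build_regex_from_object emits. SGLang applies its generated pattern during decoding, so generation can only produce text that is still able to complete a match.

```python
import re

# Hypothetical, hand-written approximation of a schema regex for Character
# (name, eye_color, house; all strings, in that order). It is NOT the exact
# pattern produced by build_regex_from_object; it only illustrates the idea.
JSON_STRING = r'"(?:[^"\\\x00-\x1f]|\\["\\/bfnrt]|\\u[0-9a-fA-F]{4})*"'
WS = r"[ \n]*"  # optional spaces/newlines between JSON tokens


def _field(key: str) -> str:
    """Pattern for one `"key": "string value"` pair."""
    return '"' + key + '"' + WS + ":" + WS + JSON_STRING


CHARACTER_REGEX = (
    r"\{" + WS
    + _field("name") + WS + "," + WS
    + _field("eye_color") + WS + "," + WS
    + _field("house") + WS
    + r"\}"
)


def matches_character_schema(text: str) -> bool:
    """Return True if text is a JSON object with exactly these three string fields."""
    return re.fullmatch(CHARACTER_REGEX, text) is not None
```

A missing field or a non-string value makes the full match fail, which is the property the real constraint guarantees for generated output.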

The character_gen function, decorated with @function, builds a Harry Potter character prompt and generates JSON output constrained by the regex. The TritonPythonModel class implements Triton's standard initialize and execute methods. For each request, execute reads the INPUT_TEXT tensor with pb_utils.get_input_tensor_by_name, converts it to a list of character names, runs character_gen.run_batch() over those names as one batch, and returns the results in an OUTPUT_TEXT tensor.
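The per-request steps of execute can be isolated from Triton and SGLang so they are testable on their own. The sketch below is a minimal version under stated assumptions. decode_names, handle_request, and the RunBatch stand-in are hypothetical helpers, not code from model.py. The stand-in returns plain strings, whereas the real character_gen.run_batch returns SGLang program states. The trailing comments show how the helpers would plug into the real pb_utils calls.

```python
from typing import Callable, Dict, List, Sequence, Union

# Hypothetical stand-in for character_gen.run_batch: one argument dict in,
# one generated string out. (The real run_batch returns ProgramState objects.)
RunBatch = Callable[[List[Dict[str, str]]], List[str]]


def decode_names(raw: Sequence[Union[bytes, str]]) -> List[str]:
    """Triton string (BYTES) tensors yield bytes elements; decode them to str."""
    return [x.decode("utf-8") if isinstance(x, bytes) else x for x in raw]


def handle_request(raw_names: Sequence[Union[bytes, str]], run_batch: RunBatch) -> List[str]:
    """Decode the names, build one kwargs dict per name, and run them as a batch."""
    names = decode_names(raw_names)
    batch_args = [{"name": n} for n in names]  # maps to character_gen(s, name)
    return run_batch(batch_args)


# Wiring inside TritonPythonModel.execute (sketch, not runnable without Triton):
#   tensor_in = pb_utils.get_input_tensor_by_name(request, "INPUT_TEXT")
#   texts = handle_request(tensor_in.as_numpy().tolist(), my_run_batch)
#   tensor_out = pb_utils.Tensor("OUTPUT_TEXT", numpy.array(texts, dtype=object))
#   responses.append(pb_utils.InferenceResponse(output_tensors=[tensor_out]))
```

Injecting the batch runner keeps the decoding and batching logic independent of a live SGLang server.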

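The model sends all SGLang calls to a local SGLang runtime at http://localhost:30000 (configured with sgl.set_default_backend(sgl.RuntimeEndpoint(...))), so an SGLang server must already be listening at that address.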

Usage

Use this example as a template for serving SGLang structured generation behind Triton Inference Server. Clients call Triton's standard inference API, while SGLang enforces the Pydantic-derived schema and runs each request's inputs as a batch.

Code Reference

Source Location

Repository: Sgl_project_Sglang
File: examples/frontend_language/usage/triton/models/character_generation/1/model.py
Lines: 1-55

Signature

class Character(BaseModel):
    name: str
    eye_color: str
    house: str

@function
def character_gen(s, name): ...

class TritonPythonModel:
    def initialize(self, args): ...
    def execute(self, requests): ...

Import

import numpy
import triton_python_backend_utils as pb_utils
from pydantic import BaseModel

import sglang as sgl
from sglang import function
from sglang.srt.constrained.outlines_backend import build_regex_from_object

I/O Contract

Inputs

Name	Type	Required	Description
INPUT_TEXT	numpy array of strings	Yes	Array of character names to generate information for (Triton tensor)

Outputs

Name	Type	Description
OUTPUT_TEXT	numpy array of strings	Array of generated texts, one per input name; the json_output generation in each is constrained to JSON conforming to the Character schema
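On the client side, the contract above means sending names as a string (BYTES) tensor named INPUT_TEXT and reading OUTPUT_TEXT back. The sketch below assumes three things not stated in this page: the model name character_generation (taken from the directory name), Triton's default HTTP port 8000, and a 1-D INPUT_TEXT shape. If the model's config sets max_batch_size > 0, the shape would need an extra batch dimension. The tritonclient and numpy imports are deferred into query so that the encoding helpers run without them.

```python
from typing import List, Sequence

MODEL_NAME = "character_generation"  # assumption: matches the model directory name
TRITON_HTTP_URL = "localhost:8000"   # assumption: Triton's default HTTP endpoint


def encode_names(names: Sequence[str]) -> List[bytes]:
    """Encode names as UTF-8 bytes, the element type of Triton BYTES tensors."""
    return [n.encode("utf-8") for n in names]


def decode_outputs(values: Sequence) -> List[str]:
    """OUTPUT_TEXT elements usually come back as bytes; convert them to str."""
    return [v.decode("utf-8") if isinstance(v, bytes) else v for v in values]


def query(names: Sequence[str]) -> List[str]:
    """Send one inference request (assumes INPUT_TEXT is a 1-D BYTES tensor)."""
    import numpy as np
    import tritonclient.http as httpclient

    data = np.array(encode_names(names), dtype=object)
    inp = httpclient.InferInput("INPUT_TEXT", [len(names)], "BYTES")
    inp.set_data_from_numpy(data)
    out = httpclient.InferRequestedOutput("OUTPUT_TEXT")
    client = httpclient.InferenceServerClient(url=TRITON_HTTP_URL)
    result = client.infer(MODEL_NAME, inputs=[inp], outputs=[out])
    return decode_outputs(result.as_numpy("OUTPUT_TEXT").tolist())
```

With Triton and the SGLang server running, query(["Hermione Granger"]) would return one generated string per name.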

Usage Examples

# The model is deployed as a Triton Python backend. Triton routes requests to
# TritonPythonModel.execute, which runs the SGLang program shown below.

# Standalone equivalent of the internal flow (requires an SGLang server on port 30000):
import sglang as sgl
from sglang import function
from sglang.srt.constrained.outlines_backend import build_regex_from_object
from pydantic import BaseModel

class Character(BaseModel):
    name: str
    eye_color: str
    house: str

# Send all SGLang programs to the local runtime.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@function
def character_gen(s, name):
    s += (
        name
        + " is a character in Harry Potter. Please fill in the following "
        + "information about this character.\n"
    )
    # The regex derived from Character constrains decoding to matching JSON.
    s += sgl.gen("json_output", max_tokens=256, regex=build_regex_from_object(Character))

# One keyword-argument dict per program invocation, executed as a batch.
states = character_gen.run_batch([{"name": "Hermione Granger"}, {"name": "Ron Weasley"}])
for state in states:
    print(state.text())          # prompt followed by the generated JSON
    print(state["json_output"])  # only the constrained JSON
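Because state.text() returns the prompt followed by the generated text, downstream code that needs the character fields is better served by the captured variable, state["json_output"]. The hypothetical helper below parses that string with the standard json module and checks the three Character fields. It is a sketch, not part of model.py. With Pydantic v2 installed, Character.model_validate_json(state["json_output"]) performs comparable validation.

```python
import json
from typing import Dict

EXPECTED_FIELDS = ("name", "eye_color", "house")  # fields of the Character model


def parse_character(json_output: str) -> Dict[str, str]:
    """Parse constrained JSON output and return the Character fields as a dict."""
    data = json.loads(json_output)
    missing = [f for f in EXPECTED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not all(isinstance(data[f], str) for f in EXPECTED_FIELDS):
        raise ValueError("all Character fields must be strings")
    # Keep only the schema fields, in schema order.
    return {f: data[f] for f in EXPECTED_FIELDS}
```

Under the regex constraint, these checks should never fail for a completed generation. They guard against output cut off by max_tokens or text obtained from elsewhere.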

Related Pages

Environment:Sgl_project_Sglang_Triton
