Implementation:Sgl project Sglang Triton Character Generation

From Leeroopedia


Knowledge Sources
Domains: Model Serving, Structured Generation
Last Updated: 2026-02-10 00:00 GMT

Overview

This page describes a Triton Inference Server Python backend model that uses SGLang to generate character descriptions as JSON, with the output constrained by a regex derived from a Pydantic schema.

Description

model.py implements a TritonPythonModel class that wraps SGLang's structured generation for use with NVIDIA's Triton Inference Server. It defines a Character Pydantic model with three string fields (name, eye_color, and house), then calls build_regex_from_object(Character) to derive a regex from that schema. Passing the regex to generation constrains the LLM output to JSON that matches the schema.
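To make the idea of a schema-derived regex concrete, the sketch below hand-writes a simplified pattern for the three Character fields and checks strings against it. It is an illustration only: CHARACTER_REGEX and matches_character are hypothetical names, and the pattern is not the actual output of build_regex_from_object, which may handle escapes, whitespace, and field ordering differently.

```python
import json
import re

# Hand-written, simplified stand-in for a schema-derived regex.
# NOT the output of build_regex_from_object; it only shows the general
# shape such a pattern takes for an object with three string fields.
STRING = r'"[^"\\\n]*"'  # a JSON string without escape sequences, for brevity
WS = r"[ \n]*"           # optional whitespace between tokens

CHARACTER_REGEX = (
    r"\{" + WS
    + r'"name"' + WS + ":" + WS + STRING + WS + "," + WS
    + r'"eye_color"' + WS + ":" + WS + STRING + WS + "," + WS
    + r'"house"' + WS + ":" + WS + STRING + WS
    + r"\}"
)


def matches_character(text: str) -> bool:
    """Return True if text is a JSON object with the three fields, in order."""
    return re.fullmatch(CHARACTER_REGEX, text) is not None


good = '{"name": "Hermione Granger", "eye_color": "brown", "house": "Gryffindor"}'
bad = '{"name": "Hermione Granger", "house": "Gryffindor"}'  # eye_color missing

print(matches_character(good))
print(matches_character(bad))
print(json.loads(good)["house"])
```

A constrained decoder only emits tokens that keep the output on a path the pattern can still accept, so a string like `bad` could not be produced in the first place.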

The character_gen function, decorated with @function, builds a prompt about a Harry Potter character and generates JSON output constrained by the regex. The TritonPythonModel class implements Triton's standard initialize and execute methods. For each request, execute reads the character names from the INPUT_TEXT tensor via pb_utils.get_input_tensor_by_name, passes them to character_gen.run_batch() so that all names in the request are processed as one batch, and returns the results in an OUTPUT_TEXT tensor.
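The per-request flow described above can be sketched without Triton or a running SGLang server. In this minimal sketch, decode_names, handle_request, and fake_run_batch are hypothetical helpers; fake_run_batch stands in for character_gen.run_batch and returns plain strings, and the input list stands in for the values read from the INPUT_TEXT tensor. It assumes string tensor elements may arrive as bytes and need decoding.

```python
from typing import Callable, Dict, List, Union


def decode_names(raw: List[Union[bytes, str]]) -> List[str]:
    """Normalise tensor elements to str; assumes bytes are UTF-8 encoded."""
    return [x.decode("utf-8") if isinstance(x, bytes) else x for x in raw]


def handle_request(
    raw_names: List[Union[bytes, str]],
    run_batch: Callable[[List[Dict[str, str]]], List[str]],
) -> List[str]:
    """Decode names, build one argument dict per name, run them as a batch."""
    names = decode_names(raw_names)
    batch_args = [{"name": n} for n in names]
    return run_batch(batch_args)


def fake_run_batch(batch_args: List[Dict[str, str]]) -> List[str]:
    # Stand-in for character_gen.run_batch: echoes the name as JSON
    # instead of calling a model.
    return ['{"name": "%s"}' % a["name"] for a in batch_args]


results = handle_request([b"Hermione Granger", "Ron Weasley"], fake_run_batch)
for r in results:
    print(r)
```

Building the whole argument list first and making a single batch call is what lets the runtime schedule the prompts together rather than one at a time.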

The model connects to a local SGLang runtime at http://localhost:30000.

Usage

Use this example as a template for deploying SGLang-based structured generation behind Triton Inference Server, enabling standardized inference API access with Pydantic-based schema constraints and batch processing.

Code Reference

Source Location

Signature

class Character(BaseModel):
    name: str
    eye_color: str
    house: str

@function
def character_gen(s, name): ...

class TritonPythonModel:
    def initialize(self, args): ...
    def execute(self, requests): ...

Import

import numpy
import triton_python_backend_utils as pb_utils
from pydantic import BaseModel

import sglang as sgl
from sglang import function
from sglang.srt.constrained.outlines_backend import build_regex_from_object

I/O Contract

Inputs

Name | Type | Required | Description
INPUT_TEXT | numpy array of strings | Yes | Array of character names to generate information for (Triton tensor)

Outputs

Name | Type | Description
OUTPUT_TEXT | numpy array of strings | Array of generated JSON character descriptions conforming to the Character schema
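A client consuming OUTPUT_TEXT will usually want a parsed object rather than a raw string. The sketch below is one way to do that with the standard library, assuming each output string may contain the prompt text before the JSON object (which would be the case if the full program text is returned rather than only the generated variable). parse_character and REQUIRED_FIELDS are hypothetical names, and the sample string is made up for illustration.

```python
import json
from typing import Dict

REQUIRED_FIELDS = ("name", "eye_color", "house")


def parse_character(output_text: str) -> Dict[str, str]:
    """Extract the first JSON object from output_text and check its fields."""
    start = output_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in output")
    # raw_decode parses one JSON value starting at `start` and ignores
    # anything that follows it.
    obj, _end = json.JSONDecoder().raw_decode(output_text, start)
    missing = [f for f in REQUIRED_FIELDS if f not in obj]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return obj


# Illustrative output string: prompt text followed by the constrained JSON.
sample = (
    "Hermione Granger is a character in Harry Potter. Please fill in the "
    "following information about this character.\n"
    '{"name": "Hermione Granger", "eye_color": "brown", "house": "Gryffindor"}'
)

character = parse_character(sample)
print(character["house"])
```

The field check is redundant when the regex constraint is in effect, but it keeps the client safe if the output is truncated, for example by the max_tokens limit.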

Usage Examples

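# The model is deployed as a Triton Python backend.
# Triton handles request routing; the execute method processes each request's names as a batch.

# The SGLang part of that flow can be run on its own, given an SGLang
# runtime listening at http://localhost:30000: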
import sglang as sgl
from sglang import function
from sglang.srt.constrained.outlines_backend import build_regex_from_object
from pydantic import BaseModel

class Character(BaseModel):
    name: str
    eye_color: str
    house: str

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@function
def character_gen(s, name):
    s += (
        name
        + " is a character in Harry Potter. Please fill in the following "
        + "information about this character.\n"
    )
    s += sgl.gen("json_output", max_tokens=256, regex=build_regex_from_object(Character))

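# Run both prompts as one batch against the runtime configured above.
states = character_gen.run_batch([{"name": "Hermione Granger"}, {"name": "Ron Weasley"}])
for state in states:
    # state.text() is the full program text (prompt plus generation);
    # state["json_output"] holds only the constrained JSON.
    print(state.text())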
