Principle:Sgl project Sglang Chat Completion API
| Knowledge Sources | |
|---|---|
| Domains | LLM_Serving, API_Design, Chat |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
An HTTP API endpoint pattern that accepts multi-turn conversation messages and returns model completions following the OpenAI Chat Completions specification.
Description
The Chat Completion API is the primary interface for conversational LLM interaction in production systems. It accepts a messages array containing role-tagged turns (system, user, assistant) and returns a structured response with the model's reply. SGLang implements this as a FastAPI endpoint at /v1/chat/completions with full compatibility to the OpenAI specification, plus SGLang-specific extensions like regex for constrained decoding.
Usage
Use the Chat Completion API for any conversational interaction with the model — chatbots, question answering, instruction following, multi-turn dialogue. It is the standard endpoint for most production LLM applications.
Theoretical Basis
The API follows a request-response pattern with structured message history:
Request structure:
- model: Which model to use
- messages: Array of {role, content} objects representing the conversation
- temperature, max_tokens, etc.: Sampling parameters
Response structure:
- choices: Array of completion options (usually length 1)
- usage: Token count statistics
The multi-turn message format enables the model to maintain conversational context without explicit state management on the server side.