Implementation:Sgl project Sglang Chat Completions Streaming

From Leeroopedia


Knowledge Sources
Domains LLM_Serving, Streaming, API_Design
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete tool for handling streaming chat completion responses from the SGLang server via Server-Sent Events.

Description

When stream=True is passed to the /v1/chat/completions endpoint, the server returns a Server-Sent Events (SSE) stream of ChatCompletionChunk objects. Each chunk carries the next fragment of generated text in choices[0].delta.content, which can be None or empty on chunks that carry no text. The OpenAI Python SDK parses the SSE stream and exposes the chunks as an iterator. SGLang also supports stream_options, such as {"include_usage": True}, to include usage statistics in the final chunk.
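To make the wire format concrete, here is a minimal sketch of the parsing the SDK performs on the client's behalf. It assumes the usual OpenAI-compatible framing, where each event is a `data: <json>` line and the stream ends with `data: [DONE]`. The helper name `iter_sse_chunks` and the sample lines are hypothetical and written by hand for illustration; they are not captured SGLang output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from the text lines of an SSE stream."""
    for line in lines:
        line = line.strip()
        # Skip blank event separators, comments, and any non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Assumed end-of-stream sentinel in the OpenAI-compatible format.
        if payload == "[DONE]":
            break
        yield json.loads(payload)


# Hand-written sample lines (hypothetical, trimmed to the fields used here).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]

# Join the text fragments, treating a missing or null content as empty.
text = "".join(
    choice["delta"].get("content") or ""
    for chunk in iter_sse_chunks(sample)
    for choice in chunk["choices"]
)
print(text)
```

In normal use there is no need to parse the stream by hand; the sketch only shows what each iteration of the SDK's stream object corresponds to on the wire.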

Usage

Set stream=True in chat completion requests when building interactive applications. Iterate over the response to process tokens as they arrive.

Code Reference

Source Location

  • Repository: sglang
  • File: python/sglang/srt/entrypoints/http_server.py
  • Lines: L1324-1331 (same endpoint as non-streaming, with stream=True)
  • Streaming logic: python/sglang/srt/entrypoints/openai/serving_base.py

Signature

# Client-side (OpenAI SDK)
stream = client.chat.completions.create(
    model: str,
    messages: List[Dict],
    stream: bool = True,
    stream_options: Optional[Dict] = None,  # e.g., {"include_usage": True}
    **sampling_params,
) -> Stream[ChatCompletionChunk]

# Each chunk has:
# chunk.choices[0].delta.content -> Optional[str]

Import

import openai
client = openai.Client(base_url="http://localhost:30000/v1", api_key="EMPTY")

I/O Contract

Inputs

Name            Type            Required  Description
model           str             Yes       Model name
messages        List[Dict]      Yes       Conversation messages
stream          bool            Yes       Must be True for streaming
stream_options  Optional[Dict]  No        Options like {"include_usage": True}

Outputs

Name Type Description
Stream[ChatCompletionChunk] Iterator Yields chunks with delta.content containing generated text fragments

Usage Examples

Basic Streaming

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Write a short story."}],
    stream=True,
    max_tokens=256,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # Newline at end

With Usage Statistics

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Explain AI."}],
    stream=True,
    stream_options={"include_usage": True},
)

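for chunk in stream:
    # In the OpenAI format the final usage chunk has an empty choices list,
    # so check it before indexing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\nTokens: {chunk.usage.prompt_tokens} + {chunk.usage.completion_tokens}")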
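When the full response text is needed after streaming finishes, the fragments can be accumulated while iterating. The following is a minimal sketch, not part of SGLang or the OpenAI SDK: `collect_stream` and `_fake_chunk` are hypothetical helpers, and the fake chunks only mimic the attributes used on this page (`choices`, `delta.content`, `usage`) so that the example runs without a server.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Concatenate delta.content fragments and return (text, usage)."""
    parts = []
    usage = None
    for chunk in chunks:
        # A usage-only chunk may have no choices, so guard before indexing.
        if chunk.choices:
            content = chunk.choices[0].delta.content
            if content:
                parts.append(content)
        # Keep the most recent usage object, if the server sent one.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage


def _fake_chunk(content=None, usage=None, with_choice=True):
    """Build a stand-in object shaped like a ChatCompletionChunk (illustrative only)."""
    delta = SimpleNamespace(content=content)
    choices = [SimpleNamespace(delta=delta)] if with_choice else []
    return SimpleNamespace(choices=choices, usage=usage)


fake_stream = [
    _fake_chunk("Hello"),
    _fake_chunk(", world"),
    _fake_chunk(None),  # a chunk without text, such as a finish marker
    _fake_chunk(
        usage=SimpleNamespace(prompt_tokens=5, completion_tokens=3),
        with_choice=False,
    ),
]

text, usage = collect_stream(fake_stream)
print(text)
print(usage.prompt_tokens, usage.completion_tokens)
```

With a live server, the object returned by client.chat.completions.create(..., stream=True) could be passed to collect_stream in place of fake_stream.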

Related Pages

Implements Principle
