Implementation: SGLang OpenAI Client Setup
| Knowledge Sources | |
|---|---|
| Domains | LLM_Serving, API_Client, Integration |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Wrapper documentation for configuring the OpenAI Python SDK to communicate with an SGLang server.
Description
The openai.Client class (from the openai PyPI package) is configured with base_url pointing at the SGLang server and api_key set to any placeholder string. All OpenAI SDK methods (chat.completions.create, completions.create, embeddings.create) then work against the SGLang backend.
Usage
Set up the OpenAI client after launching an SGLang HTTP server. Use this whenever you need programmatic access to the server from Python code, scripts, or frameworks that support the OpenAI API.
Code Reference
Source Location
- Library: openai (external)
- Usage pattern: examples/runtime/openai_chat_with_response_prefill.py
Signature
import openai
client = openai.Client(
    base_url: str,  # SGLang server URL with "/v1" suffix
    api_key: str,   # any string (ignored unless --api-key is set on the server)
)
Import
import openai
# or
from openai import OpenAI
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| base_url | str | Yes | SGLang server URL with "/v1" suffix (e.g., "http://localhost:30000/v1") |
| api_key | str | Yes | Any string (SGLang ignores unless --api-key is set on server) |
Outputs
| Name | Type | Description |
|---|---|---|
| client | openai.Client | Configured client instance for making API calls |
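In multi-host setups it can help to derive base_url from the environment rather than hard-coding it. A minimal sketch; the SGLANG_HOST and SGLANG_PORT variable names are illustrative, not an SGLang convention:

```python
import os

def sglang_base_url(host=None, port=None):
    """Resolve the SGLang endpoint, preferring explicit args over env vars."""
    host = host or os.environ.get("SGLANG_HOST", "localhost")
    port = port or int(os.environ.get("SGLANG_PORT", "30000"))
    return f"http://{host}:{port}/v1"
```

A client built with `openai.Client(base_url=sglang_base_url(), api_key="EMPTY")` then works unchanged across local and remote deployments.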
Usage Examples
Basic Setup
import openai
# Point to SGLang server
client = openai.Client(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY",
)

# Use exactly like the OpenAI API
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is SGLang?"}],
    temperature=0.7,
    max_tokens=128,
)
print(response.choices[0].message.content)
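Raw HTTP Request
Because the server speaks plain OpenAI-compatible HTTP, the SDK is a convenience rather than a requirement. A sketch of the equivalent request built with only the standard library; the model name is a placeholder, and the actual send is commented out since it needs a live server:

```python
import json
import urllib.request

# Request body matching the OpenAI chat completions schema
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "What is SGLang?"}],
    "temperature": 0.7,
    "max_tokens": 128,
}

req = urllib.request.Request(
    "http://localhost:30000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer EMPTY",  # ignored unless --api-key is set
    },
    method="POST",
)

# Sending requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

This is also a handy way to debug an SGLang endpoint from a machine where the openai package is not installed.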
Async Client
import asyncio
import openai

async_client = openai.AsyncClient(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY",
)

# await is only valid inside a coroutine, so wrap the call and run it
async def main():
    response = await async_client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
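Checking Server Reachability
Before issuing requests it can be useful to verify that the server is actually listening. A standard-library sketch that probes the OpenAI-compatible /v1/models listing; any HTTP response (even an auth error) counts as "up":

```python
import urllib.error
import urllib.request

def server_ready(base_url, timeout=2.0):
    """Return True if an HTTP server answers at base_url + '/models'."""
    url = base_url.rstrip("/") + "/models"
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server responded, just not with 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, or timeout
```

Calling `server_ready("http://localhost:30000/v1")` before constructing the client turns a confusing SDK connection error into an explicit readiness check.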