Implementation:Predibase Lorax Chat Completions Handler
| Knowledge Sources | |
|---|---|
| Domains | Text_Generation, API_Compatibility |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
Concrete tool for handling OpenAI-format chat completion requests, implemented by the chat_completions_v1 handler in the LoRAX Rust router.
Description
The chat_completions_v1 handler in router/src/server.rs processes POST requests to /v1/chat/completions. It deserializes the ChatCompletionRequest, converts it to internal generation parameters, renders the chat template, and delegates to either Infer::generate() (non-streaming) or Infer::generate_stream() (streaming). Non-streaming responses are returned as a single ChatCompletionResponse; streaming responses are returned as a server-sent events (SSE) stream of ChatCompletionStreamResponse events.
Usage
The router invokes the handler automatically whenever a POST request reaches /v1/chat/completions. Client code never calls it directly; it sends HTTP requests to the route instead.
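Because the handler is reachable only over HTTP, any HTTP client can target it. The following is a minimal sketch using only the Python standard library. The base URL http://localhost:3000 is taken from the usage examples on this page, and the helper name build_chat_request is hypothetical, not part of LoRAX.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, stream=False, **params):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper for illustration; not part of LoRAX.
    """
    body = {"model": model, "messages": messages, "stream": stream, **params}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:3000",  # assumed local deployment, as in the examples
    model="my-org/my-adapter",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=16,
)

# Sending the request needs a running server, so it is left commented out:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.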
Code Reference
Source Location
- Repository: LoRAX
- File: router/src/server.rs
- Lines: 253-411
Signature
async fn chat_completions_v1(
    infer: Extension<Infer>,
    info: Extension<Info>,
    req_headers: HeaderMap,
    req: Json<ChatCompletionRequest>,
) -> Result<Response, (StatusCode, Json<ErrorResponse>)>

// Response types (router/src/lib.rs)
pub struct ChatCompletionResponse {
    pub id: String,
    pub object: String, // "chat.completion"
    pub created: u64,
    pub model: String,
    pub system_fingerprint: String,
    pub choices: Vec<ChatCompletionChoice>,
    pub usage: Usage,
}

pub struct ChatCompletionStreamResponse {
    pub id: String,
    pub object: String, // "chat.completion.chunk"
    pub created: u64,
    pub model: String,
    pub system_fingerprint: String,
    pub choices: Vec<ChatCompletionStreamChoice>,
}
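A client can sanity-check a decoded payload against the two response structs above. The sketch below assumes the JSON keys match the Rust field names one-to-one, which the source does not confirm. The checker missing_fields and the sample payload values are hypothetical.

```python
# Field names copied from the Rust structs; the JSON key names are an assumption.
COMPLETION_FIELDS = {
    "id", "object", "created", "model",
    "system_fingerprint", "choices", "usage",
}
# The stream chunk struct has the same fields except `usage`.
CHUNK_FIELDS = COMPLETION_FIELDS - {"usage"}


def missing_fields(payload):
    """Return the sorted expected top-level fields absent from `payload`."""
    is_chunk = payload.get("object") == "chat.completion.chunk"
    expected = CHUNK_FIELDS if is_chunk else COMPLETION_FIELDS
    return sorted(expected - set(payload))


# Illustrative payloads only; the values are placeholders, not real server output.
completion = {
    "id": "example-id",
    "object": "chat.completion",
    "created": 0,
    "model": "my-org/my-adapter",
    "system_fingerprint": "",
    "choices": [],
    "usage": {},
}
chunk = {key: value for key, value in completion.items() if key != "usage"}
chunk["object"] = "chat.completion.chunk"

print(missing_fields(completion))
print(missing_fields(chunk))
```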
Import
// Internal handler, registered as route
.route("/v1/chat/completions", post(chat_completions_v1))
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ChatCompletionRequest | JSON body | Yes | OpenAI-format request containing the model, the messages, and generation parameters |
Outputs
| Name | Type | Description |
|---|---|---|
| ChatCompletionResponse | JSON | Non-streaming: full response with choices and usage |
| SSE Stream | Event stream | Streaming: ChatCompletionStreamResponse chunks, terminated by [DONE] |
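The streaming output can be consumed without an SDK by reading the event stream line by line. The sketch below assumes the OpenAI-style framing in which each event is a single `data: <json>` line and the stream ends with `data: [DONE]`. The function name iter_sse_chunks and the sample lines are hypothetical.

```python
import json


def iter_sse_chunks(lines):
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel.

    Assumes one `data:` line per event (OpenAI-style framing).
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream; ignore anything after the sentinel
        yield json.loads(data)


# Illustrative input; a real stream would come from the HTTP response body.
sample_lines = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hi"}}]}',
    "",
    "data: [DONE]",
]
for chunk in iter_sse_chunks(sample_lines):
    print(chunk["choices"][0]["delta"]["content"])
```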
Usage Examples
Non-Streaming
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="x")
response = client.chat.completions.create(
    model="my-org/my-adapter",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain transformers in one sentence."},
    ],
    max_tokens=100,
    temperature=0.3,
)
print(response.choices[0].message.content)
print(f"Usage: {response.usage.prompt_tokens} prompt, {response.usage.completion_tokens} completion")
Streaming
# Reuses the `client` created in the non-streaming example.
stream = client.chat.completions.create(
    model="my-org/my-adapter",
    messages=[{"role": "user", "content": "Write a haiku about ML"}],
    stream=True,
    max_tokens=50,
)
for chunk in stream:
    # Print each content delta as soon as it arrives.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
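When the full text is needed rather than incremental printing, the deltas can be accumulated into one string. The helper below is a hypothetical sketch, not part of the openai SDK or LoRAX. It defensively skips chunks with no choices or no content, and the stand-in chunks built with SimpleNamespace only imitate the attribute layout used in the loop above.

```python
from types import SimpleNamespace


def collect_stream_text(chunks):
    """Concatenate the content deltas of a chat completion stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # defensive: tolerate chunks that carry no choices
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in chunk with the same attribute layout (illustration only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk("Hel"),
    SimpleNamespace(choices=[]),
    fake_chunk(None),
    fake_chunk("lo"),
]
print(collect_stream_text(fake_stream))
```

With a live server, the same helper could be passed the `stream` object directly in place of `fake_stream`.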