Implementation:Predibase Lorax Chat Completions Handler

Knowledge Sources	LoRAX
Domains	Text_Generation, API_Compatibility
Last Updated	2026-02-08 02:00 GMT

Overview

A concrete tool for handling OpenAI-format chat completion requests, implemented by the chat_completions_v1 handler in the LoRAX Rust router.

Description

The chat_completions_v1 handler in router/src/server.rs processes POST requests to /v1/chat/completions. It deserializes the ChatCompletionRequest, converts it to internal generation parameters, renders the chat template, and delegates to either Infer::generate() (non-streaming) or Infer::generate_stream() (streaming). The result is returned either as a single ChatCompletionResponse or as an SSE stream of ChatCompletionStreamResponse events.
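The steps described above can be modeled in a few lines of Python. This is only an illustrative sketch of the control flow, not the Rust handler's code: render_chat_template, to_generate_params, and handle_chat_completion are hypothetical names, the template format is a placeholder, and the internal parameter names (adapter_id, max_new_tokens) are assumptions.

```python
from typing import Any, Callable, Dict, List


def render_chat_template(messages: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for template rendering: one 'role: content' line per message."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


def to_generate_params(req: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping from OpenAI-style fields to internal parameters."""
    return {
        "adapter_id": req.get("model"),            # assumed internal name
        "max_new_tokens": req.get("max_tokens"),   # assumed internal name
        "temperature": req.get("temperature"),
    }


def handle_chat_completion(
    req: Dict[str, Any],
    generate: Callable[[str, Dict[str, Any]], Any],
    generate_stream: Callable[[str, Dict[str, Any]], Any],
) -> Any:
    """Render the prompt, convert parameters, then branch on the 'stream' flag."""
    prompt = render_chat_template(req["messages"])
    params = to_generate_params(req)
    if req.get("stream", False):
        return generate_stream(prompt, params)  # streaming path
    return generate(prompt, params)             # non-streaming path
```

Passing the two generation callables in as arguments keeps the sketch independent of any real inference backend, so the branching logic can be exercised with simple fakes.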

Usage

Invoked automatically when a POST request hits /v1/chat/completions. The handler is not called directly.

Code Reference

Source Location

Repository: LoRAX
File: router/src/server.rs
Lines: 253-411

Signature

async fn chat_completions_v1(
    infer: Extension<Infer>,
    info: Extension<Info>,
    req_headers: HeaderMap,
    req: Json<ChatCompletionRequest>,
) -> Result<Response, (StatusCode, Json<ErrorResponse>)>

// Response types (router/src/lib.rs)
pub struct ChatCompletionResponse {
    pub id: String,
    pub object: String,           // "chat.completion"
    pub created: u64,
    pub model: String,
    pub system_fingerprint: String,
    pub choices: Vec<ChatCompletionChoice>,
    pub usage: Usage,
}

pub struct ChatCompletionStreamResponse {
    pub id: String,
    pub object: String,           // "chat.completion.chunk"
    pub created: u64,
    pub model: String,
    pub system_fingerprint: String,
    pub choices: Vec<ChatCompletionStreamChoice>,
}
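A client can sanity-check a decoded payload against the fields listed in the structs above. The sketch below assumes the Rust field names are serialized to JSON unchanged (no renaming or skipped fields); missing_fields is a hypothetical helper, not part of LoRAX.

```python
from typing import Any, Dict, Set

# Field names taken from ChatCompletionResponse above.
RESPONSE_FIELDS = {
    "id", "object", "created", "model",
    "system_fingerprint", "choices", "usage",
}
# ChatCompletionStreamResponse has the same fields minus "usage".
STREAM_FIELDS = RESPONSE_FIELDS - {"usage"}


def missing_fields(payload: Dict[str, Any]) -> Set[str]:
    """Return the expected top-level keys that are absent from the payload."""
    is_chunk = payload.get("object") == "chat.completion.chunk"
    expected = STREAM_FIELDS if is_chunk else RESPONSE_FIELDS
    return expected - set(payload)
```

An empty result means every expected top-level key is present; it says nothing about the types or contents of the values.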

Import

// Internal handler, registered as route
.route("/v1/chat/completions", post(chat_completions_v1))

I/O Contract

Inputs

Name	Type	Required	Description
ChatCompletionRequest	JSON body	Yes	OpenAI-format request with model, messages, params

Outputs

Name	Type	Description
ChatCompletionResponse	JSON	Non-streaming: full response with choices and usage
SSE Stream	Event stream	Streaming: ChatCompletionStreamResponse chunks, terminated by [DONE]
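For clients that read the event stream without the OpenAI SDK, the streaming output can be consumed roughly as follows. This sketch assumes the usual OpenAI-style wire format, where each event is a `data: <json>` line and the stream ends with `data: [DONE]`; iter_sse_chunks and collect_text are hypothetical helper names.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield decoded JSON payloads from SSE 'data:' lines until '[DONE]'."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


def collect_text(chunks: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the delta.content pieces carried by the chunks."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

The line iterable could come from any HTTP client that exposes the response body line by line; anything after the [DONE] sentinel is ignored.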

Usage Examples

Non-Streaming

from openai import OpenAI

# Point the OpenAI client at the LoRAX router; the client requires an
# api_key value, and "x" is a placeholder.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="x")

response = client.chat.completions.create(
    model="my-org/my-adapter",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain transformers in one sentence."},
    ],
    max_tokens=100,
    temperature=0.3,
)
# The full reply and token usage arrive in a single ChatCompletionResponse
print(response.choices[0].message.content)
print(f"Usage: {response.usage.prompt_tokens} prompt, {response.usage.completion_tokens} completion")

Streaming

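from openai import OpenAI

# Same client setup as in the non-streaming example
client = OpenAI(base_url="http://localhost:3000/v1", api_key="x")

stream = client.chat.completions.create(
    model="my-org/my-adapter",
    messages=[{"role": "user", "content": "Write a haiku about ML"}],
    stream=True,
    max_tokens=50,
)
for chunk in stream:
    # Skip chunks that carry no choices or no new text
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # final newline after the streamed text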

Related Pages

Implements Principle

Principle:Predibase_Lorax_Chat_Completion_Generation
