Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Predibase Lorax Chat Completion Generation

From Leeroopedia


Knowledge Sources
Domains Text_Generation, API_Compatibility
Last Updated 2026-02-08 02:00 GMT

Overview

A text generation endpoint that processes chat completion requests through the full inference pipeline, returning OpenAI-format responses with support for both streaming and non-streaming modes.

Description

Chat Completion Generation is the core execution step of the OpenAI-compatible API. It:

  1. Receives a ChatCompletionRequest with messages and parameters
  2. Extracts the adapter ID from the model field
  3. Renders the chat template into a prompt string
  4. Sends the prompt through the inference engine (batching, LoRA application, decoding)
  5. Formats the output as an OpenAI-compatible ChatCompletionResponse or ChatCompletionStreamResponse

Streaming uses Server-Sent Events with delta objects, matching the OpenAI streaming format. The stream ends with a [DONE] sentinel.

Usage

Use when making chat completion API calls. The endpoint supports all standard OpenAI parameters (temperature, top_p, max_tokens, stop, seed) plus LoRAX-specific extensions (adapter_source, api_token).

Theoretical Basis

Pseudo-code:

# Chat completion pipeline
def chat_completions(request):
    params = request.try_into_generate()
    prompt = apply_chat_template(request.messages)
    if request.stream:
        for token in infer.generate_stream(prompt, params):
            yield ChatCompletionStreamResponse(delta=token)
        yield "[DONE]"
    else:
        result = infer.generate(prompt, params)
        return ChatCompletionResponse(choices=[result])

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment