
Implementation:Predibase Lorax Generate Response Handler

From Leeroopedia


Knowledge Sources
Domains API_Design, Streaming
Last Updated 2026-02-08 02:00 GMT

Overview

Concrete mechanism for delivering inference responses from the LoRAX router: a JSON body for non-streaming requests and a Server-Sent Events (SSE) stream for streaming requests, parsed on the client by typed Pydantic models.

Description

The response handling system spans the Rust router and Python client. On the server side, generate() returns a JSON GenerateResponse while generate_stream() returns an SSE stream. On the client side, Response and StreamResponse Pydantic models parse the results with typed fields for generated text, token details, and finish reasons.
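The difference between the two delivery modes can be sketched without the client library. The payload values below are made up; the field names mirror the Response and StreamResponse signatures shown later on this page.

```python
import json

# Illustrative non-streaming body: one JSON object with the full text
# (shape mirrors the Response model; values are invented).
json_body = (
    '{"generated_text": "Hello.", '
    '"details": {"finish_reason": "eos_token", "generated_tokens": 2}}'
)
response = json.loads(json_body)
assert response["generated_text"] == "Hello."
assert response["details"]["finish_reason"] == "eos_token"

# Illustrative SSE stream: one `data:` event per token; only the final
# event carries the accumulated generated_text (StreamResponse shape).
sse_lines = [
    'data: {"token": {"id": 1, "text": "Hello", "special": false}}',
    'data: {"token": {"id": 2, "text": ".", "special": false}, '
    '"generated_text": "Hello."}',
]
chunks = [json.loads(line[len("data: "):]) for line in sse_lines]
assert chunks[0].get("generated_text") is None   # intermediate chunk
assert chunks[-1]["generated_text"] == "Hello."  # final chunk
```

The non-streaming path yields a complete response in one parse; the streaming path yields a chunk per token, with the full text only in the final event.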

Usage

Used automatically when calling Client.generate() (non-streaming) or Client.generate_stream() (streaming): generate() returns a single Response, while generate_stream() yields StreamResponse objects, one per generated token.

Code Reference

Source Location

  • Repository: LoRAX
  • File: router/src/server.rs (Lines: 644-1200)
  • File: clients/python/lorax/types.py (Lines: 289-380)

Signature

class Token(BaseModel):
    id: int
    text: str
    logprob: Optional[float]
    special: bool
    alternative_tokens: Optional[List[AlternativeToken]] = None
    skipped: bool

class FinishReason(str, Enum):
    Length = "length"
    EndOfSequenceToken = "eos_token"
    StopSequence = "stop_sequence"

class Details(BaseModel):
    finish_reason: FinishReason
    prompt_tokens: int
    generated_tokens: int
    skipped_tokens: int
    seed: Optional[int] = None
    prefill: List[InputToken]
    tokens: List[Token]

class Response(BaseModel):
    generated_text: str
    details: Optional[Details] = None

class StreamResponse(BaseModel):
    token: Token
    generated_text: Optional[str] = None
    details: Optional[StreamDetails] = None

Import

from lorax.types import Response, StreamResponse, Token, Details, FinishReason

I/O Contract

Inputs

Name | Type | Required | Description
Server response | JSON or SSE | Yes | Raw HTTP response from the LoRAX server

Outputs

Name | Type | Description
Response | Response | Non-streaming: generated text + details
StreamResponse | Iterator[StreamResponse] | Streaming: one chunk per token, with the full text in the final chunk
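For the SSE input side of this contract, a minimal parsing sketch is shown below. It handles common SSE framing (`data:`-prefixed lines, blank-line event separators, `:`-prefixed comment keep-alives); the exact framing a given LoRAX deployment emits may differ, so treat the wire format here as an assumption.

```python
import json
from typing import Iterator


def parse_sse(lines: Iterator[str]) -> Iterator[dict]:
    """Yield one parsed event dict per `data:` line, skipping blank
    separators and `:`-prefixed comment keep-alives."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(":"):
            continue  # event separator or keep-alive comment
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].lstrip())


# Simulated wire data (shapes mirror StreamResponse; values invented).
wire = [
    ": keep-alive",
    'data: {"token": {"id": 7, "text": "Hi", "special": false}}',
    "",
    'data: {"token": {"id": 0, "text": "", "special": true}, '
    '"generated_text": "Hi"}',
]
events = list(parse_sse(iter(wire)))
assert len(events) == 2
assert events[-1]["generated_text"] == "Hi"
```

Each yielded dict corresponds to one StreamResponse payload before Pydantic validation.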

Usage Examples

Non-Streaming

from lorax import Client

client = Client("http://localhost:3000")
response = client.generate(
    "Explain quantum computing:",
    adapter_id="my-adapter",
    max_new_tokens=100,
    details=True,
)
print(response.generated_text)
print(f"Finish: {response.details.finish_reason}")
print(f"Tokens: {response.details.generated_tokens}")

Streaming

text = ""
for stream_response in client.generate_stream(
    "Write a story:",
    adapter_id="my-adapter",
    max_new_tokens=200,
):
    if not stream_response.token.special:
        text += stream_response.token.text
        print(stream_response.token.text, end="", flush=True)
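The loop above only accumulates token text; per the StreamResponse model, the final chunk also carries the complete generated_text and a details object. A dependency-free sketch of that end-of-stream handling, over hand-built dicts that mirror the StreamResponse fields (values invented):

```python
# Simulated stream chunks mirroring StreamResponse fields.
chunks = [
    {"token": {"text": "Once", "special": False}},
    {"token": {"text": " upon", "special": False}},
    {"token": {"text": "</s>", "special": True},
     "generated_text": "Once upon",
     "details": {"finish_reason": "eos_token", "generated_tokens": 3}},
]

text = ""
final_text, finish_reason = None, None
for chunk in chunks:
    if not chunk["token"]["special"]:
        text += chunk["token"]["text"]
    # Only the final chunk sets generated_text and details.
    if chunk.get("generated_text") is not None:
        final_text = chunk["generated_text"]
        finish_reason = chunk["details"]["finish_reason"]

assert text == "Once upon"
assert final_text == "Once upon"
assert finish_reason == "eos_token"
```

Reading the server-provided generated_text from the final chunk avoids depending on client-side accumulation when special tokens are filtered.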
