
Implementation:Predibase Lorax Generate Response Handler

From Leeroopedia


Knowledge Sources
Domains API_Design, Streaming
Last Updated 2026-02-08 02:00 GMT

Overview

Concrete mechanism for delivering inference responses from the LoRAX router: a JSON body for non-streaming requests and a Server-Sent Events (SSE) stream for streaming requests, parsed on the client by typed Pydantic models.

Description

The response handling system spans the Rust router and Python client. On the server side, generate() returns a JSON GenerateResponse while generate_stream() returns an SSE stream. On the client side, Response and StreamResponse Pydantic models parse the results with typed fields for generated text, token details, and finish reasons.
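The difference between the two delivery modes can be sketched without the client library. The payload values below are made up; the field names mirror the Response and StreamResponse signatures shown later on this page.

```python
import json

# Illustrative non-streaming body: one JSON object with the full text
# (shape mirrors the Response model; values are invented).
json_body = (
    '{"generated_text": "Hello.", '
    '"details": {"finish_reason": "eos_token", "generated_tokens": 2}}'
)
response = json.loads(json_body)
assert response["generated_text"] == "Hello."
assert response["details"]["finish_reason"] == "eos_token"

# Illustrative SSE stream: one `data:` event per token; only the final
# event carries the accumulated generated_text (StreamResponse shape).
sse_lines = [
    'data: {"token": {"id": 1, "text": "Hello", "special": false}}',
    'data: {"token": {"id": 2, "text": ".", "special": false}, '
    '"generated_text": "Hello."}',
]
chunks = [json.loads(line[len("data: "):]) for line in sse_lines]
assert chunks[0].get("generated_text") is None   # intermediate chunk
assert chunks[-1]["generated_text"] == "Hello."  # final chunk
```

The non-streaming path yields a complete response in one parse; the streaming path yields a chunk per token, with the full text only in the final event.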

Usage

Used automatically when calling Client.generate() (non-streaming) or Client.generate_stream() (streaming): generate() returns a single Response, while generate_stream() yields StreamResponse objects, one per generated token.

Code Reference

Source Location

  • Repository: LoRAX
  • File: router/src/server.rs (Lines: 644-1200)
  • File: clients/python/lorax/types.py (Lines: 289-380)

Signature

class Token(BaseModel):
    id: int
    text: str
    logprob: Optional[float]
    special: bool
    alternative_tokens: Optional[List[AlternativeToken]] = None
    skipped: bool

class FinishReason(str, Enum):
    Length = "length"
    EndOfSequenceToken = "eos_token"
    StopSequence = "stop_sequence"

class Details(BaseModel):
    finish_reason: FinishReason
    prompt_tokens: int
    generated_tokens: int
    skipped_tokens: int
    seed: Optional[int] = None
    prefill: List[InputToken]
    tokens: List[Token]

class Response(BaseModel):
    generated_text: str
    details: Optional[Details] = None

class StreamResponse(BaseModel):
    token: Token
    generated_text: Optional[str] = None
    details: Optional[StreamDetails] = None

Import

from lorax.types import Response, StreamResponse, Token, Details, FinishReason

I/O Contract

Inputs

Name | Type | Required | Description
Server response | JSON or SSE | Yes | Raw HTTP response from the LoRAX server

Outputs

Name | Type | Description
Response | Response | Non-streaming: generated text + details
StreamResponse | Iterator[StreamResponse] | Streaming: one chunk per token, with the full text in the final chunk
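For the SSE input side of this contract, a minimal parsing sketch is shown below. It handles common SSE framing (`data:`-prefixed lines, blank-line event separators, `:`-prefixed comment keep-alives); the exact framing a given LoRAX deployment emits may differ, so treat the wire format here as an assumption.

```python
import json
from typing import Iterator


def parse_sse(lines: Iterator[str]) -> Iterator[dict]:
    """Yield one parsed event dict per `data:` line, skipping blank
    separators and `:`-prefixed comment keep-alives."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(":"):
            continue  # event separator or keep-alive comment
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].lstrip())


# Simulated wire data (shapes mirror StreamResponse; values invented).
wire = [
    ": keep-alive",
    'data: {"token": {"id": 7, "text": "Hi", "special": false}}',
    "",
    'data: {"token": {"id": 0, "text": "", "special": true}, '
    '"generated_text": "Hi"}',
]
events = list(parse_sse(iter(wire)))
assert len(events) == 2
assert events[-1]["generated_text"] == "Hi"
```

Each yielded dict corresponds to one StreamResponse payload before Pydantic validation.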

Usage Examples

Non-Streaming

from lorax import Client

client = Client("http://localhost:3000")
response = client.generate(
    "Explain quantum computing:",
    adapter_id="my-adapter",
    max_new_tokens=100,
    details=True,
)
print(response.generated_text)
print(f"Finish: {response.details.finish_reason}")
print(f"Tokens: {response.details.generated_tokens}")

Streaming

text = ""
for stream_response in client.generate_stream(
    "Write a story:",
    adapter_id="my-adapter",
    max_new_tokens=200,
):
    if not stream_response.token.special:
        text += stream_response.token.text
        print(stream_response.token.text, end="", flush=True)
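The loop above only accumulates token text; per the StreamResponse model, the final chunk also carries the complete generated_text and a details object. A dependency-free sketch of that end-of-stream handling, over hand-built dicts that mirror the StreamResponse fields (values invented):

```python
# Simulated stream chunks mirroring StreamResponse fields.
chunks = [
    {"token": {"text": "Once", "special": False}},
    {"token": {"text": " upon", "special": False}},
    {"token": {"text": "</s>", "special": True},
     "generated_text": "Once upon",
     "details": {"finish_reason": "eos_token", "generated_tokens": 3}},
]

text = ""
final_text, finish_reason = None, None
for chunk in chunks:
    if not chunk["token"]["special"]:
        text += chunk["token"]["text"]
    # Only the final chunk sets generated_text and details.
    if chunk.get("generated_text") is not None:
        final_text = chunk["generated_text"]
        finish_reason = chunk["details"]["finish_reason"]

assert text == "Once upon"
assert final_text == "Once upon"
assert finish_reason == "eos_token"
```

Reading the server-provided generated_text from the final chunk avoids depending on client-side accumulation when special tokens are filtered.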
