Implementation: Predibase LoRAX Generate Response Handler
| Knowledge Sources | |
|---|---|
| Domains | API_Design, Streaming |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
Response-handling layer that delivers inference results either as a single JSON body (non-streaming) or as a Server-Sent Events (SSE) stream (streaming), implemented by the LoRAX router on the server side and typed response models on the client side.
Description
The response handling system spans the Rust router and Python client. On the server side, generate() returns a JSON GenerateResponse while generate_stream() returns an SSE stream. On the client side, Response and StreamResponse Pydantic models parse the results with typed fields for generated text, token details, and finish reasons.
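As a rough sketch of that mapping (field names follow the client models described below; the payload itself is illustrative, not a captured server response), a non-streaming JSON body can be read into a plain typed structure like this:

```python
import json
from dataclasses import dataclass
from typing import Optional

# Minimal stdlib stand-ins for the client's Pydantic models (illustrative only).
@dataclass
class ParsedDetails:
    finish_reason: str
    prompt_tokens: int
    generated_tokens: int

@dataclass
class ParsedResponse:
    generated_text: str
    details: Optional[ParsedDetails]

def parse_generate_response(raw: str) -> ParsedResponse:
    """Parse a non-streaming generate JSON body into typed fields."""
    payload = json.loads(raw)
    details = payload.get("details")
    return ParsedResponse(
        generated_text=payload["generated_text"],
        details=ParsedDetails(
            finish_reason=details["finish_reason"],
            prompt_tokens=details["prompt_tokens"],
            generated_tokens=details["generated_tokens"],
        ) if details else None,
    )

# Made-up payload for demonstration.
sample = ('{"generated_text": "Hello!", "details": {"finish_reason": '
          '"eos_token", "prompt_tokens": 3, "generated_tokens": 2}}')
parsed = parse_generate_response(sample)
print(parsed.generated_text)         # Hello!
print(parsed.details.finish_reason)  # eos_token
```

In the real client, Pydantic performs this parsing and validation automatically; the sketch only shows which JSON keys feed which fields.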
Usage
Used automatically when calling Client.generate() (non-streaming) or Client.generate_stream() (streaming). The response type depends on whether streaming was requested.
Code Reference
Source Location
- Repository: LoRAX
- File: router/src/server.rs (Lines: 644-1200)
- File: clients/python/lorax/types.py (Lines: 289-380)
Signature
class Token(BaseModel):
    id: int
    text: str
    logprob: Optional[float]
    special: bool
    alternative_tokens: Optional[List[AlternativeToken]] = None
    skipped: bool

class FinishReason(str, Enum):
    Length = "length"
    EndOfSequenceToken = "eos_token"
    StopSequence = "stop_sequence"

class Details(BaseModel):
    finish_reason: FinishReason
    prompt_tokens: int
    generated_tokens: int
    skipped_tokens: int
    seed: Optional[int] = None
    prefill: List[InputToken]
    tokens: List[Token]

class Response(BaseModel):
    generated_text: str
    details: Optional[Details] = None

class StreamResponse(BaseModel):
    token: Token
    generated_text: Optional[str] = None
    details: Optional[StreamDetails] = None
Import
from lorax.types import Response, StreamResponse, Token, Details, FinishReason
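Because FinishReason subclasses both str and Enum, the raw strings carried in a server payload coerce directly to enum members and still compare equal to the wire value. A stdlib-only sketch of that pattern, mirroring the enum shown in the signature rather than importing lorax:

```python
from enum import Enum

# Mirror of the FinishReason enum from the signature above (for illustration).
class FinishReason(str, Enum):
    Length = "length"
    EndOfSequenceToken = "eos_token"
    StopSequence = "stop_sequence"

# A wire string coerces straight into an enum member.
reason = FinishReason("eos_token")
print(reason is FinishReason.EndOfSequenceToken)  # True
# The str mixin means it also compares equal to the raw value.
print(reason == "eos_token")  # True
```

This is why Pydantic can validate the `finish_reason` field directly from JSON without a custom converter.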
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Server response | JSON or SSE | Yes | Raw HTTP response from LoRAX server |
Outputs
| Name | Type | Description |
|---|---|---|
| Response | Response | Non-streaming: generated text + details |
| StreamResponse | Iterator[StreamResponse] | Streaming: token-by-token with final text |
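On the streaming side, the server emits one SSE `data: {...}` line per token, and the final event additionally carries the full generated text. A hedged, stdlib-only sketch of decoding such a stream (the event lines below are illustrative, not captured server output):

```python
import json
from typing import Iterator, List

def iter_sse_events(lines: List[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE 'data:' line, skipping blanks."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

# Illustrative SSE lines (made up for this sketch).
raw_stream = [
    'data: {"token": {"id": 1, "text": "Hi", "special": false}}',
    '',  # blank line separating events
    'data: {"token": {"id": 2, "text": "!", "special": false}, '
    '"generated_text": "Hi!"}',
]

text = ""
final = None
for event in iter_sse_events(raw_stream):
    text += event["token"]["text"]
    if event.get("generated_text") is not None:
        final = event["generated_text"]  # only the last event sets this

print(text)   # Hi!
print(final)  # Hi!
```

The Python client hides this layer entirely: generate_stream() yields already-validated StreamResponse objects.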
Usage Examples
Non-Streaming
from lorax import Client

client = Client("http://localhost:3000")
response = client.generate(
    "Explain quantum computing:",
    adapter_id="my-adapter",
    max_new_tokens=100,
    details=True,
)
print(response.generated_text)
print(f"Finish: {response.details.finish_reason}")
print(f"Tokens: {response.details.generated_tokens}")
Streaming
text = ""
for stream_response in client.generate_stream(
    "Write a story:",
    adapter_id="my-adapter",
    max_new_tokens=200,
):
    if not stream_response.token.special:
        text += stream_response.token.text
        print(stream_response.token.text, end="", flush=True)
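The `special` check above keeps markers such as end-of-sequence tokens out of the accumulated text. A stdlib-only simulation of the same loop with stand-in objects (no server required; the token values are made up):

```python
from dataclasses import dataclass

# Stand-ins for the client's Token / StreamResponse models (illustrative).
@dataclass
class FakeToken:
    text: str
    special: bool

@dataclass
class FakeStreamResponse:
    token: FakeToken

# Made-up stream: the trailing special token must not leak into the text.
stream = [
    FakeStreamResponse(FakeToken("Once", False)),
    FakeStreamResponse(FakeToken(" upon", False)),
    FakeStreamResponse(FakeToken("</s>", True)),
]

text = ""
for sr in stream:
    if not sr.token.special:
        text += sr.token.text

print(text)  # Once upon
```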