Heuristic:Haifengl Smile Quarkus Async Context Handling
| Knowledge Sources | |
|---|---|
| Domains | Model_Serving, REST_API, Reactive_Programming |
| Last Updated | 2026-02-08 22:00 GMT |
Overview
Context propagation rules for Quarkus async/reactive streaming endpoints to prevent undefined routing context in worker threads.
Description
When implementing SSE (Server-Sent Events) streaming in Quarkus REST endpoints, the request's `RoutingContext` and `HttpHeaders` must be captured in the endpoint method before dispatching work to async threads. Worker threads spawned via `ManagedExecutor.supplyAsync()` do not have access to the request's `RoutingContext`. Two further details matter for streaming. The `@RestStreamElementType(MediaType.TEXT_PLAIN)` annotation is critical to ensure items are streamed individually without buffering. A leading space should also be prepended to each streamed chunk, because SSE clients strip one space after the `data:` prefix and would otherwise remove a space that belongs to the chunk.
Usage
Use this heuristic when implementing streaming REST endpoints, adding new async prediction routes, or debugging undefined context errors in Quarkus worker threads. It applies to both the `InferenceResource` (ML model streaming) and `ChatCompletionResource` (LLM chat streaming) endpoints.
The Insight (Rule of Thumb)
- Action 1: Always capture `RoutingContext` and `HttpHeaders` in the endpoint method body, not inside the async task.
- Action 2: Always annotate streaming endpoints with `@RestStreamElementType(MediaType.TEXT_PLAIN)` to prevent buffering.
- Action 3: Prepend a space to each streamed chunk (`.map(chunk -> " " + chunk)`) so that the single space SSE clients strip after `data:` is the added one, not a space that belongs to the chunk.
- Value: N/A (pattern, not a numeric setting).
- Trade-off: Slightly more boilerplate in endpoint methods, but prevents subtle threading bugs that are hard to diagnose.
Reasoning
Quarkus uses a reactive threading model where the request context (including `RoutingContext`, CDI contexts, and security credentials) is bound to the request thread. When work is dispatched to a `ManagedExecutor` thread pool, these contexts are not automatically propagated. This is a documented Quarkus behavior but is easy to forget when adding new streaming endpoints.
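The capture-before-dispatch rule can be illustrated with a plain-JDK analogy. In the sketch below, a `ThreadLocal` named `REQUEST_CONTEXT` is a hypothetical stand-in for request-bound state such as `RoutingContext`; it is not how Quarkus stores its contexts, and the class and method names are invented for illustration. It only shows why a value read inside the async task can be missing while a value captured on the calling thread survives.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Plain-JDK analogy: a ThreadLocal stands in for request-bound context. */
final class ContextCaptureSketch {
    // Hypothetical stand-in for request-scoped state such as RoutingContext.
    static final ThreadLocal<String> REQUEST_CONTEXT = new ThreadLocal<>();

    /** Reads the context inside the async task: the worker thread sees nothing. */
    static String readInsideTask(ExecutorService executor) {
        return CompletableFuture
                .supplyAsync(() -> String.valueOf(REQUEST_CONTEXT.get()), executor)
                .join();
    }

    /** Captures the context on the calling thread and hands the value to the task. */
    static String captureThenDispatch(ExecutorService executor) {
        final String captured = REQUEST_CONTEXT.get(); // still on the "request" thread
        return CompletableFuture
                .supplyAsync(() -> captured, executor)
                .join();
    }

    /** Runs both variants on a fresh single-thread pool and returns both results. */
    static String[] demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        REQUEST_CONTEXT.set("request-42");
        try {
            return new String[] { readInsideTask(executor), captureThenDispatch(executor) };
        } finally {
            REQUEST_CONTEXT.remove();
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] results = demo();
        System.out.println("inside task:    " + results[0]); // null
        System.out.println("captured first: " + results[1]); // request-42
    }
}
```

The second variant mirrors what the endpoint does when it reads the context in the method body: the lambda closes over a value that was already resolved, so it no longer matters which thread runs it.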
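The SSE space-prepending workaround follows from how SSE clients parse events: the SSE specification tells the parser to drop one space that immediately follows `data:`. If a chunk itself begins with a space, that space is stripped and the streamed text is corrupted, for example by running adjacent words together. Prepending an extra space gives the parser a character to strip and leaves the chunk intact.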
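The effect can be reproduced with a small sketch of the client side. It assumes the server writes `data:` immediately followed by the chunk, handles single-line chunks only, and uses hypothetical helper names (`toDataLine`, `parseDataLine`, `roundTrip`); it is not the parser of any particular client library.

```java
/** Sketch of how one leading space after "data:" is consumed by an SSE parser. */
final class SseDataFieldSketch {
    private static final String PREFIX = "data:";

    /** Writes one single-line chunk as an SSE data line (assumed: no space after the colon). */
    static String toDataLine(String chunk) {
        return PREFIX + chunk;
    }

    /** Reads the value of a data line; one leading space, if present, is dropped. */
    static String parseDataLine(String line) {
        if (!line.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a data line: " + line);
        }
        String value = line.substring(PREFIX.length());
        return value.startsWith(" ") ? value.substring(1) : value;
    }

    /** Sends a chunk through the wire format, optionally prepending the extra space. */
    static String roundTrip(String chunk, boolean prependSpace) {
        String sent = prependSpace ? " " + chunk : chunk;
        return parseDataLine(toDataLine(sent));
    }

    public static void main(String[] args) {
        System.out.println("[" + roundTrip(" world", false) + "]"); // [world]
        System.out.println("[" + roundTrip(" world", true) + "]");  // [ world]
    }
}
```

Without the extra space, the chunk ` world` arrives as `world`, so `Hello` followed by ` world` would be rendered as `Helloworld`. Chunks that do not start with a space are unaffected either way.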
Code evidence from `serve/src/main/java/smile/chat/ChatCompletionResource.java:58-74`:
```java
@RestStreamElementType(MediaType.TEXT_PLAIN) // Important for streaming item by item without buffering
public Multi<String> complete(@Context HttpHeaders headers,
                              CompletionRequest request) {
    // Must set context in the endpoint instead of supplyAsync.
    // Otherwise, routingContext is undefined in the worker thread.
    conversation.setContext(routingContext, headers);
    SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
    executor.supplyAsync(() -> {
        var completions = service.complete(request, publisher);
        saveConversation(conversation, request, completions);
        return completions;
    });
    return Multi.createFrom()
            .publisher(publisher)
            .map(chunk -> " " + chunk); // in case client eats the space after 'data:'
}
```
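The producer/consumer hand-off in this excerpt can be tried outside Quarkus with the JDK's `java.util.concurrent.Flow` API alone. The sketch below is an approximation under stated assumptions: a hand-written `Flow.Subscriber` stands in for the `Multi` pipeline, `CompletableFuture.runAsync` stands in for the managed executor, and `PublisherBridgeSketch` and `stream` are hypothetical names. It subscribes before submitting because `SubmissionPublisher` only delivers items to subscribers that are registered at the time of `submit`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

/** JDK-only sketch: a worker thread submits chunks, a subscriber pads and collects them. */
final class PublisherBridgeSketch {

    static List<String> stream(List<String> chunks) {
        List<String> received = new CopyOnWriteArrayList<>();
        CompletableFuture<Void> done = new CompletableFuture<>();

        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            // Register the consumer first; items submitted with no subscriber are dropped.
            publisher.subscribe(new Flow.Subscriber<String>() {
                @Override public void onSubscribe(Flow.Subscription subscription) {
                    subscription.request(Long.MAX_VALUE); // no back-pressure in this sketch
                }
                @Override public void onNext(String chunk) {
                    received.add(" " + chunk); // same padding as .map(chunk -> " " + chunk)
                }
                @Override public void onError(Throwable failure) {
                    done.completeExceptionally(failure);
                }
                @Override public void onComplete() {
                    done.complete(null);
                }
            });

            // Produce the chunks on another thread, as the endpoint does via its executor.
            CompletableFuture.runAsync(() -> chunks.forEach(publisher::submit)).join();
        } // close() signals onComplete once the submitted items have been delivered

        done.join(); // wait until the subscriber has seen every chunk
        return List.copyOf(received);
    }

    public static void main(String[] args) {
        System.out.println(stream(List.of("Hello", " world"))); // [ Hello,   world]
    }
}
```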
Same streaming annotation pattern from `serve/src/main/java/smile/serve/InferenceResource.java:75`:
```java
@RestStreamElementType(MediaType.TEXT_PLAIN) // Important for streaming item by item without buffering
```