Principle:Mlc ai Web llm Web Worker Engine Proxy
Overview
Web Worker Engine Proxy is the architectural pattern for creating a transparent proxy on the main thread that exposes the same `MLCEngineInterface` as a direct `MLCEngine`, while delegating all actual computation to a Web Worker thread. This enables developers to write identical application code regardless of whether inference runs on the main thread or in a worker.
Description
The engine proxy pattern provides a seamless developer experience by implementing the `MLCEngineInterface` on the main thread. The proxy serializes method calls into `WorkerRequest` messages, sends them to the worker via `postMessage`, and deserializes `WorkerResponse` messages back into return values using promise-based RPC.
The key design decisions are:
- Interface parity: The proxy class (`WebWorkerMLCEngine`) implements the same `MLCEngineInterface` as `MLCEngine`. This means `engine.chat.completions.create()`, `engine.completions.create()`, and `engine.embeddings.create()` work identically.
- UUID-based correlation: Each outgoing request is tagged with a `crypto.randomUUID()`. The proxy stores a resolver callback in a `pendingPromise` map; when a response arrives with a matching UUID, the corresponding promise is resolved or rejected.
- Init progress forwarding: The worker sends `initProgressCallback` messages during model loading. The proxy intercepts these and invokes the user-registered `InitProgressCallback` on the main thread.
- State synchronization: The proxy maintains its own `modelId` and `chatOpts` arrays that mirror the worker-side state. These are sent with every inference request so the worker handler can detect and recover from state mismatches (e.g., after an unexpected worker restart).
- Factory function: `CreateWebWorkerMLCEngine` is an async factory that constructs the proxy, calls `reload()`, and returns the fully initialized engine -- equivalent to constructing `new WebWorkerMLCEngine(worker)` and then awaiting `reload(modelId)`.
The proxy exposes all the same top-level API objects:
- `engine.chat` -- `API.Chat` instance for `chat.completions.create()`
- `engine.completions` -- `API.Completions` instance for `completions.create()`
- `engine.embeddings` -- `API.Embeddings` instance for `embeddings.create()`
Usage
Use this pattern on the main thread to communicate with the worker-hosted engine. The API is identical to direct `MLCEngine` usage:
```typescript
// This code works the same whether `engine` is MLCEngine or WebWorkerMLCEngine
const response = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
  model: "Llama-3.1-8B-Instruct-q4f16_1-MLC",
});
console.log(response.choices[0].message.content);
```
The proxy pattern is recommended when:
- Building user-facing applications that must stay responsive during inference
- The inference model is large enough that loading and running it would cause visible UI jank
- You want to use the standard OpenAI-compatible API without worrying about threading
The proxy pattern is not needed when:
- Running in Node.js or a non-browser environment
- You are already inside a worker and want a direct engine reference
- Building a minimal test or benchmark where main-thread blocking is acceptable
Theoretical Basis
The proxy pattern here is an implementation of the Remote Proxy design pattern, where a local object (the proxy) controls access to a remote object (the engine in the worker). Combined with the Promise-based RPC approach, it provides the illusion of synchronous method calls across an asynchronous message-passing boundary.
The communication flow for a non-streaming request:
```
Main Thread                                Web Worker
     |                                         |
     |  WorkerRequest{kind, uuid, content}     |
     |---------------------------------------->|
     |                                         |  engine.chatCompletion(request)
     |                                         |
     |  WorkerResponse{kind: "return",         |
     |                 uuid, content}          |
     |<----------------------------------------|
     |                                         |
 Promise resolved with content
```
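The worker-side half of this round trip can be reduced to the envelope logic below. This is a sketch under stated assumptions: `handleRequest`, `invoke`, and `post` are illustrative stand-ins for the real handler's dispatch on `msg.kind`, `engine.chatCompletion(...)`, and `postMessage`.

```typescript
// Worker-side sketch: run the requested operation, then echo the UUID back
// in either a "return" or a "throw" envelope.
async function handleRequest(
  msg: { kind: string; uuid: string; content: unknown },
  invoke: (content: unknown) => Promise<unknown>, // stands in for the engine call
  post: (res: { kind: "return" | "throw"; uuid: string; content: unknown }) => void,
): Promise<void> {
  try {
    const result = await invoke(msg.content);
    post({ kind: "return", uuid: msg.uuid, content: result });
  } catch (err) {
    // Errors are serialized and rejected on the main thread.
    post({ kind: "throw", uuid: msg.uuid, content: String(err) });
  }
}
```

Note that the UUID is copied from the request into the response unchanged; it is this echo that lets the main-thread proxy find the matching pending promise.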
The communication flow for a streaming request:
```
Main Thread                                Web Worker
     |                                         |
     |  chatCompletionStreamInit               |
     |---------------------------------------->|
     |                                         |  creates AsyncGenerator
     |  return (null)                          |
     |<----------------------------------------|
     |                                         |
     |  completionStreamNextChunk              |
     |---------------------------------------->|
     |                                         |  generator.next()
     |  return (ChatCompletionChunk)           |
     |<----------------------------------------|
     |                                         |
     |  completionStreamNextChunk              |
     |---------------------------------------->|
     |                                         |  generator.next() -> done
     |  return (void)                          |
     |<----------------------------------------|
     |                                         |
 Proxy generator ends
```
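The streaming flow above amounts to the proxy wrapping repeated round trips in a local async generator. The sketch below assumes a `roundTrip` helper that models one UUID-correlated request/response cycle, and uses strings where the real implementation yields `ChatCompletionChunk` objects.

```typescript
// Sketch: turn per-chunk message round trips into a local AsyncGenerator.
async function* chatCompletionStream(
  roundTrip: (kind: string) => Promise<string | undefined>,
): AsyncGenerator<string> {
  // chatCompletionStreamInit: worker creates its generator, returns null.
  await roundTrip("chatCompletionStreamInit");
  while (true) {
    // One message round trip per chunk.
    const chunk = await roundTrip("completionStreamNextChunk");
    if (chunk === undefined) return; // worker generator is done -> end locally
    yield chunk;
  }
}
```

The consumer simply writes `for await (const chunk of stream) { ... }`; the message-passing machinery is invisible, at the cost of one round trip of latency per chunk.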
The proxy's `onmessage` handler processes three kinds of responses:
- `"initProgressCallback"` -- forwards to the user's progress callback
- `"return"` -- resolves the pending promise for the matching UUID
- `"throw"` -- rejects the pending promise for the matching UUID
Related Pages
- Implementation:Mlc_ai_Web_llm_Create_Web_Worker_MLC_Engine -- Concrete implementation of this pattern
- Principle:Mlc_ai_Web_llm_Web_Worker_Engine_Handler -- The worker-side handler this proxy communicates with
- Principle:Mlc_ai_Web_llm_Cross_Thread_Request_Forwarding -- Request forwarding mechanism
- Principle:Mlc_ai_Web_llm_Cross_Thread_Streaming -- Streaming across the proxy boundary
- Principle:Mlc_ai_Web_llm_Multi_Model_Routing -- How multi-model routing works through the proxy