Principle:Mlc ai Web llm Web Worker Engine Proxy
Overview
Web Worker Engine Proxy is the architectural pattern for creating a transparent proxy on the main thread that exposes the same `MLCEngineInterface` as a direct `MLCEngine`, while delegating all actual computation to a Web Worker thread. This enables developers to write identical application code regardless of whether inference runs on the main thread or in a worker.
Description
The engine proxy pattern provides a seamless developer experience by implementing the `MLCEngineInterface` on the main thread. The proxy serializes method calls into `WorkerRequest` messages, sends them to the worker via `postMessage`, and deserializes `WorkerResponse` messages back into return values using promise-based RPC.
The key design decisions are:
- Interface parity: The proxy class (`WebWorkerMLCEngine`) implements the same `MLCEngineInterface` as `MLCEngine`. This means `engine.chat.completions.create()`, `engine.completions.create()`, and `engine.embeddings.create()` work identically.
- UUID-based correlation: Each outgoing request is tagged with a `crypto.randomUUID()`. The proxy stores a resolver callback in a `pendingPromise` map; when a response arrives with a matching UUID, the corresponding promise is resolved or rejected.
- Init progress forwarding: The worker sends `initProgressCallback` messages during model loading. The proxy intercepts these and invokes the user-registered `InitProgressCallback` on the main thread.
- State synchronization: The proxy maintains its own `modelId` and `chatOpts` arrays that mirror the worker-side state. These are sent with every inference request so the worker handler can detect and recover from state mismatches (e.g., after an unexpected worker restart).
- Factory function: `CreateWebWorkerMLCEngine` is an async factory that constructs the proxy, calls `reload()`, and returns the fully initialized engine -- equivalent to constructing `new WebWorkerMLCEngine(worker)` and then awaiting `reload(modelId)`.
The proxy exposes all the same top-level API objects:
- `engine.chat` -- `API.Chat` instance for `chat.completions.create()`
- `engine.completions` -- `API.Completions` instance for `completions.create()`
- `engine.embeddings` -- `API.Embeddings` instance for `embeddings.create()`
Usage
Use this pattern on the main thread to communicate with the worker-hosted engine. The API is identical to direct `MLCEngine` usage:
```typescript
// This code works the same whether `engine` is MLCEngine or WebWorkerMLCEngine
const response = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
  model: "Llama-3.1-8B-Instruct-q4f16_1-MLC",
});
console.log(response.choices[0].message.content);
```
The proxy pattern is recommended when:
- Building user-facing applications that must stay responsive during inference
- The inference model is large enough that loading and running it would cause visible UI jank
- You want to use the standard OpenAI-compatible API without worrying about threading
The proxy pattern is not needed when:
- Running in Node.js or a non-browser environment
- You are already inside a worker and want a direct engine reference
- Building a minimal test or benchmark where main-thread blocking is acceptable
Theoretical Basis
The proxy pattern here is an implementation of the Remote Proxy design pattern, where a local object (the proxy) controls access to a remote object (the engine in the worker). Combined with the Promise-based RPC approach, it provides the illusion of synchronous method calls across an asynchronous message-passing boundary.
The communication flow for a non-streaming request:
```
Main Thread                                Web Worker
     |                                         |
     |  WorkerRequest{kind, uuid, content}     |
     |---------------------------------------->|
     |                                         |  engine.chatCompletion(request)
     |                                         |
     |  WorkerResponse{kind: "return",         |
     |                 uuid, content}          |
     |<----------------------------------------|
     |                                         |
 Promise resolved with content
```
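The worker-side half of this round trip can be reduced to the envelope logic below. This is a sketch under stated assumptions: `handleRequest`, `invoke`, and `post` are illustrative stand-ins for the real handler's dispatch on `msg.kind`, `engine.chatCompletion(...)`, and `postMessage`.

```typescript
// Worker-side sketch: run the requested operation, then echo the UUID back
// in either a "return" or a "throw" envelope.
async function handleRequest(
  msg: { kind: string; uuid: string; content: unknown },
  invoke: (content: unknown) => Promise<unknown>, // stands in for the engine call
  post: (res: { kind: "return" | "throw"; uuid: string; content: unknown }) => void,
): Promise<void> {
  try {
    const result = await invoke(msg.content);
    post({ kind: "return", uuid: msg.uuid, content: result });
  } catch (err) {
    // Errors are serialized and rejected on the main thread.
    post({ kind: "throw", uuid: msg.uuid, content: String(err) });
  }
}
```

Note that the UUID is copied from the request into the response unchanged; it is this echo that lets the main-thread proxy find the matching pending promise.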
The communication flow for a streaming request:
```
Main Thread                                Web Worker
     |                                         |
     |  chatCompletionStreamInit               |
     |---------------------------------------->|
     |                                         |  creates AsyncGenerator
     |  return (null)                          |
     |<----------------------------------------|
     |                                         |
     |  completionStreamNextChunk              |
     |---------------------------------------->|
     |                                         |  generator.next()
     |  return (ChatCompletionChunk)           |
     |<----------------------------------------|
     |                                         |
     |  completionStreamNextChunk              |
     |---------------------------------------->|
     |                                         |  generator.next() -> done
     |  return (void)                          |
     |<----------------------------------------|
     |                                         |
 Proxy generator ends
```
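The streaming flow above amounts to the proxy wrapping repeated round trips in a local async generator. The sketch below assumes a `roundTrip` helper that models one UUID-correlated request/response cycle, and uses strings where the real implementation yields `ChatCompletionChunk` objects.

```typescript
// Sketch: turn per-chunk message round trips into a local AsyncGenerator.
async function* chatCompletionStream(
  roundTrip: (kind: string) => Promise<string | undefined>,
): AsyncGenerator<string> {
  // chatCompletionStreamInit: worker creates its generator, returns null.
  await roundTrip("chatCompletionStreamInit");
  while (true) {
    // One message round trip per chunk.
    const chunk = await roundTrip("completionStreamNextChunk");
    if (chunk === undefined) return; // worker generator is done -> end locally
    yield chunk;
  }
}
```

The consumer simply writes `for await (const chunk of stream) { ... }`; the message-passing machinery is invisible, at the cost of one round trip of latency per chunk.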
The proxy's `onmessage` handler processes three kinds of responses:
- `"initProgressCallback"` -- forwards to the user's progress callback
- `"return"` -- resolves the pending promise for the matching UUID
- `"throw"` -- rejects the pending promise for the matching UUID
Related Pages
- Implementation:Mlc_ai_Web_llm_Create_Web_Worker_MLC_Engine -- Concrete implementation of this pattern
- Principle:Mlc_ai_Web_llm_Web_Worker_Engine_Handler -- The worker-side handler this proxy communicates with
- Principle:Mlc_ai_Web_llm_Cross_Thread_Request_Forwarding -- Request forwarding mechanism
- Principle:Mlc_ai_Web_llm_Cross_Thread_Streaming -- Streaming across the proxy boundary
- Principle:Mlc_ai_Web_llm_Multi_Model_Routing -- How multi-model routing works through the proxy