Implementation:Mlc_ai_Web_llm_Web_Worker_MLC_Engine_Handler
Overview
WebWorkerMLCEngineHandler is the concrete class provided by @mlc-ai/web-llm for handling LLM inference requests inside a Web Worker. It creates an internal MLCEngine, sets up an InitProgressCallback that forwards loading progress via postMessage, and provides an onmessage handler that routes incoming WorkerRequest messages to the appropriate engine methods.
Description
The WebWorkerMLCEngineHandler class is designed to run inside a Web Worker script. On construction, it instantiates a private MLCEngine and registers an init progress callback that serializes InitProgressReport objects back to the main thread. The core routing logic lives in the onmessage method, which uses a switch statement on msg.kind to dispatch to engine methods.
Key internal state:
- engine: MLCEngine -- The actual inference engine running in the worker
- modelId?: string[] -- Currently loaded model IDs (kept in sync with the engine)
- chatOpts?: ChatOptions[] -- Chat options for each loaded model
- loadedModelIdToAsyncGenerator: Map<string, AsyncGenerator> -- Per-model streaming generators
The handler also implements a reloadIfUnmatched guard: before processing inference requests, it checks whether the expected model (sent by the main-thread proxy) matches the actually loaded model. If not (e.g., due to an unexpectedly killed service worker), it automatically reloads the correct model. This provides resilience against browser-level worker lifecycle issues.
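The comparison behind this guard can be sketched as follows. This is a simplified illustration of the check described above, not the library's actual source; `needsReload` is a hypothetical helper name:

```typescript
// Hypothetical helper illustrating the reloadIfUnmatched check.
// Returns true when the worker must reload before serving the request.
function needsReload(
  expectedModelId: string[],
  loadedModelId?: string[],
): boolean {
  // Nothing loaded in this worker (e.g. the service worker was killed
  // and restarted), so a reload is required.
  if (loadedModelId === undefined) return true;
  // The loaded model list must match the expected list exactly, in order.
  return (
    expectedModelId.length !== loadedModelId.length ||
    expectedModelId.some((id, i) => id !== loadedModelId[i])
  );
}

console.log(needsReload(["model-a"], undefined)); // true: nothing loaded
console.log(needsReload(["model-a"], ["model-a"])); // false: already loaded
console.log(needsReload(["model-a"], ["model-b"])); // true: mismatch
```

When the helper returns true, the real handler reloads the expected model(s) before dispatching the inference request.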
Code Reference
Source: src/web_worker.ts, Lines 61-378
export class WebWorkerMLCEngineHandler {
  modelId?: string[];
  chatOpts?: ChatOptions[];
  public engine: MLCEngine;
  protected loadedModelIdToAsyncGenerator: Map<
    string,
    AsyncGenerator<ChatCompletionChunk | Completion, void, void>
  >;

  constructor();
  postMessage(msg: any): void;
  setLogitProcessorRegistry(logitProcessorRegistry?: Map<string, LogitProcessor>): void;
  handleTask<T extends MessageContent>(uuid: string, task: () => Promise<T>): Promise<void>;
  onmessage(event: any, onComplete?: (value: any) => void, onError?: () => void): void;
  reloadIfUnmatched(expectedModelId: string[], expectedChatOpts?: ChatOptions[]): Promise<void>;
}
I/O Contract
Input: WorkerRequest messages received via the Web Worker's onmessage event. Each request has:
- kind -- A RequestKind string identifying the operation
- uuid -- A unique identifier for correlating requests with responses
- content -- A MessageContent union type carrying operation-specific parameters
Output: WorkerResponse messages sent via postMessage. Each response has:
- kind -- Either "return" (success), "throw" (error), or "initProgressCallback" (progress)
- uuid -- Matches the originating request's uuid
- content -- The result value, error string, or InitProgressReport
Error Handling: All errors thrown during task execution are caught in handleTask and serialized as "throw"-kind responses with the error stringified as content. Unknown message kinds throw an UnknownMessageKindError.
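The uuid-correlated request/response flow is a standard worker RPC pattern. The sketch below is a self-contained illustration of that pattern, not the library's actual handleTask source; the message shapes are reduced to the fields listed in the contract above, and postMessage is injected so the demo can record responses:

```typescript
// Simplified response shapes matching the I/O contract
// (the real library's types carry more variants and fields).
type WorkerResponse =
  | { kind: "return"; uuid: string; content: unknown }
  | { kind: "throw"; uuid: string; content: string };

// Sketch of the handleTask pattern: run the task, then post either a
// "return" or a "throw" response tagged with the originating uuid.
async function handleTaskSketch<T>(
  uuid: string,
  task: () => Promise<T>,
  postMessage: (msg: WorkerResponse) => void,
): Promise<void> {
  try {
    const content = await task();
    postMessage({ kind: "return", uuid, content });
  } catch (err) {
    // Errors are stringified so they survive structured cloning.
    postMessage({ kind: "throw", uuid, content: String(err) });
  }
}

// Demo with a mock postMessage that records responses in order.
const sent: WorkerResponse[] = [];
await handleTaskSketch("req-1", async () => 42, (m) => sent.push(m));
await handleTaskSketch(
  "req-2",
  async () => {
    throw new Error("boom");
  },
  (m) => sent.push(m),
);
console.log(sent[0].kind); // first response succeeds with kind "return"
console.log(sent[1].kind); // second response fails with kind "throw"
```

The main-thread proxy uses the echoed uuid to resolve or reject the promise associated with each pending request.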
Import
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";
Usage Examples
Setting up a worker script (worker.ts):
// worker.ts
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";
const handler = new WebWorkerMLCEngineHandler();
// Route all incoming messages to the handler
self.onmessage = (msg: MessageEvent) => {
handler.onmessage(msg);
};
Setting up a worker with a custom logit processor:
// worker.ts
import { WebWorkerMLCEngineHandler, LogitProcessor } from "@mlc-ai/web-llm";
const myLogitProcessor: LogitProcessor = {
processLogits: (logits: Float32Array) => {
// Custom logit processing logic
return logits;
},
processSampledToken: (token: number) => { /* update state */ },
resetState: () => { /* reset internal state */ },
};
const handler = new WebWorkerMLCEngineHandler();
handler.setLogitProcessorRegistry(
new Map([["my-model-id", myLogitProcessor]])
);
self.onmessage = (msg: MessageEvent) => {
handler.onmessage(msg);
};
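As a concrete (hypothetical) example of what a logit processor might do, the sketch below bans a single token id by forcing its logit to -Infinity before sampling. The interface shape is reproduced locally so the snippet is self-contained; in a real worker script, import LogitProcessor from "@mlc-ai/web-llm" instead:

```typescript
// Local copy of the interface shape for a self-contained sketch.
interface LogitProcessor {
  processLogits(logits: Float32Array): Float32Array;
  processSampledToken(token: number): void;
  resetState(): void;
}

// Hypothetical processor: a logit of -Infinity gives the banned token
// zero probability after softmax, so it can never be sampled.
function makeTokenBanProcessor(bannedTokenId: number): LogitProcessor {
  let sampledCount = 0; // example of per-request internal state
  return {
    processLogits(logits: Float32Array): Float32Array {
      if (bannedTokenId < logits.length) {
        logits[bannedTokenId] = -Infinity;
      }
      return logits;
    },
    processSampledToken(_token: number): void {
      sampledCount += 1; // track how many tokens were sampled
    },
    resetState(): void {
      sampledCount = 0; // clear state between requests
    },
  };
}

const processor = makeTokenBanProcessor(2);
const masked = processor.processLogits(new Float32Array([0.1, 0.5, 0.9, 0.2]));
console.log(masked[2]); // -Infinity
```

A processor built this way would be registered under its model id via setLogitProcessorRegistry, exactly as in the example above.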
Corresponding main thread code:
// main.ts
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";
const worker = new Worker(
new URL("./worker.ts", import.meta.url),
{ type: "module" }
);
const engine = await CreateWebWorkerMLCEngine(
worker,
"Llama-3.1-8B-Instruct-q4f16_1-MLC",
{
initProgressCallback: (progress) => {
console.log(`Loading: ${(progress.progress * 100).toFixed(1)}%`);
},
}
);
Related Pages
- Principle:Mlc_ai_Web_llm_Web_Worker_Engine_Handler -- The principle this implements
- Implementation:Mlc_ai_Web_llm_Create_Web_Worker_MLC_Engine -- The factory function that creates the main-thread proxy
- Implementation:Mlc_ai_Web_llm_Web_Worker_Chat_Completion -- Chat completion forwarding
- Implementation:Mlc_ai_Web_llm_Async_Generate -- Streaming generator implementation