Implementation:Mlc_ai_Web_llm_Create_Web_Worker_MLC_Engine
Overview
CreateWebWorkerMLCEngine is the async factory function and WebWorkerMLCEngine is the proxy class provided by @mlc-ai/web-llm for interacting with an LLM engine running in a Web Worker. Together they implement the Web Worker Engine Proxy pattern, giving the main thread a fully transparent MLCEngineInterface-compatible object.
Description
The implementation consists of two parts:
1. CreateWebWorkerMLCEngine (factory function, L401-410): An async convenience function that constructs a WebWorkerMLCEngine and calls reload() on it to load the specified model. It is equivalent to manually constructing the engine and calling reload separately, but provides a cleaner one-liner API.
2. WebWorkerMLCEngine (class, L422-842): The main-thread proxy that implements MLCEngineInterface. It holds a reference to the ChatWorker (the Web Worker instance), manages a pendingPromise map for UUID-based RPC, and exposes the same API surface as MLCEngine:
- `chat: API.Chat` -- Provides `chat.completions.create()`
- `completions: API.Completions` -- Provides `completions.create()`
- `embeddings: API.Embeddings` -- Provides `embeddings.create()`
The proxy internally converts every API call into a `WorkerRequest` via the `getPromise<T>(msg)` helper, which:
- Stores a resolver callback keyed by the request's UUID
- Sends the message to the worker via `postMessage`
- Returns a `Promise<T>` that resolves when the worker responds
The constructor also handles configuration forwarding: if engineConfig specifies appConfig, logLevel, or initProgressCallback, these are sent to the worker or stored locally. Notably, logitProcessorRegistry is not supported in the worker proxy (a warning is logged if provided) because logit processor functions cannot be serialized across thread boundaries.
Code Reference
Source: src/web_worker.ts, Lines 401-467 (factory + constructor), Lines 422-842 (full class)
```typescript
export async function CreateWebWorkerMLCEngine(
  worker: any,
  modelId: string | string[],
  engineConfig?: MLCEngineConfig,
  chatOpts?: ChatOptions | ChatOptions[],
): Promise<WebWorkerMLCEngine> {
  const webWorkerMLCEngine = new WebWorkerMLCEngine(worker, engineConfig);
  await webWorkerMLCEngine.reload(modelId, chatOpts);
  return webWorkerMLCEngine;
}
```
```typescript
export class WebWorkerMLCEngine implements MLCEngineInterface {
  public worker: ChatWorker;
  public chat: API.Chat;
  public completions: API.Completions;
  public embeddings: API.Embeddings;
  modelId?: string[];
  chatOpts?: ChatOptions[];
  private initProgressCallback?: InitProgressCallback;
  private pendingPromise = new Map<string, (msg: WorkerResponse) => void>();

  constructor(worker: ChatWorker, engineConfig?: MLCEngineConfig);

  // Core RPC helper
  protected getPromise<T extends MessageContent>(msg: WorkerRequest): Promise<T>;

  // MLCEngineInterface methods
  reload(modelId: string | string[], chatOpts?: ChatOptions | ChatOptions[]): Promise<void>;
  chatCompletion(request: ChatCompletionRequest): Promise<...>;
  completion(request: CompletionCreateParams): Promise<...>;
  embedding(request: EmbeddingCreateParams): Promise<CreateEmbeddingResponse>;
  getMessage(modelId?: string): Promise<string>;
  runtimeStatsText(modelId?: string): Promise<string>;
  interruptGenerate(): void;
  unload(): Promise<void>;
  resetChat(keepStats?: boolean, modelId?: string): Promise<void>;
  forwardTokensAndSample(inputIds: Array<number>, isPrefill: boolean, modelId?: string): Promise<number>;

  // Internal
  onmessage(event: any): void;
  async *asyncGenerate(selectedModelId: string): AsyncGenerator<...>;
}
```
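For streaming responses, the proxy must turn push-style `onmessage` chunks into a pull-style async iterator. The following is an illustrative queue-based sketch of that bridge (the general shape `asyncGenerate` relies on), not the library's exact code; `ChunkStream` is a hypothetical helper:

```typescript
// Illustrative sketch: bridging pushed worker messages into an async
// generator that a caller can consume with for-await.
class ChunkStream<T> {
  private queue: T[] = [];
  private waiters: ((v: IteratorResult<T>) => void)[] = [];
  private done = false;

  // Called from onmessage when a streaming chunk arrives.
  push(chunk: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value: chunk, done: false });
    else this.queue.push(chunk);
  }

  // Called when the worker signals the end of the stream.
  finish(): void {
    this.done = true;
    for (const w of this.waiters.splice(0)) w({ value: undefined as T, done: true });
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    while (true) {
      if (this.queue.length > 0) {
        yield this.queue.shift()!;
        continue;
      }
      if (this.done) return;
      // No chunk queued yet: park a waiter until push() or finish() runs.
      const next = await new Promise<IteratorResult<T>>((res) => this.waiters.push(res));
      if (next.done) return;
      yield next.value;
    }
  }
}

// Usage: one side pushes chunks; the consumer for-awaits them in order.
const stream = new ChunkStream<string>();
(async () => {
  const parts: string[] = [];
  for await (const c of stream) parts.push(c);
  console.log(parts.join("")); // prints "Hello, world"
})();
stream.push("Hello, ");
stream.push("world");
stream.finish();
```

The queue absorbs chunks that arrive faster than the consumer reads them, while parked waiters handle the opposite case, so ordering is preserved in both directions.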
I/O Contract
Inputs:
| Parameter | Type | Description |
|---|---|---|
| `worker` | `any` (`ChatWorker`) | A Web Worker instance created with `new Worker()` |
| `modelId` | `string \| string[]` | One or more model IDs to load (must be in `prebuiltAppConfig` or `engineConfig.appConfig`) |
| `engineConfig` | `MLCEngineConfig` (optional) | Configuration including `appConfig`, `logLevel`, and `initProgressCallback` |
| `chatOpts` | `ChatOptions \| ChatOptions[]` (optional) | Per-model overrides for `mlc-chat-config.json` |
Output: A `Promise<WebWorkerMLCEngine>` that resolves once the model is loaded and the engine is ready for inference.
Error Conditions:
- `WorkerEngineModelNotLoadedError` -- Thrown if `chatCompletion()`, `completion()`, or `embedding()` is called before `reload()`
- Worker-side errors are propagated as rejected promises through the `"throw"` response kind
Import
```typescript
import { CreateWebWorkerMLCEngine, WebWorkerMLCEngine } from "@mlc-ai/web-llm";
```
Usage Examples
Basic usage with the factory function:
```typescript
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

const worker = new Worker(
  new URL("./worker.ts", import.meta.url),
  { type: "module" },
);

const engine = await CreateWebWorkerMLCEngine(
  worker,
  "Llama-3.1-8B-Instruct-q4f16_1-MLC",
  {
    initProgressCallback: (report) => {
      console.log(`Loading: ${report.text}`);
    },
  },
);

// Use exactly the same API as MLCEngine
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "What is WebGPU?" }],
});
console.log(reply.choices[0].message.content);
```
Loading multiple models:
```typescript
const engine = await CreateWebWorkerMLCEngine(
  worker,
  ["Llama-3.1-8B-Instruct-q4f16_1-MLC", "snowflake-arctic-embed-s-q0f32-MLC"],
);

// Chat completion uses the LLM
const chatReply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
  model: "Llama-3.1-8B-Instruct-q4f16_1-MLC",
});

// Embedding uses the embedding model
const embedReply = await engine.embeddings.create({
  input: "Hello world",
  model: "snowflake-arctic-embed-s-q0f32-MLC",
});
```
Manual construction (without factory):
```typescript
import { WebWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = new WebWorkerMLCEngine(worker, { logLevel: "INFO" });
await engine.reload("Llama-3.1-8B-Instruct-q4f16_1-MLC");
```
Related Pages
- Principle:Mlc_ai_Web_llm_Web_Worker_Engine_Proxy -- The principle this implements
- Implementation:Mlc_ai_Web_llm_Web_Worker_MLC_Engine_Handler -- The worker-side handler
- Implementation:Mlc_ai_Web_llm_Web_Worker_Chat_Completion -- Chat completion through the proxy
- Implementation:Mlc_ai_Web_llm_Async_Generate -- Streaming generator in the proxy