
Implementation:Mlc ai Web llm Create Web Worker MLC Engine

From Leeroopedia


Overview

CreateWebWorkerMLCEngine is the async factory function and WebWorkerMLCEngine is the proxy class provided by @mlc-ai/web-llm for interacting with an LLM engine running in a Web Worker. Together they implement the Web Worker Engine Proxy pattern, giving the main thread a fully transparent MLCEngineInterface-compatible object.

Description

The implementation consists of two parts:

1. CreateWebWorkerMLCEngine (factory function, L401-410): An async convenience function that constructs a WebWorkerMLCEngine and calls reload() on it to load the specified model. It is equivalent to manually constructing the engine and calling reload separately, but provides a cleaner one-liner API.

2. WebWorkerMLCEngine (class, L422-842): The main-thread proxy that implements MLCEngineInterface. It holds a reference to the ChatWorker (the Web Worker instance), manages a pendingPromise map for UUID-based RPC, and exposes the same API surface as MLCEngine:

  • chat: API.Chat -- Provides chat.completions.create()
  • completions: API.Completions -- Provides completions.create()
  • embeddings: API.Embeddings -- Provides embeddings.create()

The proxy internally converts every API call into a WorkerRequest via the getPromise<T>(msg) helper, which:

  1. Stores a resolver callback keyed by the request's UUID
  2. Sends the message to the worker via postMessage
  3. Returns a Promise<T> that resolves when the worker responds

The constructor also handles configuration forwarding: if engineConfig specifies appConfig, logLevel, or initProgressCallback, these are sent to the worker or stored locally. Notably, logitProcessorRegistry is not supported in the worker proxy (a warning is logged if provided) because logit processor functions cannot be serialized across thread boundaries.

Code Reference

Source: src/web_worker.ts, Lines 401-467 (factory + constructor), Lines 422-842 (full class)

export async function CreateWebWorkerMLCEngine(
  worker: any,
  modelId: string | string[],
  engineConfig?: MLCEngineConfig,
  chatOpts?: ChatOptions | ChatOptions[],
): Promise<WebWorkerMLCEngine> {
  const webWorkerMLCEngine = new WebWorkerMLCEngine(worker, engineConfig);
  await webWorkerMLCEngine.reload(modelId, chatOpts);
  return webWorkerMLCEngine;
}

export class WebWorkerMLCEngine implements MLCEngineInterface {
  public worker: ChatWorker;
  public chat: API.Chat;
  public completions: API.Completions;
  public embeddings: API.Embeddings;

  modelId?: string[];
  chatOpts?: ChatOptions[];

  private initProgressCallback?: InitProgressCallback;
  private pendingPromise = new Map<string, (msg: WorkerResponse) => void>();

  constructor(worker: ChatWorker, engineConfig?: MLCEngineConfig);

  // Core RPC helper
  protected getPromise<T extends MessageContent>(msg: WorkerRequest): Promise<T>;

  // MLCEngineInterface methods
  reload(modelId: string | string[], chatOpts?: ChatOptions | ChatOptions[]): Promise<void>;
  chatCompletion(request: ChatCompletionRequest): Promise<...>;
  completion(request: CompletionCreateParams): Promise<...>;
  embedding(request: EmbeddingCreateParams): Promise<CreateEmbeddingResponse>;
  getMessage(modelId?: string): Promise<string>;
  runtimeStatsText(modelId?: string): Promise<string>;
  interruptGenerate(): void;
  unload(): Promise<void>;
  resetChat(keepStats?: boolean, modelId?: string): Promise<void>;
  forwardTokensAndSample(inputIds: Array<number>, isPrefill: boolean, modelId?: string): Promise<number>;

  // Internal
  onmessage(event: any): void;
  async *asyncGenerate(selectedModelId: string): AsyncGenerator<...>;
}

I/O Contract

Inputs:

  • worker -- any (typed as ChatWorker): A Web Worker instance created with new Worker()
  • modelId -- string | string[]: One or more model IDs to load (each must appear in prebuiltAppConfig or engineConfig.appConfig)
  • engineConfig -- MLCEngineConfig (optional): Configuration including appConfig, logLevel, and initProgressCallback
  • chatOpts -- ChatOptions | ChatOptions[] (optional): Per-model overrides for mlc-chat-config.json

Output: A Promise<WebWorkerMLCEngine> that resolves once the model is loaded and the engine is ready for inference.

Error Conditions:

  • WorkerEngineModelNotLoadedError -- Thrown if chatCompletion(), completion(), or embedding() is called before reload()
  • Worker-side errors are propagated as rejected promises through the "throw" response kind
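The first condition can be sketched as follows; this is an illustrative snippet (worker construction omitted), with the error name taken from the source:

```typescript
import { WebWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = new WebWorkerMLCEngine(worker); // constructed, but reload() not yet called

try {
  await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hi" }],
  });
} catch (err) {
  // Fails with WorkerEngineModelNotLoadedError because no model is loaded yet
  console.error("Call reload() before inference:", err);
}
```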

Import

import { CreateWebWorkerMLCEngine, WebWorkerMLCEngine } from "@mlc-ai/web-llm";

Usage Examples

Basic usage with the factory function:

import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

const worker = new Worker(
  new URL("./worker.ts", import.meta.url),
  { type: "module" }
);

const engine = await CreateWebWorkerMLCEngine(
  worker,
  "Llama-3.1-8B-Instruct-q4f16_1-MLC",
  {
    initProgressCallback: (report) => {
      console.log(`Loading: ${report.text}`);
    },
  }
);

// Use exactly the same API as MLCEngine
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "What is WebGPU?" }],
});
console.log(reply.choices[0].message.content);
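The main-thread code above assumes a companion ./worker.ts that hosts the actual engine. Following the web-llm worker pattern, the worker instantiates a WebWorkerMLCEngineHandler and forwards every incoming message to it (a minimal sketch):

```typescript
// worker.ts -- runs inside the Web Worker
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

const handler = new WebWorkerMLCEngineHandler();
self.onmessage = (msg: MessageEvent) => {
  handler.onmessage(msg);
};
```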

Loading multiple models:

const engine = await CreateWebWorkerMLCEngine(
  worker,
  ["Llama-3.1-8B-Instruct-q4f16_1-MLC", "snowflake-arctic-embed-s-q0f32-MLC"]
);

// Chat completion uses the LLM
const chatReply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
  model: "Llama-3.1-8B-Instruct-q4f16_1-MLC",
});

// Embedding uses the embedding model
const embedReply = await engine.embeddings.create({
  input: "Hello world",
  model: "snowflake-arctic-embed-s-q0f32-MLC",
});

Manual construction (without factory):

import { WebWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = new WebWorkerMLCEngine(worker, { logLevel: "INFO" });
await engine.reload("Llama-3.1-8B-Instruct-q4f16_1-MLC");

Related Pages

Principle:Mlc_ai_Web_llm_Web_Worker_Engine_Proxy
