Implementation:Mlc_ai_Web_llm_Create_Web_Worker_MLC_Engine
Overview
CreateWebWorkerMLCEngine is the async factory function and WebWorkerMLCEngine is the proxy class provided by @mlc-ai/web-llm for interacting with an LLM engine running in a Web Worker. Together they implement the Web Worker Engine Proxy pattern, giving the main thread a fully transparent MLCEngineInterface-compatible object.
Description
The implementation consists of two parts:
1. CreateWebWorkerMLCEngine (factory function, L401-410): An async convenience function that constructs a WebWorkerMLCEngine and calls reload() on it to load the specified model. It is equivalent to manually constructing the engine and calling reload separately, but provides a cleaner one-liner API.
2. WebWorkerMLCEngine (class, L422-842): The main-thread proxy that implements MLCEngineInterface. It holds a reference to the ChatWorker (the Web Worker instance), manages a pendingPromise map for UUID-based RPC, and exposes the same API surface as MLCEngine:
- `chat: API.Chat` -- Provides `chat.completions.create()`
- `completions: API.Completions` -- Provides `completions.create()`
- `embeddings: API.Embeddings` -- Provides `embeddings.create()`
The proxy internally converts every API call into a `WorkerRequest` via the `getPromise<T>(msg)` helper, which:
- Stores a resolver callback keyed by the request's UUID
- Sends the message to the worker via `postMessage`
- Returns a `Promise<T>` that resolves when the worker responds
The constructor also handles configuration forwarding: if engineConfig specifies appConfig, logLevel, or initProgressCallback, these are sent to the worker or stored locally. Notably, logitProcessorRegistry is not supported in the worker proxy (a warning is logged if provided) because logit processor functions cannot be serialized across thread boundaries.
Code Reference
Source: src/web_worker.ts, Lines 401-467 (factory + constructor), Lines 422-842 (full class)
```typescript
export async function CreateWebWorkerMLCEngine(
  worker: any,
  modelId: string | string[],
  engineConfig?: MLCEngineConfig,
  chatOpts?: ChatOptions | ChatOptions[],
): Promise<WebWorkerMLCEngine> {
  const webWorkerMLCEngine = new WebWorkerMLCEngine(worker, engineConfig);
  await webWorkerMLCEngine.reload(modelId, chatOpts);
  return webWorkerMLCEngine;
}
```
```typescript
export class WebWorkerMLCEngine implements MLCEngineInterface {
  public worker: ChatWorker;
  public chat: API.Chat;
  public completions: API.Completions;
  public embeddings: API.Embeddings;
  modelId?: string[];
  chatOpts?: ChatOptions[];
  private initProgressCallback?: InitProgressCallback;
  private pendingPromise = new Map<string, (msg: WorkerResponse) => void>();

  constructor(worker: ChatWorker, engineConfig?: MLCEngineConfig);

  // Core RPC helper
  protected getPromise<T extends MessageContent>(msg: WorkerRequest): Promise<T>;

  // MLCEngineInterface methods
  reload(modelId: string | string[], chatOpts?: ChatOptions | ChatOptions[]): Promise<void>;
  chatCompletion(request: ChatCompletionRequest): Promise<...>;
  completion(request: CompletionCreateParams): Promise<...>;
  embedding(request: EmbeddingCreateParams): Promise<CreateEmbeddingResponse>;
  getMessage(modelId?: string): Promise<string>;
  runtimeStatsText(modelId?: string): Promise<string>;
  interruptGenerate(): void;
  unload(): Promise<void>;
  resetChat(keepStats?: boolean, modelId?: string): Promise<void>;
  forwardTokensAndSample(inputIds: Array<number>, isPrefill: boolean, modelId?: string): Promise<number>;

  // Internal
  onmessage(event: any): void;
  async *asyncGenerate(selectedModelId: string): AsyncGenerator<...>;
}
```
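For streaming responses, the proxy must turn push-style `onmessage` chunks into a pull-style async iterator. The following is an illustrative queue-based sketch of that bridge (the general shape `asyncGenerate` relies on), not the library's exact code; `ChunkStream` is a hypothetical helper:

```typescript
// Illustrative sketch: bridging pushed worker messages into an async
// generator that a caller can consume with for-await.
class ChunkStream<T> {
  private queue: T[] = [];
  private waiters: ((v: IteratorResult<T>) => void)[] = [];
  private done = false;

  // Called from onmessage when a streaming chunk arrives.
  push(chunk: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value: chunk, done: false });
    else this.queue.push(chunk);
  }

  // Called when the worker signals the end of the stream.
  finish(): void {
    this.done = true;
    for (const w of this.waiters.splice(0)) w({ value: undefined as T, done: true });
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    while (true) {
      if (this.queue.length > 0) {
        yield this.queue.shift()!;
        continue;
      }
      if (this.done) return;
      // No chunk queued yet: park a waiter until push() or finish() runs.
      const next = await new Promise<IteratorResult<T>>((res) => this.waiters.push(res));
      if (next.done) return;
      yield next.value;
    }
  }
}

// Usage: one side pushes chunks; the consumer for-awaits them in order.
const stream = new ChunkStream<string>();
(async () => {
  const parts: string[] = [];
  for await (const c of stream) parts.push(c);
  console.log(parts.join("")); // prints "Hello, world"
})();
stream.push("Hello, ");
stream.push("world");
stream.finish();
```

The queue absorbs chunks that arrive faster than the consumer reads them, while parked waiters handle the opposite case, so ordering is preserved in both directions.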
I/O Contract
Inputs:
| Parameter | Type | Description |
|---|---|---|
| `worker` | `any` (`ChatWorker`) | A Web Worker instance created with `new Worker()` |
| `modelId` | `string \| string[]` | One or more model IDs to load (must be in `prebuiltAppConfig` or `engineConfig.appConfig`) |
| `engineConfig` | `MLCEngineConfig` (optional) | Configuration including `appConfig`, `logLevel`, and `initProgressCallback` |
| `chatOpts` | `ChatOptions \| ChatOptions[]` (optional) | Per-model overrides for `mlc-chat-config.json` |
Output: A `Promise<WebWorkerMLCEngine>` that resolves once the model is loaded and the engine is ready for inference.
Error Conditions:
- `WorkerEngineModelNotLoadedError` -- Thrown if `chatCompletion()`, `completion()`, or `embedding()` is called before `reload()`
- Worker-side errors are propagated as rejected promises through the `"throw"` response kind
Import
```typescript
import { CreateWebWorkerMLCEngine, WebWorkerMLCEngine } from "@mlc-ai/web-llm";
```
Usage Examples
Basic usage with the factory function:
```typescript
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

const worker = new Worker(
  new URL("./worker.ts", import.meta.url),
  { type: "module" },
);

const engine = await CreateWebWorkerMLCEngine(
  worker,
  "Llama-3.1-8B-Instruct-q4f16_1-MLC",
  {
    initProgressCallback: (report) => {
      console.log(`Loading: ${report.text}`);
    },
  },
);

// Use exactly the same API as MLCEngine
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "What is WebGPU?" }],
});
console.log(reply.choices[0].message.content);
```
Loading multiple models:
```typescript
const engine = await CreateWebWorkerMLCEngine(
  worker,
  ["Llama-3.1-8B-Instruct-q4f16_1-MLC", "snowflake-arctic-embed-s-q0f32-MLC"],
);

// Chat completion uses the LLM
const chatReply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
  model: "Llama-3.1-8B-Instruct-q4f16_1-MLC",
});

// Embedding uses the embedding model
const embedReply = await engine.embeddings.create({
  input: "Hello world",
  model: "snowflake-arctic-embed-s-q0f32-MLC",
});
```
Manual construction (without factory):
```typescript
import { WebWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = new WebWorkerMLCEngine(worker, { logLevel: "INFO" });
await engine.reload("Llama-3.1-8B-Instruct-q4f16_1-MLC");
```
Related Pages
- Principle:Mlc_ai_Web_llm_Web_Worker_Engine_Proxy -- The principle this implements
- Implementation:Mlc_ai_Web_llm_Web_Worker_MLC_Engine_Handler -- The worker-side handler
- Implementation:Mlc_ai_Web_llm_Web_Worker_Chat_Completion -- Chat completion through the proxy
- Implementation:Mlc_ai_Web_llm_Async_Generate -- Streaming generator in the proxy