Principle:Mlc_ai_Web_llm_Extension_Client_Engine
Overview
Pattern for creating an engine proxy in a Chrome Extension popup that communicates with the background service worker via chrome.runtime.Port with automatic keep-alive heartbeats. The client engine implements the same MLCEngineInterface as the direct MLCEngine, allowing popup code to use the standard engine.chat.completions.create() API without awareness that inference runs in a separate service worker process.
Description
The extension client engine creates a transparent proxy that implements MLCEngineInterface on the popup side. The construction involves five parts:
1. Port Connection: The client connects to the background service worker via chrome.runtime.connect(), establishing a long-lived chrome.runtime.Port. The port name is always "web_llm_service_worker". An optional extensionId parameter allows connecting to a different extension's service worker (for cross-extension inference).
2. PortAdapter: The port is wrapped in a PortAdapter class that emulates the ChatWorker interface (the same interface that WebWorkerMLCEngine expects from a Worker). This adapter translates between:
- port.postMessage(msg) for sending messages
- port.onMessage.addListener() for receiving messages
- A getter/setter for onmessage to match the Worker API pattern
3. WebWorkerMLCEngine inheritance: ServiceWorkerMLCEngine extends WebWorkerMLCEngine, passing the PortAdapter as the ChatWorker. This means all the message serialization, promise management, streaming chunk generation, and API surface (chat.completions, completions, embeddings) are inherited from the web worker engine.
4. Keep-alive heartbeat: A setInterval timer sends { kind: "keepAlive" } messages to the service worker at a configurable interval (default: 10 seconds). This prevents Chrome from killing the idle service worker, which would destroy the loaded model. The timer is cleared when the port disconnects.
5. Disconnect callback: An optional onDisconnect callback in the config fires when the port disconnects, allowing the popup to display a reconnection UI or clean up state.
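The heartbeat and disconnect wiring described in points 4 and 5 can be sketched roughly as follows. This is illustrative only: KeepAlivePort and startKeepAlive are hypothetical stand-ins (defined here so the sketch runs outside an extension), not web-llm exports; in the real engine this logic lives inside ServiceWorkerMLCEngine and operates on a chrome.runtime.Port.

```typescript
// Hypothetical minimal stand-in for the parts of chrome.runtime.Port used here.
interface KeepAlivePort {
  postMessage(msg: unknown): void;
  onDisconnect: { addListener(cb: () => void): void };
}

function startKeepAlive(
  port: KeepAlivePort,
  intervalMs: number = 10_000, // matches the default keepAliveMs
  onDisconnect?: () => void,
): void {
  const timer = setInterval(() => {
    // Periodic heartbeat: keeps Chrome from terminating the idle service worker.
    port.postMessage({ kind: "keepAlive" });
  }, intervalMs);
  port.onDisconnect.addListener(() => {
    clearInterval(timer); // stop heartbeats once the worker is gone
    onDisconnect?.();     // let the popup show a reconnection UI
  });
}
```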
The factory function CreateServiceWorkerMLCEngine (aliased as CreateExtensionServiceWorkerMLCEngine) is the recommended entry point. It constructs the engine, calls reload(modelId, chatOpts), and returns a ready-to-use engine instance. Since the service worker handler caches already-loaded models, this reload typically completes instantly on subsequent popup opens.
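Under the description above, the factory's contract (construct, reload, return) can be sketched like this. EngineLike and createServiceWorkerEngineSketch are hypothetical stand-ins for illustration, not the library's actual types:

```typescript
// Hypothetical minimal view of the engine surface the factory relies on.
interface EngineLike {
  reload(modelId: string, chatOpts?: object): Promise<void>;
}

async function createServiceWorkerEngineSketch<E extends EngineLike>(
  engine: E, // a freshly constructed client engine
  modelId: string,
  chatOpts?: object,
): Promise<E> {
  // When the service worker handler already has the model cached, reload()
  // resolves almost instantly, which is why reopening the popup feels free.
  await engine.reload(modelId, chatOpts);
  return engine; // ready to serve chat.completions.create() calls
}
```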
Usage
Use this in the popup script (or content script) of a Chrome extension to communicate with the background service worker that hosts the actual MLCEngine.
When to apply:
- The popup or sidebar of a Chrome extension needs to call LLM inference APIs
- The extension uses the service worker pattern for persistent model hosting
- You want the OpenAI-compatible chat.completions.create() API in extension popup code
When not to apply:
- Standard web applications (use CreateWebWorkerMLCEngine instead)
- The background script itself (it uses ServiceWorkerMLCEngineHandler, not the client engine)
- Extensions that run inference directly in the popup without a service worker
Typical usage flow:
- Import CreateExtensionServiceWorkerMLCEngine from @mlc-ai/web-llm
- Call it with a model ID and an optional progress callback
- Use the returned engine with engine.chat.completions.create() for inference
- The engine handles all communication with the service worker transparently
Theoretical Basis
The client engine pattern is an application of the Proxy design pattern: the popup-side engine object presents the same interface as the real engine but delegates all operations to the background service worker over a message channel.
The keep-alive mechanism addresses a specific Chrome Extension platform constraint: service workers are killed after ~30 seconds of inactivity, as documented in the Chrome service worker lifecycle documentation. For LLM inference, where model loading can take 30+ seconds, losing the service worker means losing the loaded model. The periodic heartbeat keeps the service worker alive as long as the popup (or any other client) is open.
The PortAdapter is an instance of the Adapter pattern: it adapts the chrome.runtime.Port interface to the ChatWorker interface that WebWorkerMLCEngine expects. This allows maximum code reuse between the web worker and extension architectures.
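As a rough illustration of that adaptation, the following sketch shows a Port-to-Worker adapter. AdapterPort and PortAdapterSketch are hypothetical names defined here so the example is self-contained; they are not the library's actual class or the full ChatWorker interface:

```typescript
// Hypothetical minimal stand-in for the parts of chrome.runtime.Port used here.
interface AdapterPort {
  postMessage(msg: unknown): void;
  onMessage: { addListener(cb: (msg: unknown) => void): void };
}

// Exposes the Worker-shaped surface (onmessage property, postMessage method)
// that WebWorkerMLCEngine-style code expects, on top of a Port.
class PortAdapterSketch {
  private handler?: (ev: { data: unknown }) => void;

  constructor(private port: AdapterPort) {
    // Translate Port "onMessage" events into Worker-style "onmessage" events.
    port.onMessage.addListener((msg) => this.handler?.({ data: msg }));
  }

  // Getter/setter pair matching the Worker API's onmessage property.
  get onmessage(): ((ev: { data: unknown }) => void) | undefined {
    return this.handler;
  }
  set onmessage(h: ((ev: { data: unknown }) => void) | undefined) {
    this.handler = h;
  }

  postMessage(msg: unknown): void {
    this.port.postMessage(msg); // forward outgoing messages straight to the port
  }
}
```

Because the adapter preserves the Worker message surface, the inheriting engine never needs to know whether it is talking to a real Worker or a chrome.runtime.Port.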
I/O Contract
Input to CreateServiceWorkerMLCEngine:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| modelId | string \| string[] | Yes | - | Model ID(s) to load; must be in prebuiltAppConfig or engineConfig.appConfig |
| engineConfig | ExtensionMLCEngineConfig | No | undefined | Configuration including initProgressCallback, extensionId, onDisconnect |
| chatOpts | ChatOptions \| ChatOptions[] | No | undefined | Overrides for mlc-chat-config.json per model |
| keepAliveMs | number | No | 10000 | Heartbeat interval in milliseconds |
Output: A Promise<ServiceWorkerMLCEngine> that resolves to an engine implementing MLCEngineInterface.
ExtensionMLCEngineConfig extends MLCEngineConfig with:
| Field | Type | Description |
|---|---|---|
| extensionId | string \| undefined | If set, connects to a different extension's service worker via chrome.runtime.connect(extensionId, ...) |
| onDisconnect | (() => void) \| undefined | Callback invoked when the port disconnects (service worker killed or extension unloaded) |
Usage Examples
Basic popup usage (from the repository example):
```typescript
import {
  ChatCompletionMessageParam,
  CreateExtensionServiceWorkerMLCEngine,
  MLCEngineInterface,
  InitProgressReport,
} from "@mlc-ai/web-llm";

const initProgressCallback = (report: InitProgressReport) => {
  console.log(`Loading: ${(report.progress * 100).toFixed(0)}% - ${report.text}`);
  if (report.progress === 1.0) {
    console.log("Model loaded, ready for inference");
  }
};

const engine: MLCEngineInterface = await CreateExtensionServiceWorkerMLCEngine(
  "Qwen2-0.5B-Instruct-q4f16_1-MLC",
  { initProgressCallback: initProgressCallback },
);

// Use the standard OpenAI-compatible chat API
const chatHistory: ChatCompletionMessageParam[] = [];
chatHistory.push({ role: "user", content: "What is machine learning?" });

const completion = await engine.chat.completions.create({
  stream: true,
  messages: chatHistory,
});

let response = "";
for await (const chunk of completion) {
  const delta = chunk.choices[0].delta.content;
  if (delta) {
    response += delta;
    console.log(response);
  }
}
chatHistory.push({ role: "assistant", content: await engine.getMessage() });
```
With disconnect handling and custom keep-alive interval:
```typescript
import { CreateExtensionServiceWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateExtensionServiceWorkerMLCEngine(
  "Llama-3.1-8B-Instruct-q4f32_1-MLC",
  {
    initProgressCallback: (report) => {
      document.getElementById("status")!.textContent = report.text;
    },
    onDisconnect: () => {
      document.getElementById("status")!.textContent =
        "Service worker disconnected. Please reopen the extension.";
    },
  },
  undefined, // chatOpts
  5000, // keepAliveMs: send heartbeat every 5 seconds
);
```
Cross-extension inference (connecting to another extension's service worker):
```typescript
import { CreateExtensionServiceWorkerMLCEngine } from "@mlc-ai/web-llm";

// Connect to a different extension that hosts the LLM engine
const engine = await CreateExtensionServiceWorkerMLCEngine(
  "Qwen2-0.5B-Instruct-q4f16_1-MLC",
  {
    extensionId: "abcdefghijklmnopqrstuvwxyz", // target extension ID
    initProgressCallback: (report) => {
      console.log(report.text);
    },
  },
);
```
Related Pages
- Implementation:Mlc_ai_Web_llm_Create_Service_Worker_MLC_Engine
- Mlc_ai_Web_llm_Extension_Service_Worker - The service worker handler that this client connects to
- Mlc_ai_Web_llm_Chrome_Extension_Manifest - Manifest configuration that must be in place for port connections to work
- Mlc_ai_Web_llm_Page_Content_Access - Page content can be used as context in chat completions sent through this engine
- Heuristic:Mlc_ai_Web_llm_Service_Worker_Keep_Alive