Principle:Mlc_ai_Web_llm_Extension_Client_Engine
Overview
Pattern for creating an engine proxy in a Chrome Extension popup that communicates with the background service worker via chrome.runtime.Port with automatic keep-alive heartbeats. The client engine implements the same MLCEngineInterface as the direct MLCEngine, allowing popup code to use the standard engine.chat.completions.create() API without awareness that inference runs in a separate service worker process.
Description
The extension client engine creates a transparent proxy that implements MLCEngineInterface on the popup side. The construction involves five parts:
1. Port Connection: The client connects to the background service worker via chrome.runtime.connect(), establishing a long-lived chrome.runtime.Port. The port name is always "web_llm_service_worker". An optional extensionId parameter allows connecting to a different extension's service worker (for cross-extension inference).
2. PortAdapter: The port is wrapped in a PortAdapter class that emulates the ChatWorker interface (the same interface that WebWorkerMLCEngine expects from a Worker). This adapter translates between:
- port.postMessage(msg) for sending messages
- port.onMessage.addListener() for receiving messages
- A getter/setter for onmessage to match the Worker API pattern
3. WebWorkerMLCEngine inheritance: ServiceWorkerMLCEngine extends WebWorkerMLCEngine, passing the PortAdapter as the ChatWorker. This means all the message serialization, promise management, streaming chunk generation, and API surface (chat.completions, completions, embeddings) are inherited from the web worker engine.
4. Keep-alive heartbeat: A setInterval timer sends { kind: "keepAlive" } messages to the service worker at a configurable interval (default: 10 seconds). This prevents Chrome from killing the idle service worker, which would destroy the loaded model. The timer is cleared when the port disconnects.
5. Disconnect callback: An optional onDisconnect callback in the config fires when the port disconnects, allowing the popup to display a reconnection UI or clean up state.
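The heartbeat and disconnect wiring described in points 4 and 5 can be sketched roughly as follows. This is illustrative only: KeepAlivePort and startKeepAlive are hypothetical stand-ins (defined here so the sketch runs outside an extension), not web-llm exports; in the real engine this logic lives inside ServiceWorkerMLCEngine and operates on a chrome.runtime.Port.

```typescript
// Hypothetical minimal stand-in for the parts of chrome.runtime.Port used here.
interface KeepAlivePort {
  postMessage(msg: unknown): void;
  onDisconnect: { addListener(cb: () => void): void };
}

function startKeepAlive(
  port: KeepAlivePort,
  intervalMs: number = 10_000, // matches the default keepAliveMs
  onDisconnect?: () => void,
): void {
  const timer = setInterval(() => {
    // Periodic heartbeat: keeps Chrome from terminating the idle service worker.
    port.postMessage({ kind: "keepAlive" });
  }, intervalMs);
  port.onDisconnect.addListener(() => {
    clearInterval(timer); // stop heartbeats once the worker is gone
    onDisconnect?.();     // let the popup show a reconnection UI
  });
}
```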
The factory function CreateServiceWorkerMLCEngine (aliased as CreateExtensionServiceWorkerMLCEngine) is the recommended entry point. It constructs the engine, calls reload(modelId, chatOpts), and returns a ready-to-use engine instance. Since the service worker handler caches already-loaded models, this reload typically completes instantly on subsequent popup opens.
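Under the description above, the factory's contract (construct, reload, return) can be sketched like this. EngineLike and createServiceWorkerEngineSketch are hypothetical stand-ins for illustration, not the library's actual types:

```typescript
// Hypothetical minimal view of the engine surface the factory relies on.
interface EngineLike {
  reload(modelId: string, chatOpts?: object): Promise<void>;
}

async function createServiceWorkerEngineSketch<E extends EngineLike>(
  engine: E, // a freshly constructed client engine
  modelId: string,
  chatOpts?: object,
): Promise<E> {
  // When the service worker handler already has the model cached, reload()
  // resolves almost instantly, which is why reopening the popup feels free.
  await engine.reload(modelId, chatOpts);
  return engine; // ready to serve chat.completions.create() calls
}
```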
Usage
Use this in the popup script (or content script) of a Chrome extension to communicate with the background service worker that hosts the actual MLCEngine.
When to apply:
- The popup or sidebar of a Chrome extension needs to call LLM inference APIs
- The extension uses the service worker pattern for persistent model hosting
- You want the OpenAI-compatible chat.completions.create() API in extension popup code
When not to apply:
- Standard web applications (use CreateWebWorkerMLCEngine instead)
- The background script itself (it uses ServiceWorkerMLCEngineHandler, not the client engine)
- Extensions that run inference directly in the popup without a service worker
Typical usage flow:
- Import CreateExtensionServiceWorkerMLCEngine from @mlc-ai/web-llm
- Call it with a model ID and an optional progress callback
- Use the returned engine with engine.chat.completions.create() for inference
- The engine handles all communication with the service worker transparently
Theoretical Basis
The client engine pattern is an application of the Proxy design pattern: the popup-side engine object presents the same interface as the real engine but delegates all operations to the background service worker over a message channel.
The keep-alive mechanism addresses a specific Chrome Extension platform constraint: service workers are killed after ~30 seconds of inactivity, as documented in the Chrome service worker lifecycle documentation. For LLM inference, where model loading can take 30+ seconds, losing the service worker means losing the loaded model. The periodic heartbeat keeps the service worker alive as long as the popup (or any other client) is open.
The PortAdapter is an instance of the Adapter pattern: it adapts the chrome.runtime.Port interface to the ChatWorker interface that WebWorkerMLCEngine expects. This allows maximum code reuse between the web worker and extension architectures.
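As a rough illustration of that adaptation, the following sketch shows a Port-to-Worker adapter. AdapterPort and PortAdapterSketch are hypothetical names defined here so the example is self-contained; they are not the library's actual class or the full ChatWorker interface:

```typescript
// Hypothetical minimal stand-in for the parts of chrome.runtime.Port used here.
interface AdapterPort {
  postMessage(msg: unknown): void;
  onMessage: { addListener(cb: (msg: unknown) => void): void };
}

// Exposes the Worker-shaped surface (onmessage property, postMessage method)
// that WebWorkerMLCEngine-style code expects, on top of a Port.
class PortAdapterSketch {
  private handler?: (ev: { data: unknown }) => void;

  constructor(private port: AdapterPort) {
    // Translate Port "onMessage" events into Worker-style "onmessage" events.
    port.onMessage.addListener((msg) => this.handler?.({ data: msg }));
  }

  // Getter/setter pair matching the Worker API's onmessage property.
  get onmessage(): ((ev: { data: unknown }) => void) | undefined {
    return this.handler;
  }
  set onmessage(h: ((ev: { data: unknown }) => void) | undefined) {
    this.handler = h;
  }

  postMessage(msg: unknown): void {
    this.port.postMessage(msg); // forward outgoing messages straight to the port
  }
}
```

Because the adapter preserves the Worker message surface, the inheriting engine never needs to know whether it is talking to a real Worker or a chrome.runtime.Port.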
I/O Contract
Input to CreateServiceWorkerMLCEngine:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| modelId | string \| string[] | Yes | - | Model ID(s) to load; must be in prebuiltAppConfig or engineConfig.appConfig |
| engineConfig | ExtensionMLCEngineConfig | No | undefined | Configuration including initProgressCallback, extensionId, onDisconnect |
| chatOpts | ChatOptions \| ChatOptions[] | No | undefined | Overrides for mlc-chat-config.json per model |
| keepAliveMs | number | No | 10000 | Heartbeat interval in milliseconds |
Output: A Promise<ServiceWorkerMLCEngine> that resolves to an engine implementing MLCEngineInterface.
ExtensionMLCEngineConfig extends MLCEngineConfig with:
| Field | Type | Description |
|---|---|---|
| extensionId | string \| undefined | If set, connects to a different extension's service worker via chrome.runtime.connect(extensionId, ...) |
| onDisconnect | (() => void) \| undefined | Callback invoked when the port disconnects (service worker killed or extension unloaded) |
Usage Examples
Basic popup usage (from the repository example):
```typescript
import {
  ChatCompletionMessageParam,
  CreateExtensionServiceWorkerMLCEngine,
  MLCEngineInterface,
  InitProgressReport,
} from "@mlc-ai/web-llm";

const initProgressCallback = (report: InitProgressReport) => {
  console.log(`Loading: ${(report.progress * 100).toFixed(0)}% - ${report.text}`);
  if (report.progress === 1.0) {
    console.log("Model loaded, ready for inference");
  }
};

const engine: MLCEngineInterface = await CreateExtensionServiceWorkerMLCEngine(
  "Qwen2-0.5B-Instruct-q4f16_1-MLC",
  { initProgressCallback: initProgressCallback },
);

// Use the standard OpenAI-compatible chat API
const chatHistory: ChatCompletionMessageParam[] = [];
chatHistory.push({ role: "user", content: "What is machine learning?" });

const completion = await engine.chat.completions.create({
  stream: true,
  messages: chatHistory,
});

let response = "";
for await (const chunk of completion) {
  const delta = chunk.choices[0].delta.content;
  if (delta) {
    response += delta;
    console.log(response);
  }
}
chatHistory.push({ role: "assistant", content: await engine.getMessage() });
```
With disconnect handling and custom keep-alive interval:
```typescript
import { CreateExtensionServiceWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateExtensionServiceWorkerMLCEngine(
  "Llama-3.1-8B-Instruct-q4f32_1-MLC",
  {
    initProgressCallback: (report) => {
      document.getElementById("status")!.textContent = report.text;
    },
    onDisconnect: () => {
      document.getElementById("status")!.textContent =
        "Service worker disconnected. Please reopen the extension.";
    },
  },
  undefined, // chatOpts
  5000, // keepAliveMs: send heartbeat every 5 seconds
);
```
Cross-extension inference (connecting to another extension's service worker):
```typescript
import { CreateExtensionServiceWorkerMLCEngine } from "@mlc-ai/web-llm";

// Connect to a different extension that hosts the LLM engine
const engine = await CreateExtensionServiceWorkerMLCEngine(
  "Qwen2-0.5B-Instruct-q4f16_1-MLC",
  {
    extensionId: "abcdefghijklmnopqrstuvwxyz", // target extension ID
    initProgressCallback: (report) => {
      console.log(report.text);
    },
  },
);
```
Related Pages
- Implementation:Mlc_ai_Web_llm_Create_Service_Worker_MLC_Engine
- Mlc_ai_Web_llm_Extension_Service_Worker - The service worker handler that this client connects to
- Mlc_ai_Web_llm_Chrome_Extension_Manifest - Manifest configuration that must be in place for port connections to work
- Mlc_ai_Web_llm_Page_Content_Access - Page content can be used as context in chat completions sent through this engine
- Heuristic:Mlc_ai_Web_llm_Service_Worker_Keep_Alive