
Principle: MLC AI Web LLM Extension Client Engine

From Leeroopedia


Overview

Pattern for creating an engine proxy in a Chrome Extension popup that communicates with the background service worker via chrome.runtime.Port with automatic keep-alive heartbeats. The client engine implements the same MLCEngineInterface as the direct MLCEngine, allowing popup code to use the standard engine.chat.completions.create() API without awareness that inference runs in a separate service worker process.

Description

The extension client engine creates a transparent proxy that implements MLCEngineInterface on the popup side. The construction involves five parts:

1. Port Connection: The client connects to the background service worker via chrome.runtime.connect(), establishing a long-lived chrome.runtime.Port. The port name is always "web_llm_service_worker". An optional extensionId parameter allows connecting to a different extension's service worker (for cross-extension inference).

2. PortAdapter: The port is wrapped in a PortAdapter class that emulates the ChatWorker interface (the same interface that WebWorkerMLCEngine expects from a Worker). This adapter translates between:

  • port.postMessage(msg) for sending messages
  • port.onMessage.addListener() for receiving messages
  • A getter/setter for onmessage to match the Worker API pattern

3. WebWorkerMLCEngine inheritance: ServiceWorkerMLCEngine extends WebWorkerMLCEngine, passing the PortAdapter as the ChatWorker. This means all the message serialization, promise management, streaming chunk generation, and API surface (chat.completions, completions, embeddings) are inherited from the web worker engine.

4. Keep-alive heartbeat: A setInterval timer sends { kind: "keepAlive" } messages to the service worker at a configurable interval (default: 10 seconds). This prevents Chrome from killing the idle service worker, which would destroy the loaded model. The timer is cleared when the port disconnects.

5. Disconnect callback: An optional onDisconnect callback in the config fires when the port disconnects, allowing the popup to display a reconnection UI or clean up state.
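The keep-alive and disconnect behavior (items 4 and 5) can be sketched as follows. This is an illustrative model, not the library's source: startHeartbeat and the minimal port shape are assumptions for this sketch, and it also sends one ping immediately before the interval starts.

```typescript
// Illustrative sketch of the keep-alive heartbeat and disconnect handling.
// `startHeartbeat` and `PortLike` are local stand-ins, not web-llm exports.

type KeepAliveMessage = { kind: "keepAlive" };

// Minimal stand-in for chrome.runtime.Port: only the members this sketch uses.
interface PortLike {
  postMessage(msg: KeepAliveMessage): void;
  onDisconnect: { addListener(cb: () => void): void };
}

function startHeartbeat(
  port: PortLike,
  keepAliveMs = 10000,        // the documented default interval (10 seconds)
  onDisconnect?: () => void,  // optional user callback from the config
): void {
  const beat = () => port.postMessage({ kind: "keepAlive" });
  beat();                                 // prime the service worker right away
  const timer = setInterval(beat, keepAliveMs);
  port.onDisconnect.addListener(() => {
    clearInterval(timer);                 // stop heartbeats once the port dies
    onDisconnect?.();                     // let the popup show a reconnect UI
  });
}
```

The important ordering is that the timer is cleared inside the disconnect listener, so a killed service worker never keeps a dangling interval alive in the popup.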

The factory function CreateServiceWorkerMLCEngine (aliased as CreateExtensionServiceWorkerMLCEngine) is the recommended entry point. It constructs the engine, calls reload(modelId, chatOpts), and returns a ready-to-use engine instance. Since the service worker handler caches already-loaded models, this reload typically completes instantly on subsequent popup opens.
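The construct-then-reload flow of the factory can be modeled with a stub. StubEngine and createReadyEngine are placeholders invented for this sketch; the real factory constructs a ServiceWorkerMLCEngine (which connects the port) and awaits its reload.

```typescript
// Illustrative model of the factory's construct-then-reload flow; the stub
// engine below stands in for ServiceWorkerMLCEngine, which needs a browser.

interface ReloadableEngine {
  reload(modelId: string | string[]): Promise<void>;
}

class StubEngine implements ReloadableEngine {
  loaded: string[] = [];
  async reload(modelId: string | string[]): Promise<void> {
    this.loaded = Array.isArray(modelId) ? modelId : [modelId];
  }
}

// Mirrors CreateServiceWorkerMLCEngine: construct, reload, return ready engine.
async function createReadyEngine(
  modelId: string | string[],
  make: () => StubEngine = () => new StubEngine(),
): Promise<StubEngine> {
  const engine = make();        // the real factory connects the port here
  await engine.reload(modelId); // near-instant when the model is already cached
  return engine;
}
```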

Usage

Use this in the popup script (or content script) of a Chrome extension to communicate with the background service worker that hosts the actual MLCEngine.

When to apply:

  • The popup or sidebar of a Chrome extension needs to call LLM inference APIs
  • The extension uses the service worker pattern for persistent model hosting
  • You want the OpenAI-compatible chat.completions.create() API in extension popup code

When not to apply:

  • Standard web applications (use CreateWebWorkerMLCEngine instead)
  • The background script itself (it uses ServiceWorkerMLCEngineHandler, not the client engine)
  • Extensions that run inference directly in the popup without a service worker

Typical usage flow:

  1. Import CreateExtensionServiceWorkerMLCEngine from @mlc-ai/web-llm
  2. Call it with a model ID and an optional progress callback
  3. Use the returned engine with engine.chat.completions.create() for inference
  4. The engine handles all communication with the service worker transparently

Theoretical Basis

The client engine pattern is an application of the Proxy design pattern: the popup-side engine object presents the same interface as the real engine but delegates all operations to the background service worker over a message channel.

The keep-alive mechanism addresses a specific Chrome Extension platform constraint: service workers are killed after ~30 seconds of inactivity, as documented in the Chrome service worker lifecycle documentation. For LLM inference, where model loading can take 30+ seconds, losing the service worker means losing the loaded model. The periodic heartbeat keeps the service worker alive as long as the popup (or any other client) is open.

The PortAdapter is an instance of the Adapter pattern: it adapts the chrome.runtime.Port interface to the ChatWorker interface that WebWorkerMLCEngine expects. This allows maximum code reuse between the web worker and extension architectures.
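A minimal sketch of such an adapter, with local stand-in types: PortLike and WorkerLike are assumptions modeling only the members described above, and the real PortAdapter wraps chrome.runtime.Port (and may deliver MessageEvent-shaped payloads rather than raw messages).

```typescript
// Hypothetical sketch of the Adapter pattern described above; not the
// library's source. Local types model only the shape this article describes.

type MessageListener = (msg: unknown) => void;

// Minimal stand-in for chrome.runtime.Port.
interface PortLike {
  postMessage(msg: unknown): void;
  onMessage: { addListener(cb: MessageListener): void };
}

// Minimal stand-in for the ChatWorker interface WebWorkerMLCEngine expects.
interface WorkerLike {
  postMessage(msg: unknown): void;
  onmessage: MessageListener | undefined;
}

class PortAdapter implements WorkerLike {
  private handler: MessageListener | undefined;

  constructor(private port: PortLike) {
    // Forward every port message to whatever `onmessage` is currently set to.
    port.onMessage.addListener((msg) => this.handler?.(msg));
  }

  postMessage(msg: unknown): void {
    this.port.postMessage(msg); // send toward the service worker
  }

  get onmessage(): MessageListener | undefined {
    return this.handler;
  }

  set onmessage(cb: MessageListener | undefined) {
    this.handler = cb; // Worker-style assignment: adapter.onmessage = (e) => ...
  }
}
```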

I/O Contract

Input to CreateServiceWorkerMLCEngine:

  • modelId (string | string[], required) — model ID(s) to load; must appear in prebuiltAppConfig or engineConfig.appConfig
  • engineConfig (ExtensionMLCEngineConfig, optional; default undefined) — configuration including initProgressCallback, extensionId, and onDisconnect
  • chatOpts (ChatOptions | ChatOptions[], optional; default undefined) — per-model overrides for mlc-chat-config.json
  • keepAliveMs (number, optional; default 10000) — heartbeat interval in milliseconds

Output: A Promise<ServiceWorkerMLCEngine> that resolves to an engine implementing MLCEngineInterface.

ExtensionMLCEngineConfig extends MLCEngineConfig with:

  • extensionId (string, optional) — if set, connects to a different extension's service worker via chrome.runtime.connect(extensionId, ...)
  • onDisconnect (() => void, optional) — callback invoked when the port disconnects (service worker killed or extension unloaded)
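A local, illustrative rendering of these types (field names follow this article; the "Sketch" suffix marks them as approximations, not the library's actual exported declarations):

```typescript
// Illustrative stand-ins for the config types described above.

interface MLCEngineConfigSketch {
  // Progress callback shape inferred from the usage examples in this article.
  initProgressCallback?: (report: { progress: number; text: string }) => void;
}

interface ExtensionMLCEngineConfigSketch extends MLCEngineConfigSketch {
  extensionId?: string;       // connect to another extension's service worker
  onDisconnect?: () => void;  // fired when the port disconnects
}
```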

Usage Examples

Basic popup usage (from the repository example):

import {
  ChatCompletionMessageParam,
  CreateExtensionServiceWorkerMLCEngine,
  MLCEngineInterface,
  InitProgressReport,
} from "@mlc-ai/web-llm";

const initProgressCallback = (report: InitProgressReport) => {
  console.log(`Loading: ${(report.progress * 100).toFixed(0)}% - ${report.text}`);
  if (report.progress === 1.0) {
    console.log("Model loaded, ready for inference");
  }
};

const engine: MLCEngineInterface = await CreateExtensionServiceWorkerMLCEngine(
  "Qwen2-0.5B-Instruct-q4f16_1-MLC",
  { initProgressCallback: initProgressCallback },
);

// Use the standard OpenAI-compatible chat API
const chatHistory: ChatCompletionMessageParam[] = [];
chatHistory.push({ role: "user", content: "What is machine learning?" });

const completion = await engine.chat.completions.create({
  stream: true,
  messages: chatHistory,
});

let response = "";
for await (const chunk of completion) {
  const delta = chunk.choices[0].delta.content;
  if (delta) {
    response += delta;
    console.log(response);
  }
}
chatHistory.push({ role: "assistant", content: await engine.getMessage() });

With disconnect handling and custom keep-alive interval:

import { CreateExtensionServiceWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateExtensionServiceWorkerMLCEngine(
  "Llama-3.1-8B-Instruct-q4f32_1-MLC",
  {
    initProgressCallback: (report) => {
      document.getElementById("status")!.textContent = report.text;
    },
    onDisconnect: () => {
      document.getElementById("status")!.textContent =
        "Service worker disconnected. Please reopen the extension.";
    },
  },
  undefined,  // chatOpts
  5000,        // keepAliveMs: send heartbeat every 5 seconds
);

Cross-extension inference (connecting to another extension's service worker):

import { CreateExtensionServiceWorkerMLCEngine } from "@mlc-ai/web-llm";

// Connect to a different extension that hosts the LLM engine
const engine = await CreateExtensionServiceWorkerMLCEngine(
  "Qwen2-0.5B-Instruct-q4f16_1-MLC",
  {
    extensionId: "abcdefghijklmnopqrstuvwxyz",  // target extension ID
    initProgressCallback: (report) => {
      console.log(report.text);
    },
  },
);
