Workflow:Mlc ai Web llm Chrome Extension Integration

Knowledge Sources	web-llm WebLLM Docs
Domains	LLMs, WebGPU, Chrome_Extensions, Service_Workers
Last Updated	2026-02-14 22:00 GMT

Overview

End-to-end process for building a Chrome browser extension powered by an in-browser LLM, using web-llm's Extension Service Worker engine for persistent background inference.

Description

This workflow demonstrates how to integrate web-llm into a Chrome Extension using Manifest V3. The LLM runs inside the extension's service worker, providing persistent background inference capability that survives popup closes and page navigations. The architecture uses ExtensionServiceWorkerMLCEngineHandler in the background script and CreateExtensionServiceWorkerMLCEngine in the popup or content script, communicating via Chrome's runtime messaging ports (chrome.runtime.Port). A keep-alive heartbeat mechanism prevents the service worker from being terminated by Chrome's lifecycle management.

Usage

Execute this workflow when building a Chrome extension that needs local LLM capabilities, such as a chatbot popup, content summarizer, writing assistant, or any extension feature that benefits from on-device language model inference. This approach keeps all data on the user's device and works without external API calls.

Execution Steps

Step 1: Configure the Extension Manifest

Create a Manifest V3 configuration file (manifest.json) that declares the required permissions and resources. The manifest must specify a service worker background script, a popup HTML file, and the necessary content security policy (CSP) for WebGPU and WebAssembly execution. The permissions should include storage for model caching.

Key considerations:

Use Manifest V3 format (manifest_version: 3)
The background script must be declared as a service_worker with type: "module"
Content Security Policy must allow WebAssembly execution (wasm-unsafe-eval)
CSP must allow connections to model hosting domains (e.g., huggingface.co, raw.githubusercontent.com)
The popup HTML file is declared under action.default_popup
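
As a rough sketch of the considerations above, the manifest can be written as a typed object and serialized to manifest.json. The extension name, version, file names (popup.html, background.js) and the exact connect-src host list are assumptions for illustration; model files may be served from additional CDN hosts that would also need to be listed.

```typescript
// Only the manifest fields discussed in this step (not the full MV3 schema).
interface ManifestSketch {
  manifest_version: 3;
  name: string;
  version: string;
  permissions: string[];
  action: { default_popup: string };
  background: { service_worker: string; type: "module" };
  content_security_policy: { extension_pages: string };
}

// Hypothetical names and paths; adjust them to your build output.
const manifest: ManifestSketch = {
  manifest_version: 3,
  name: "WebLLM Chat (example)",
  version: "0.1.0",
  permissions: ["storage"],
  action: { default_popup: "popup.html" },
  background: { service_worker: "background.js", type: "module" },
  content_security_policy: {
    extension_pages: [
      // 'wasm-unsafe-eval' lets the extension pages compile WebAssembly.
      "script-src 'self' 'wasm-unsafe-eval'",
      "object-src 'self'",
      // Hosts the model weights and libraries are fetched from (assumed list).
      "connect-src 'self' https://huggingface.co https://raw.githubusercontent.com",
    ].join("; "),
  },
};

// The string to write to manifest.json.
const manifestJson: string = JSON.stringify(manifest, null, 2);
console.log(manifestJson);
```

Keeping the CSP directives in an array makes it easier to add a host later without breaking the semicolon-separated string.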

Step 2: Create the Background Service Worker

Implement the background script that initializes the ExtensionServiceWorkerMLCEngineHandler. This handler manages the MLCEngine instance within the service worker context and handles message routing from popup and content scripts via Chrome runtime ports. The handler automatically manages the keep-alive heartbeat to prevent Chrome from terminating the service worker.

What happens:

The handler wraps an internal MLCEngine instance
The background script listens for chrome.runtime.onConnect events and passes each accepted port to the handler
Incoming messages are routed to the appropriate engine methods
A PortAdapter translates between Chrome port messaging and the standard Worker message interface
The keep-alive mechanism sends periodic heartbeat messages to prevent service worker termination
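
The connection handling described above might be wired roughly as follows. This is a sketch, not web-llm's own code: PortLike and HandlerLike are hypothetical stand-ins for chrome.runtime.Port and ExtensionServiceWorkerMLCEngineHandler, registerEngineHandler is an illustrative helper, and the port name "web_llm_service_worker" is an assumption about what the client side uses. The handler factory is injected so the logic can be exercised outside an extension.

```typescript
// Stand-in for the parts of chrome.runtime.Port used here.
interface PortLike {
  name: string;
  onMessage: { addListener(cb: (msg: unknown) => void): void };
}

// Stand-in for the handler: only the members this sketch calls.
interface HandlerLike {
  setPort(port: PortLike): void;
  onmessage(msg: unknown): void;
}

// Keeps one long-lived handler and attaches it to every accepted port.
function registerEngineHandler(
  onConnect: { addListener(cb: (port: PortLike) => void): void },
  createHandler: (port: PortLike) => HandlerLike,
  expectedPortName = "web_llm_service_worker", // assumed port name
): void {
  let handler: HandlerLike | undefined;
  onConnect.addListener((port) => {
    if (port.name !== expectedPortName) return; // ignore unrelated ports
    if (handler === undefined) {
      handler = createHandler(port); // first connection: create the handler
    } else {
      handler.setPort(port); // later connections: reuse it, swap the port
    }
    const active = handler;
    port.onMessage.addListener((msg) => active.onmessage(msg));
  });
}

// Assumed wiring in background.ts (needs @mlc-ai/web-llm and the chrome API):
// registerEngineHandler(
//   chrome.runtime.onConnect,
//   (port) => new ExtensionServiceWorkerMLCEngineHandler(port as chrome.runtime.Port),
// );
```

Reusing the handler across connections is what lets a loaded model outlive an individual popup session.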

Step 3: Build the Popup UI

Create the popup HTML and script that serves as the user interface. The popup script uses CreateExtensionServiceWorkerMLCEngine to connect to the background service worker engine. This factory establishes a chrome.runtime.Port connection and returns an MLCEngineInterface proxy that transparently forwards API calls to the background worker.

Key considerations:

The popup creates the engine connection on load using CreateExtensionServiceWorkerMLCEngine
A progress callback can display model loading status in the popup UI
Chat history should be maintained in the popup script, since the popup is recreated each time it is opened
The streaming API works identically to other deployment modes
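
A minimal sketch of the connection step with a progress display follows. The helper names (connectEngine, formatProgress) and the "status" element id are hypothetical. The factory is passed in as a parameter, assumed to match the (modelId, engineConfig) shape of CreateExtensionServiceWorkerMLCEngine, and progress is assumed to be a fraction between 0 and 1.

```typescript
// Subset of the progress report passed to initProgressCallback.
interface ProgressReport {
  progress: number; // assumed to be a fraction between 0 and 1
  text: string;
}

// Shape of the factory call used here: (modelId, engineConfig) => Promise<engine>.
type EngineFactory<E> = (
  modelId: string,
  engineConfig: { initProgressCallback: (report: ProgressReport) => void },
) => Promise<E>;

// Turns a progress report into a one-line status string for the popup.
function formatProgress(report: ProgressReport): string {
  return `${Math.round(report.progress * 100)}% ${report.text}`;
}

// Connects to the background engine and reports loading status along the way.
async function connectEngine<E>(
  createEngine: EngineFactory<E>,
  modelId: string,
  showStatus: (status: string) => void,
): Promise<E> {
  const engine = await createEngine(modelId, {
    initProgressCallback: (report) => showStatus(formatProgress(report)),
  });
  showStatus("Ready");
  return engine;
}

// Assumed popup wiring (needs @mlc-ai/web-llm and a DOM):
// const engine = await connectEngine(
//   CreateExtensionServiceWorkerMLCEngine,
//   "<a model id from the web-llm prebuilt list>",
//   (s) => { document.getElementById("status")!.textContent = s; },
// );
```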

Step 4: Handle Chat Interaction

Implement the chat loop in the popup: capture user input, send it to the engine via engine.chat.completions.create() with streaming enabled, and update the popup UI as response chunks arrive. Maintain a chat history array and append both user and assistant messages for multi-turn conversations.

What happens:

User input is captured from the popup's text field
The message is added to the chat history array
A streaming chat completion request is sent to the engine
Response chunks are iterated and displayed incrementally in the popup
The complete assistant response is appended to the chat history
The engine persists in the background service worker between popup opens
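
The chat loop above can be sketched as a single function. The StreamingEngine interface below is a hypothetical, reduced view of the engine proxy that covers only engine.chat.completions.create() with streaming enabled, and the chunk shape assumes the OpenAI-style delta format; sendMessage and render are illustrative names.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// One streamed chunk, assuming the OpenAI-style delta format.
interface ChatChunk {
  choices: Array<{ delta: { content?: string | null } }>;
}

// Only the slice of the engine interface that the chat loop calls.
interface StreamingEngine {
  chat: {
    completions: {
      create(req: { messages: ChatMessage[]; stream: true }): Promise<AsyncIterable<ChatChunk>>;
    };
  };
}

// Sends one user turn, renders the reply incrementally, and updates the history.
async function sendMessage(
  engine: StreamingEngine,
  history: ChatMessage[],
  userText: string,
  render: (partialReply: string) => void,
): Promise<string> {
  history.push({ role: "user", content: userText });
  const stream = await engine.chat.completions.create({
    messages: [...history], // copy so later pushes do not alter the request
    stream: true,
  });
  let reply = "";
  for await (const chunk of stream) {
    reply += chunk.choices[0]?.delta.content ?? "";
    render(reply); // show the text accumulated so far
  }
  history.push({ role: "assistant", content: reply });
  return reply;
}
```

In a popup, render would typically set the text of the answer element, and history would live in a module-level array so that follow-up questions carry the earlier turns.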

Step 5: Optionally Access Page Content

If the extension needs to process content from the active tab (e.g., for summarization), use Chrome's tabs API to establish a port connection with a content script. The content script extracts page content and sends it to the popup or background script, which can then include it in the LLM prompt as context.

Key considerations:

Content scripts run in the context of web pages and can access DOM content
Communication between content script and popup uses chrome.tabs.connect
Page content can be injected into the system prompt or user message for context-aware responses
Permissions for activeTab or specific host patterns may be required in the manifest
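
One way to sketch the page-content flow is shown below. All names are hypothetical: TabPortLike stands in for the port returned by chrome.tabs.connect, the { contents } message shape is an assumed contract between popup and content script, and the 4000-character cap is an arbitrary illustration of keeping the prompt within the model's context window.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Stand-in for the port returned by chrome.tabs.connect.
interface TabPortLike {
  postMessage(msg: unknown): void;
  onMessage: { addListener(cb: (msg: { contents?: string }) => void): void };
}

// Asks the content script for the page text and resolves with its reply.
function requestPageText(connect: () => TabPortLike): Promise<string> {
  return new Promise((resolve) => {
    const port = connect();
    port.onMessage.addListener((msg) => resolve(msg.contents ?? ""));
    port.postMessage({}); // any message prompts the content script to answer
  });
}

// Builds a prompt that carries the (truncated) page text as system context.
function buildContextMessages(
  pageText: string,
  question: string,
  maxChars = 4000, // hypothetical cap
): ChatMessage[] {
  const normalized = pageText.replace(/\s+/g, " ").trim();
  const context = normalized.length > maxChars ? normalized.slice(0, maxChars) : normalized;
  return [
    { role: "system", content: `Answer using the following page content:\n${context}` },
    { role: "user", content: question },
  ];
}

// Assumed popup wiring (tabId obtained from chrome.tabs.query):
// const text = await requestPageText(() => chrome.tabs.connect(tabId, { name: "page_content" }));
// Assumed content script counterpart:
// chrome.runtime.onConnect.addListener((port) => {
//   port.onMessage.addListener(() => port.postMessage({ contents: document.body.innerText }));
// });
```

The resulting messages array can be passed to the same streaming completion call used for ordinary chat turns.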
