Implementation:Mlc ai Web llm Chrome Tabs Connect

Overview

External tool documentation for the Chrome Extensions content-script and messaging pattern that the web-llm Chrome extension examples use to extract web page content as context for LLM inference. The implementation uses Chrome's chrome.tabs.connect() API, the chrome.runtime.onConnect listener, and port-based messaging to send DOM text from the active web page to the extension popup, where it can be injected into LLM prompts.

Description

This implementation documents three Chrome Extension APIs working together to enable page content extraction:

1. chrome.tabs.connect(tabId, connectInfo) - Called from the popup script to open a long-lived port connection to the content script running in the specified tab. Returns a chrome.runtime.Port.

2. chrome.runtime.onConnect - Registered in the content script. Fires when the popup opens a connection with chrome.tabs.connect() and passes the listener a chrome.runtime.Port for bidirectional messaging.

3. port.postMessage() / port.onMessage.addListener() - Used for the actual data exchange. The popup sends an empty trigger message (the content script ignores its contents), and the content script replies with the page text.
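The handshake these three APIs implement can be exercised outside a browser with an in-memory stand-in. The sketch below is a minimal model under stated assumptions: `FakePort`, `createPortPair`, `listenAsContentScript`, and `fetchAsPopup` are hypothetical names, not Chrome APIs, and the fake only imitates the `postMessage`/`onMessage.addListener` surface of `chrome.runtime.Port`, including deferred message delivery.

```typescript
// FakePort and createPortPair are hypothetical helpers that imitate the
// postMessage / onMessage surface of chrome.runtime.Port outside a browser.
type Listener = (msg: any) => void;

interface FakePort {
  name: string;
  postMessage(msg: unknown): void;
  onMessage: { addListener(cb: Listener): void };
}

// Returns two connected ends; a message posted on one end is delivered
// asynchronously to the listeners registered on the other end.
function createPortPair(name: string): [FakePort, FakePort] {
  const listeners: Listener[][] = [[], []];
  const makeEnd = (self: number): FakePort => ({
    name,
    postMessage(msg) {
      const peerListeners = listeners[1 - self];
      // Defer delivery, as with real ports: a reply never arrives synchronously.
      Promise.resolve().then(() => peerListeners.forEach((cb) => cb(msg)));
    },
    onMessage: {
      addListener(cb) {
        listeners[self].push(cb);
      },
    },
  });
  return [makeEnd(0), makeEnd(1)];
}

// Content-script side, mirroring content.js: any message triggers a reply.
function listenAsContentScript(port: FakePort, pageText: string): void {
  port.onMessage.addListener(() => {
    port.postMessage({ contents: pageText });
  });
}

// Popup side, mirroring fetchPageContents: send the empty trigger, then read
// `contents` from the reply. Posting before adding the listener is safe
// because delivery is deferred.
function fetchAsPopup(port: FakePort): Promise<string> {
  return new Promise((resolve) => {
    port.postMessage({});
    port.onMessage.addListener((msg) => resolve(msg.contents));
  });
}

const [popupEnd, contentEnd] = createPortPair("channelName");
listenAsContentScript(contentEnd, "Text extracted from the page");
fetchAsPopup(popupEnd).then((text) => console.log("Page contents:", text));
```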

The repository provides two content script variants:

Service worker example: Extracts document.body.innerHTML (preserves HTML structure)
Non-service-worker example: Extracts document.body.innerText (plain text only)

Code Reference

Content Script (Service Worker Example)

Source: examples/chrome-extension-webgpu-service-worker/src/content.js (full file)

// Only the content script is able to access the DOM
chrome.runtime.onConnect.addListener(function (port) {
  port.onMessage.addListener(function (msg) {
    port.postMessage({ contents: document.body.innerHTML });
  });
});

Content Script (Non-Service-Worker Example)

Source: examples/chrome-extension/src/content.js (full file)

// Only the content script is able to access the DOM
chrome.runtime.onConnect.addListener(function (port) {
  port.onMessage.addListener(function (msg) {
    port.postMessage({ contents: document.body.innerText });
  });
});

Popup Script - fetchPageContents (Service Worker Example)

Source: examples/chrome-extension-webgpu-service-worker/src/popup.ts, Lines 149-160

function fetchPageContents() {
  chrome.tabs.query({ currentWindow: true, active: true }, function (tabs) {
    if (tabs[0]?.id) {
      const port = chrome.tabs.connect(tabs[0].id, { name: "channelName" });
      port.postMessage({});
      port.onMessage.addListener(function (msg) {
        console.log("Page contents:", msg.contents);
        chrome.runtime.sendMessage({ context: msg.contents });
      });
    }
  });
}

Popup Script - fetchPageContents (Non-Service-Worker Example)

Source: examples/chrome-extension/src/popup.ts, Lines 289-298

function fetchPageContents() {
  chrome.tabs.query({ currentWindow: true, active: true }, function (tabs) {
    const port = chrome.tabs.connect(tabs[0].id, { name: "channelName" });
    port.postMessage({});
    port.onMessage.addListener(function (msg) {
      console.log("Page contents:", msg.contents);
      context = msg.contents;
    });
  });
}

Context Injection into LLM Prompt (Non-Service-Worker Example)

Source: examples/chrome-extension/src/popup.ts, Lines 160-168

// Inside handleClick():
let inp = message;
if (context.length > 0) {
  inp =
    "Use only the following context when answering the question at the end. Don't use any other knowledge.\n" +
    context +
    "\n\nQuestion: " +
    message +
    "\n\nHelpful Answer: ";
}
chatHistory.push({ role: "user", content: inp });
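A minimal sketch of the same template as a pure function, which makes the prompt easy to unit-test. `buildPrompt` and `ChatMessage` are hypothetical names; the template text is copied from the snippet above, and an empty context passes the message through unchanged, as in handleClick().

```typescript
// Hypothetical helper reproducing the handleClick() prompt template.
// With an empty context, the user's message is returned unchanged.
function buildPrompt(context: string, message: string): string {
  if (context.length === 0) {
    return message;
  }
  return (
    "Use only the following context when answering the question at the end. Don't use any other knowledge.\n" +
    context +
    "\n\nQuestion: " +
    message +
    "\n\nHelpful Answer: "
  );
}

type ChatMessage = { role: "user"; content: string };
const chatHistory: ChatMessage[] = [];
chatHistory.push({
  role: "user",
  content: buildPrompt("Paris is the capital of France.", "What is the capital of France?"),
});
console.log(chatHistory[0].content);
```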

I/O Contract

chrome.tabs.query()

Parameter	Type	Description
`queryInfo`	`{ currentWindow: true, active: true }`	Selects the currently active tab in the current window

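Returns: The callback receives a Tab[]; tabs[0].id is the ID of the active tab. The array may be empty and Tab.id is optional, which is why the service worker example checks tabs[0]?.id before connecting.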
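Because `tabs` can be empty and `id` is optional, a small guard helps. The sketch below uses hypothetical names (`activeTabId`, `TabLike`); it also rejects `-1`, the value of `chrome.tabs.TAB_ID_NONE`, which is inlined so the code runs outside Chrome, and it uses a `typeof` check instead of truthiness so it does not depend on IDs being non-zero.

```typescript
// TabLike is a hypothetical subset of chrome.tabs.Tab; `id` is optional there too.
interface TabLike {
  id?: number;
}

// chrome.tabs.TAB_ID_NONE is -1; inlined here so the sketch runs outside Chrome.
const TAB_ID_NONE = -1;

// Returns the active tab's ID from a chrome.tabs.query() result, or
// undefined when there is no usable tab.
function activeTabId(tabs: TabLike[]): number | undefined {
  const id = tabs.length > 0 ? tabs[0].id : undefined;
  return typeof id === "number" && id !== TAB_ID_NONE ? id : undefined;
}

console.log(activeTabId([{ id: 42 }]), activeTabId([]));
```

Inside the chrome.tabs.query() callback, chrome.tabs.connect() would then be called only when `activeTabId(tabs)` is not `undefined`.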

chrome.tabs.connect()

Parameter	Type	Description
`tabId`	`number`	The ID of the tab to connect to (from `tabs[0].id`)
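`connectInfo`	`{ name?: string }`	Optional port name (e.g. `"channelName"`), visible to the content script as `port.name`; `connectInfo` also accepts other fields such as `frameId`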

Returns: chrome.runtime.Port - a bidirectional communication channel with the content script.
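In the repository examples, a tab without a content script leaves the popup waiting with no reply. A minimal sketch of a promise-based request that also resolves when the port disconnects follows; `PortLike`, `requestPageContents`, and `fakePort` are hypothetical. `PortLike` lists only the `chrome.runtime.Port` members the helper touches, so the port returned by `chrome.tabs.connect()` should satisfy it; the fake exists only to run the helper outside Chrome.

```typescript
// PortLike is a hypothetical interface listing only the chrome.runtime.Port
// members used below; a real Port from chrome.tabs.connect() should match it.
interface PortLike {
  postMessage(message: unknown): void;
  disconnect(): void;
  onMessage: { addListener(callback: (message: any) => void): void };
  onDisconnect: { addListener(callback: () => void): void };
}

// Sends the empty trigger and resolves with the reply's `contents`.
// Resolves with "" if the port disconnects first, e.g. when no content
// script is listening in the tab.
function requestPageContents(port: PortLike): Promise<string> {
  return new Promise((resolve) => {
    port.onMessage.addListener((message) => {
      resolve(typeof message?.contents === "string" ? message.contents : "");
      port.disconnect(); // one request per port, so close it explicitly
    });
    port.onDisconnect.addListener(() => resolve(""));
    port.postMessage({});
  });
}

// Hypothetical fake for running the helper outside Chrome: it either
// replies with `reply` or, when `reply` is null, simulates a disconnect.
function fakePort(reply: string | null): PortLike {
  const messageListeners: ((message: any) => void)[] = [];
  const disconnectListeners: (() => void)[] = [];
  return {
    postMessage() {
      Promise.resolve().then(() => {
        if (reply === null) disconnectListeners.forEach((cb) => cb());
        else messageListeners.forEach((cb) => cb({ contents: reply }));
      });
    },
    disconnect() {},
    onMessage: { addListener: (cb) => { messageListeners.push(cb); } },
    onDisconnect: { addListener: (cb) => { disconnectListeners.push(cb); } },
  };
}

requestPageContents(fakePort("Example page text")).then((text) => console.log(text));
requestPageContents(fakePort(null)).then((text) => console.log(JSON.stringify(text)));
```

Resolving twice is harmless: once the promise has settled, later calls to `resolve` are ignored.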

Content Script Message Protocol

Direction	Message Format	Description
Popup -> Content Script	`{}` (empty object)	Trigger message requesting page content
Content Script -> Popup	`{ contents: string }`	Page text content (HTML or plain text)
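Port messages arrive as untyped values, so a runtime check on the reply is cheap insurance. The type names and `isContentsMessage` below are hypothetical; the shapes follow the table above. The error-handling content script in the usage examples replies with `{ contents: "", error }`, which still passes this check.

```typescript
// Hypothetical type names for the two messages in the protocol table.
type TriggerMessage = Record<string, never>; // popup -> content script: {}
interface ContentsMessage {
  contents: string; // content script -> popup: page HTML or plain text
}

// Port messages are untyped at runtime, so check the reply's shape before use.
function isContentsMessage(message: unknown): message is ContentsMessage {
  return (
    typeof message === "object" &&
    message !== null &&
    typeof (message as { contents?: unknown }).contents === "string"
  );
}

const trigger: TriggerMessage = {};
console.log(
  JSON.stringify(trigger),
  isContentsMessage({ contents: "Hello" }),
  isContentsMessage({ error: "boom" }),
);
```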

Manifest Declaration

Field	Value	Description
`content_scripts[].matches`	`["<all_urls>"]`	URL patterns where the content script is injected
`content_scripts[].js`	`["content.js"]`	Path to the content script file
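`permissions`	`"tabs"` (not required for this pattern)	Neither `chrome.tabs.connect()` nor `chrome.tabs.query()` requires it; the `"tabs"` permission only grants access to sensitive `Tab` fields such as `url` and `title`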
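For projects that keep extension configuration in TypeScript, the content-script rows above could be expressed as a typed object and serialized to manifest.json at build time. This is a hypothetical sketch: `ManifestFragment` and `ContentScriptEntry` are not published types, and the other required manifest keys (such as `name` and `version`) are omitted.

```typescript
// Hypothetical types covering only the content-script keys in the table;
// a real manifest also needs name, version, and other fields.
interface ContentScriptEntry {
  matches: string[];
  js: string[];
}

interface ManifestFragment {
  manifest_version: 3;
  content_scripts: ContentScriptEntry[];
}

const manifestFragment: ManifestFragment = {
  manifest_version: 3,
  content_scripts: [{ matches: ["<all_urls>"], js: ["content.js"] }],
};

// Serialized form, as it would appear in manifest.json.
console.log(JSON.stringify(manifestFragment, null, 2));
```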

Usage Examples

Complete content script with error handling:

// content.js - Injected into web pages by Chrome
chrome.runtime.onConnect.addListener(function (port) {
  if (port.name === "channelName") {
    port.onMessage.addListener(function (msg) {
      try {
        // Extract plain text (preferred for LLM context)
        const pageText = document.body.innerText;
        port.postMessage({ contents: pageText });
      } catch (error) {
        port.postMessage({ contents: "", error: error.message });
      }
    });
  }
});

Popup script with conditional context usage:

// Whether or not to use the content from the active tab as the context
const useContext = false;

let pageContext = "";

function fetchPageContents() {
  chrome.tabs.query({ currentWindow: true, active: true }, function (tabs) {
    if (tabs[0]?.id) {
      const port = chrome.tabs.connect(tabs[0].id, { name: "channelName" });
      port.postMessage({});
      port.onMessage.addListener(function (msg) {
        console.log("Page contents:", msg.contents);
        pageContext = msg.contents;
      });
    }
  });
}

// Grab the page contents when the popup is opened
window.onload = function () {
  if (useContext) {
    fetchPageContents();
  }
};

Complete example: page summarization with web-llm:

import {
  CreateExtensionServiceWorkerMLCEngine,
  ChatCompletionMessageParam,
} from "@mlc-ai/web-llm";

// Step 1: Create engine
const engine = await CreateExtensionServiceWorkerMLCEngine(
  "Qwen2-0.5B-Instruct-q4f16_1-MLC",
  { initProgressCallback: (r) => console.log(r.text) },
);

// Step 2: Fetch page content
function getPageContent(): Promise<string> {
  return new Promise((resolve) => {
    chrome.tabs.query({ currentWindow: true, active: true }, function (tabs) {
      if (tabs[0]?.id) {
        const port = chrome.tabs.connect(tabs[0].id, { name: "channelName" });
        port.postMessage({});
        port.onMessage.addListener(function (msg) {
          resolve(msg.contents);
        });
      } else {
        resolve("");
      }
    });
  });
}

// Step 3: Summarize the page
async function summarizePage() {
  const pageContent = await getPageContent();
  if (!pageContent) {
    console.log("No page content available");
    return;
  }

  // Rough truncation by characters (not tokens) to fit the model's context window
  const truncated = pageContent.substring(0, 4000);

  const messages: ChatCompletionMessageParam[] = [
    {
      role: "system",
      content: "You are a helpful assistant that summarizes web pages concisely.",
    },
    {
      role: "user",
      content: "Please summarize the following web page content:\n\n" + truncated,
    },
  ];

  const completion = await engine.chat.completions.create({
    stream: true,
    messages: messages,
  });

  let summary = "";
  for await (const chunk of completion) {
    const delta = chunk.choices[0].delta.content;
    if (delta) summary += delta;
  }
  console.log("Summary:", summary);
}
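`substring(0, 4000)` can cut the page text mid-word, and 4000 characters is only a rough stand-in for a token budget. A minimal alternative, assuming a character budget is acceptable, backs up to the last whitespace; `truncateAtWordBoundary` is a hypothetical helper, not part of web-llm.

```typescript
// Hypothetical helper: cut `text` to at most `maxChars` characters, backing up
// to the last whitespace so the prompt does not end mid-word. Characters are
// only a rough proxy for tokens, so the budget itself is an assumption.
function truncateAtWordBoundary(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const slice = text.substring(0, maxChars);
  // The cut already falls on a word boundary: just drop trailing whitespace.
  if (/\s/.test(text.charAt(maxChars))) return slice.replace(/\s+$/, "");
  // Drop the trailing partial word; fall back to a hard cut if none is found.
  const cut = slice.replace(/\s+\S*$/, "");
  return cut.length > 0 ? cut : slice;
}

console.log(truncateAtWordBoundary("The quick brown fox", 12)); // "The quick"
```

In summarizePage(), `truncateAtWordBoundary(pageContent, 4000)` would take the place of `pageContent.substring(0, 4000)`.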

External Dependencies

API	Chrome Version	Documentation
`chrome.tabs.connect()`	Chrome 26+	chrome.tabs.connect
`chrome.tabs.query()`	Chrome 16+	chrome.tabs.query
`chrome.runtime.onConnect`	Chrome 26+	chrome.runtime.onConnect
`chrome.runtime.Port`	Chrome 26+	chrome.runtime.Port
Content Scripts API	Chrome 88+ (MV3)	Content Scripts

Known Limitations

No chunking: The content script sends the entire page text in one message. For very large pages, this may exceed message size limits or the model's context window.
No filtering: The raw innerText or innerHTML includes navigation, headers, footers, and other non-content elements. A production extension would benefit from content extraction heuristics.
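Runtime errors on special pages: Content scripts cannot be injected into chrome:// pages, chrome-extension:// pages, or the Chrome Web Store. On these pages chrome.tabs.connect() does not throw; the returned port disconnects immediately, port.onDisconnect fires, and chrome.runtime.lastError is set (typically "Could not establish connection. Receiving end does not exist.").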
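Timing dependency: If the popup calls fetchPageContents() before the content script has loaded in the tab (e.g., on a freshly navigated page), the port disconnects without a reply. The same happens in tabs that were already open when the extension was installed or reloaded, because manifest-declared content scripts are injected only on subsequent page loads. Since the repository's popup scripts do not listen to port.onDisconnect, the failure is silent.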
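One way to address the chunking limitation is to split the reply into several port messages and reassemble them in the popup. The sketch below assumes a hypothetical `{ chunk, index, total }` message shape and hypothetical helper names; web-llm's examples do not define such a protocol. Splitting with `slice` may separate a UTF-16 surrogate pair across two chunks, but joining the chunks in order restores the original string.

```typescript
// Hypothetical chunk message: `index` is 0-based, `total` is the chunk count.
interface ChunkMessage {
  chunk: string;
  index: number;
  total: number;
}

// Content-script side: split text into fixed-size chunks (at least one, so
// an empty page still produces a reply).
function toChunkMessages(text: string, chunkSize: number): ChunkMessage[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const total = Math.max(1, Math.ceil(text.length / chunkSize));
  const messages: ChunkMessage[] = [];
  for (let index = 0; index < total; index++) {
    const chunk = text.slice(index * chunkSize, (index + 1) * chunkSize);
    messages.push({ chunk, index, total });
  }
  return messages;
}

// Popup side: accepts chunks in any order and returns the full text once
// every index has arrived, or null while chunks are still missing.
function createReassembler(): (msg: ChunkMessage) => string | null {
  let parts: (string | undefined)[] = [];
  let received = 0;
  return (msg) => {
    if (parts.length === 0) parts = new Array<string | undefined>(msg.total);
    if (parts[msg.index] === undefined) received++;
    parts[msg.index] = msg.chunk;
    return received === msg.total ? parts.join("") : null;
  };
}

const onChunk = createReassembler();
let fullText: string | null = null;
for (const msg of toChunkMessages("A long page body, sent in pieces.", 8)) {
  fullText = onChunk(msg); // in an extension this would run in port.onMessage
}
console.log(fullText);
```

In a content script, each element of `toChunkMessages(document.body.innerText, size)` would be sent with `port.postMessage()`. This addresses message size only, not the model's context window.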

Related Pages

Principle:Mlc_ai_Web_llm_Page_Content_Access
Mlc_ai_Web_llm_Manifest_V3_Configuration - Manifest where content scripts and permissions are declared
Mlc_ai_Web_llm_Create_Service_Worker_MLC_Engine - Engine factory used in the popup alongside page content extraction
Mlc_ai_Web_llm_Chrome_Extension_Manifest - Principle for manifest configuration including content script declaration
Environment:Mlc_ai_Web_llm_Chrome_Extension_Manifest_V3
