
Environment: MLC-AI WebLLM WebGPU Browser Runtime

From Leeroopedia
Knowledge Sources
Domains Infrastructure, WebGPU, Browser
Last Updated 2026-02-14 22:00 GMT

Overview

Browser environment with WebGPU support, Cache/IndexedDB storage, and Web Worker or Service Worker execution context for running MLC-compiled LLM inference in the browser.

Description

This environment defines the runtime requirements for the WebLLM library, which performs hardware-accelerated large language model inference entirely in the browser via WebGPU. The library requires a WebGPU-enabled browser with sufficient GPU VRAM, the Cache API or IndexedDB for model weight caching, and support for Web Workers or Service Workers for off-main-thread execution. Models compiled with f16 quantization additionally require the shader-f16 WebGPU extension.

Usage

Use this environment for any workflow that loads and runs an MLC-compiled language model in the browser. It is the mandatory prerequisite for all WebLLM implementations, including Create_MLC_Engine, Chat_Completions_Create, Embeddings_Create, and all worker-based engine variants.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| Browser | WebGPU-enabled browser | Chrome 113+, Edge 113+, or other Chromium-based browsers with the WebGPU flag enabled |
| GPU | WebGPU-compatible GPU | VRAM requirements vary by model (879 MB to 6100+ MB) |
| GPU feature | `shader-f16` extension (for f16 models) | 20+ models require this; launch Chrome Canary with `--enable-dawn-features=allow_unsafe_apis` if missing |
| GPU buffer | `maxStorageBufferBindingSize` >= 1 GB | Mobile devices with a limit below 1 GB are restricted to `-1k` context-window model variants |
| Storage | Cache API or IndexedDB | Caches model weights, tokenizer, WASM libraries, and config |
| Context | Web page, Web Worker, or Service Worker | Must have access to `document.URL` or `globalThis.location.origin` |
| Protocol | HTTPS (for Service Workers) | The Service Worker API requires a secure context |
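
The checks in the table above can be probed before loading a model. A minimal sketch: `adapter` is shaped like the result of `navigator.gpu.requestAdapter()`, of which only `features` (a Set) and `limits.maxStorageBufferBindingSize` are used here.

```javascript
const ONE_GB = 1 << 30;

function classifyAdapter(adapter) {
  if (!adapter) {
    // No adapter means no WebGPU at all.
    return { webgpu: false, f16Models: false, fullContext: false };
  }
  return {
    webgpu: true,
    // f16-quantized models need the shader-f16 feature.
    f16Models: adapter.features.has("shader-f16"),
    // Below 1 GB of storage-buffer binding, only `-1k` context variants fit.
    fullContext: adapter.limits.maxStorageBufferBindingSize >= ONE_GB,
  };
}

// In a browser: classifyAdapter(await navigator.gpu?.requestAdapter());
```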

Dependencies

System Packages

  • WebGPU-enabled browser (Chrome 113+, Edge 113+)
  • GPU drivers compatible with WebGPU (Vulkan, Metal, or D3D12 backend)

JavaScript Packages

  • `@mlc-ai/web-llm` >= 0.2.80
  • `@mlc-ai/web-runtime` >= 0.24.0-dev1 (bundled, provides TVM WebGPU runtime)
  • `@mlc-ai/web-tokenizers` >= 0.1.6 (bundled, provides tokenizer support)
  • `@mlc-ai/web-xgrammar` = 0.1.27 (bundled, provides grammar-constrained decoding)
  • `loglevel` >= 1.9.1

Credentials

No API keys or credentials are required for basic WebLLM usage. Model weights are fetched from public Hugging Face repositories.

Quick Install

# Install via npm
npm install @mlc-ai/web-llm
<!-- Or via CDN in HTML -->
<script type="module">
  import * as webllm from "https://esm.run/@mlc-ai/web-llm";
</script>
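
After installing, a minimal usage sketch, assuming the published `CreateMLCEngine` API of `@mlc-ai/web-llm`; the model id is one of the prebuilt MLC ids and is illustrative. This only runs in a WebGPU-enabled browser.

```javascript
async function runDemo(modelId = "Llama-3.1-8B-Instruct-q4f16_1-MLC-1k") {
  const webllm = await import("@mlc-ai/web-llm");
  // The first call downloads the weights; later calls read the Cache API.
  const engine = await webllm.CreateMLCEngine(modelId, {
    initProgressCallback: (report) => console.log(report.text),
  });
  // OpenAI-style chat completion against the locally loaded model.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello!" }],
  });
  return reply.choices[0].message.content;
}
```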

Code Evidence

WebGPU availability check from `src/engine.ts:324-328`:

const gpuDetectOutput = await tvmjs.detectGPUDevice();
if (gpuDetectOutput == undefined) {
  throw new WebGPUNotAvailableError();
}

WebGPU feature requirement check from `src/engine.ts:335-344`:

if (modelRecord.required_features !== undefined) {
  for (const feature of modelRecord.required_features) {
    if (!gpuDetectOutput.device.features.has(feature)) {
      if (feature == "shader-f16") {
        throw new ShaderF16SupportError();
      }
      throw new FeatureSupportError(feature);
    }
  }
}

maxStorageBufferBindingSize check from `src/engine.ts:1136-1162`:

const maxStorageBufferBindingSize =
  gpuDetectOutput.device.limits.maxStorageBufferBindingSize;
const defaultMaxStorageBufferBindingSize = 1 << 30; // 1GB
if (maxStorageBufferBindingSize < defaultMaxStorageBufferBindingSize) {
  log.warn(
    `WARNING: the current maxStorageBufferBindingSize ` +
      `(${computeMB(maxStorageBufferBindingSize)}) ` +
      `may only work for a limited number of models, e.g.: \n` +
      `- Llama-3.1-8B-Instruct-q4f16_1-MLC-1k \n` +
      `- TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k`,
  );
}

Service Worker API check from `src/service_worker.ts:222-224`:

if (!("serviceWorker" in navigator)) {
  throw new NoServiceWorkerAPIError();
}
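
A page-side sketch of the Service Worker path, assuming the `CreateServiceWorkerMLCEngine` export of `@mlc-ai/web-llm`; the worker script (`sw.js` below is a placeholder name) would instantiate the library's `ServiceWorkerMLCEngineHandler`. It is guarded with the same availability check as `src/service_worker.ts` above and requires an HTTPS (secure) context.

```javascript
async function createEngineViaServiceWorker(modelId) {
  if (!("serviceWorker" in navigator)) {
    // Mirrors NoServiceWorkerAPIError: no SW support or insecure context.
    throw new Error("Service worker API is not available");
  }
  await navigator.serviceWorker.register("/sw.js");
  const webllm = await import("@mlc-ai/web-llm");
  // The engine proxies requests to the worker, which owns the WebGPU device.
  return webllm.CreateServiceWorkerMLCEngine(modelId);
}
```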

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `WebGPU is not supported in your current environment` | Browser does not have WebGPU enabled | Use Chrome 113+ or enable the WebGPU flag in browser settings; visit https://webgpureport.org/ to verify |
| `This model requires WebGPU extension shader-f16` | Browser lacks shader-f16 support | Launch Chrome Canary with `--enable-dawn-features=allow_unsafe_apis`, or choose a q4f32 model variant instead |
| `Device was lost...insufficient memory or other GPU constraints` | GPU ran out of VRAM during model loading | Reload with a smaller model or reduce `context_window_size` via `ModelRecord.overrides` |
| `Service worker API is not available` | Not in an HTTPS context, or the browser lacks Service Worker support | Serve the page over HTTPS; check browser Service Worker support |
| `Missing model_lib for the model` | WASM library URL not provided | Set `model_lib` in the `ModelRecord` to a valid `.wasm` file URL |
| `WARNING: the current maxStorageBufferBindingSize...` | GPU buffer limit below 1 GB (common on mobile devices) | Use models with the `-1k` suffix (e.g., `Llama-3.1-8B-Instruct-q4f16_1-MLC-1k`) |
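
The last fallback can be sketched as a small helper: if the GPU's `maxStorageBufferBindingSize` is below 1 GB, switch to the `-1k` (reduced context window) variant of the model. The `-1k` suffix convention comes from the prebuilt MLC model list; the helper name is illustrative.

```javascript
function pickModelVariant(baseModelId, maxStorageBufferBindingSize) {
  const ONE_GB = 1 << 30;
  // A full-size buffer limit can run the standard context-window variant.
  if (maxStorageBufferBindingSize >= ONE_GB) return baseModelId;
  // Otherwise fall back to the reduced-context "-1k" variant.
  return baseModelId.endsWith("-1k") ? baseModelId : `${baseModelId}-1k`;
}
```

For example, with a 128 MB limit, `pickModelVariant("Llama-3.1-8B-Instruct-q4f16_1-MLC", 128 * 1024 * 1024)` selects the `-1k` variant.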

Compatibility Notes

  • Desktop Chrome/Edge 113+: Full WebGPU support including shader-f16 on most GPUs.
  • Mobile Chrome (Android): WebGPU available but `maxStorageBufferBindingSize` often limited to 128-256 MB, restricting model selection to `-1k` variants.
  • Safari: WebGPU support is experimental and may not include all required features.
  • Firefox: WebGPU support in nightly builds only as of early 2026.
  • Service Workers: Require HTTPS context; not available on `file://` or `http://localhost` without flags.
  • Cache API vs IndexedDB: the Cache API is the default and better tested in WebLLM; IndexedDB is available via `useIndexedDBCache: true` but less stable.
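
Opting into IndexedDB caching is a one-flag config change; a sketch, where `baseAppConfig` stands in for whatever app config the application already uses (e.g., the library's prebuilt one):

```javascript
function withIndexedDBCache(baseAppConfig) {
  // Copy the config and switch the cache backend from Cache API to IndexedDB.
  return { ...baseAppConfig, useIndexedDBCache: true };
}

// Passed as: CreateMLCEngine(modelId, { appConfig: withIndexedDBCache(cfg) })
```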
