Environment:Mlc_ai_Web_llm_WebGPU_Browser_Runtime
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, WebGPU, Browser |
| Last Updated | 2026-02-14 22:00 GMT |
Overview
Browser environment with WebGPU support, Cache/IndexedDB storage, and Web Worker or Service Worker execution context for running MLC-compiled LLM inference in the browser.
Description
This environment defines the runtime requirements for the WebLLM library, which performs hardware-accelerated large language model inference entirely in the browser via WebGPU. The library requires a WebGPU-enabled browser with sufficient GPU VRAM, the Cache API or IndexedDB for model weight caching, and Web Worker or Service Worker support for off-main-thread execution. Models compiled with f16 quantization additionally require the `shader-f16` WebGPU extension.
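The checks described above can be sketched as a small capability probe. This is illustrative only: `probeWebGPU`, its injectable `nav` parameter, and the `GpuProbe` shape are hypothetical names, not part of the WebLLM API; the underlying calls (`navigator.gpu.requestAdapter()`, `adapter.features.has("shader-f16")`) are the standard WebGPU surface.

```typescript
// Sketch: probe for WebGPU availability and the shader-f16 feature that
// f16-quantized models require. Names here are illustrative, not library API.
interface GpuProbe {
  supported: boolean;
  shaderF16: boolean;
}

async function probeWebGPU(nav: any = (globalThis as any).navigator): Promise<GpuProbe> {
  if (!nav || !("gpu" in nav)) {
    // WebGPU API not exposed at all (e.g. older browser, Node.js).
    return { supported: false, shaderF16: false };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    // API present but no usable GPU adapter.
    return { supported: false, shaderF16: false };
  }
  // f16-quantized models additionally need the shader-f16 adapter feature.
  return { supported: true, shaderF16: adapter.features.has("shader-f16") };
}
```

In a page you would call `probeWebGPU()` with no arguments before attempting to create an engine, and fall back to a q4f32 model (or show an error) based on the result.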
Usage
Use this environment for any workflow that loads and runs an MLC-compiled language model in the browser. This is the mandatory prerequisite for all WebLLM implementations including Create_MLC_Engine, Chat_Completions_Create, Embeddings_Create, and all worker-based engine variants.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Browser | WebGPU-enabled browser | Chrome 113+, Edge 113+, or other Chromium-based browsers with WebGPU flag enabled |
| GPU | WebGPU-compatible GPU | VRAM requirements vary by model (879 MB to 6100+ MB) |
| GPU Feature | `shader-f16` extension (for f16 models) | 20+ models require this; launch Chrome Canary with `--enable-dawn-features=allow_unsafe_apis` if missing |
| GPU Buffer | `maxStorageBufferBindingSize` >= 1 GB | Mobile devices with < 1 GB limit are restricted to `-1k` context window model variants |
| Storage | Cache API or IndexedDB | For caching model weights, tokenizer, WASM libraries, and config |
| Context | Web page, Web Worker, or Service Worker | Must have access to `document.URL` or `globalThis.location.origin` |
| Protocol | HTTPS (for Service Workers) | Service Worker API requires secure context |
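The buffer-limit row above amounts to a simple threshold rule, sketched here as a pure function. `needs1kVariant` is an illustrative name, not a library export; the 1 GB constant matches the `1 << 30` default that WebLLM warns against in its own check.

```typescript
// Sketch of the maxStorageBufferBindingSize rule from the requirements table:
// devices below the 1 GB default should fall back to "-1k" context-window
// model variants. Function name is illustrative only.
const ONE_GB = 1 << 30; // default threshold (1,073,741,824 bytes)

function needs1kVariant(maxStorageBufferBindingSize: number): boolean {
  return maxStorageBufferBindingSize < ONE_GB;
}
```

A typical Android device reporting a 128 MB limit would thus be steered to models such as `Llama-3.1-8B-Instruct-q4f16_1-MLC-1k`.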
Dependencies
System Packages
- WebGPU-enabled browser (Chrome 113+, Edge 113+)
- GPU drivers compatible with WebGPU (Vulkan, Metal, or D3D12 backend)
JavaScript Packages
- `@mlc-ai/web-llm` >= 0.2.80
- `@mlc-ai/web-runtime` >= 0.24.0-dev1 (bundled, provides TVM WebGPU runtime)
- `@mlc-ai/web-tokenizers` >= 0.1.6 (bundled, provides tokenizer support)
- `@mlc-ai/web-xgrammar` = 0.1.27 (bundled, provides grammar-constrained decoding)
- `loglevel` >= 1.9.1
Credentials
No API keys or credentials are required for basic WebLLM usage. Model weights are fetched from public HuggingFace repositories:
- `https://huggingface.co/mlc-ai/*` (model weights)
- `https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/` (WASM libraries)
Quick Install
Install via npm:
npm install @mlc-ai/web-llm
Or load from a CDN in an HTML module script:
<script type="module">
import * as webllm from "https://esm.run/@mlc-ai/web-llm";
</script>
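Once installed, a minimal chat round-trip looks roughly like the sketch below. It assumes the OpenAI-style surface this page documents (`CreateMLCEngine` returning an engine exposing `chat.completions.create`); the engine factory is injected as a parameter so the sketch is not tied to a browser, and `chatOnce` is a hypothetical helper, not a library export.

```typescript
// Sketch, assuming WebLLM's documented engine surface. The factory parameter
// stands in for webllm.CreateMLCEngine so the helper itself is environment-free.
type EngineFactory = (model: string) => Promise<any>;

async function chatOnce(
  createEngine: EngineFactory,
  model = "Llama-3.1-8B-Instruct-q4f16_1-MLC",
): Promise<string> {
  // First call downloads weights into the Cache API / IndexedDB; later calls hit cache.
  const engine = await createEngine(model);
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello!" }],
  });
  return reply.choices[0].message.content;
}

// In the browser:
//   chatOnce((m) => webllm.CreateMLCEngine(m)).then(console.log);
```

Injecting the factory also makes the helper trivially testable with a stub engine, which is useful given that real engine creation requires a WebGPU device.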
Code Evidence
WebGPU availability check from `src/engine.ts:324-328`:
const gpuDetectOutput = await tvmjs.detectGPUDevice();
if (gpuDetectOutput == undefined) {
throw new WebGPUNotAvailableError();
}
WebGPU feature requirement check from `src/engine.ts:335-344`:
if (modelRecord.required_features !== undefined) {
for (const feature of modelRecord.required_features) {
if (!gpuDetectOutput.device.features.has(feature)) {
if (feature == "shader-f16") {
throw new ShaderF16SupportError();
}
throw new FeatureSupportError(feature);
}
}
}
maxStorageBufferBindingSize check from `src/engine.ts:1136-1162`:
const maxStorageBufferBindingSize =
gpuDetectOutput.device.limits.maxStorageBufferBindingSize;
const defaultMaxStorageBufferBindingSize = 1 << 30; // 1GB
if (maxStorageBufferBindingSize < defaultMaxStorageBufferBindingSize) {
log.warn(
`WARNING: the current maxStorageBufferBindingSize ` +
`(${computeMB(maxStorageBufferBindingSize)}) ` +
`may only work for a limited number of models, e.g.: \n` +
`- Llama-3.1-8B-Instruct-q4f16_1-MLC-1k \n` +
`- TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k`,
);
}
Service Worker API check from `src/service_worker.ts:222-224`:
if (!("serviceWorker" in navigator)) {
throw new NoServiceWorkerAPIError();
}
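The guard above, combined with the HTTPS requirement from the System Requirements table, can be expressed as one pure predicate. `canUseServiceWorker` is an illustrative name; in a real page you would pass `navigator` and `window.isSecureContext`.

```typescript
// Sketch combining the two Service Worker preconditions documented above:
// the API must exist on navigator AND the page must be a secure context.
function canUseServiceWorker(nav: object | undefined, isSecureContext: boolean): boolean {
  return isSecureContext && !!nav && "serviceWorker" in nav;
}

// In the browser: canUseServiceWorker(navigator, window.isSecureContext)
```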
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `WebGPU is not supported in your current environment` | Browser does not have WebGPU enabled | Use Chrome 113+ or enable WebGPU flag in browser settings. Visit https://webgpureport.org/ to verify |
| `This model requires WebGPU extension shader-f16` | Browser lacks shader-f16 support | Launch Chrome Canary with `--enable-dawn-features=allow_unsafe_apis`, or choose a q4f32 model variant instead |
| `Device was lost...insufficient memory or other GPU constraints` | GPU ran out of VRAM during model loading | Reload with a smaller model or reduce `context_window_size` via `ModelRecord.overrides` |
| `Service worker API is not available` | Not in HTTPS context or browser lacks SW support | Ensure page is served over HTTPS; check browser Service Worker support |
| `Missing model_lib for the model` | WASM library URL not provided | Ensure `model_lib` is set in `ModelRecord` pointing to a valid `.wasm` file |
| `WARNING: the current maxStorageBufferBindingSize...` | GPU buffer limit < 1 GB (mobile devices) | Use models with `-1k` suffix (e.g., `Llama-3.1-8B-Instruct-q4f16_1-MLC-1k`) |
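The shader-f16 remediation in the table (swap to a q4f32 variant) follows directly from MLC's model-id naming, where the quantization scheme appears as a `q4f16_1` / `q4f32_1` suffix segment. A hedged sketch of that fallback, with `fallbackModel` as a hypothetical helper rather than a library API:

```typescript
// Sketch of the table's shader-f16 remediation: if the extension is missing,
// rewrite the quantization segment of the MLC model id from f16 to f32.
// Helper name is illustrative only.
function fallbackModel(modelId: string, hasShaderF16: boolean): string {
  if (hasShaderF16) return modelId;
  // q4f32_1 variants do not need the shader-f16 extension.
  return modelId.replace("q4f16_1", "q4f32_1");
}
```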
Compatibility Notes
- Desktop Chrome/Edge 113+: Full WebGPU support including shader-f16 on most GPUs.
- Mobile Chrome (Android): WebGPU available but `maxStorageBufferBindingSize` often limited to 128-256 MB, restricting model selection to `-1k` variants.
- Safari: WebGPU support is experimental and may not include all required features.
- Firefox: WebGPU support in nightly builds only as of early 2026.
- Service Workers: Require a secure context (HTTPS); not available on `file://` or plain `http://` origins, though `http://localhost` counts as a secure context in most browsers.
- Cache API vs IndexedDB: the Cache API is the default and the better-tested path in WebLLM; IndexedDB is available via `useIndexedDBCache: true` but less stable.
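Opting into IndexedDB caching is a one-flag change on the app config passed at engine creation, per the note above. The fragment below is a sketch: the empty `model_list` is a placeholder (normally the prebuilt list or custom `ModelRecord`s), and only `useIndexedDBCache` is the point being shown.

```typescript
// Sketch: app config selecting IndexedDB instead of the default Cache API.
const appConfig = {
  useIndexedDBCache: true, // default is false, i.e. the Cache API
  model_list: [] as unknown[], // placeholder; usually prebuilt or custom ModelRecords
};

// In the browser (assuming the documented engine factory):
//   webllm.CreateMLCEngine(modelId, { appConfig });
```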
Related Pages
- Implementation:Mlc_ai_Web_llm_Create_MLC_Engine
- Implementation:Mlc_ai_Web_llm_Chat_Completions_Create
- Implementation:Mlc_ai_Web_llm_Embeddings_Create
- Implementation:Mlc_ai_Web_llm_Web_Worker_MLC_Engine_Handler
- Implementation:Mlc_ai_Web_llm_Create_Web_Worker_MLC_Engine
- Implementation:Mlc_ai_Web_llm_Grammar_Matcher_Decoding
- Implementation:Mlc_ai_Web_llm_Prebuilt_App_Config
- Implementation:Mlc_ai_Web_llm_Multi_Model_RAG_Engine