Environment:Mlc_ai_Web_llm_WebGPU_Browser_Runtime
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, WebGPU, Browser |
| Last Updated | 2026-02-14 22:00 GMT |
Overview
Browser environment with WebGPU support, Cache/IndexedDB storage, and Web Worker or Service Worker execution context for running MLC-compiled LLM inference in the browser.
Description
This environment defines the runtime requirements for the WebLLM library, which performs hardware-accelerated large language model inference entirely in the browser via WebGPU. The library requires a WebGPU-enabled browser with sufficient GPU VRAM, the Cache API or IndexedDB for model weight caching, and Web Worker or Service Worker support for off-main-thread execution. Models compiled with f16 quantization additionally require the `shader-f16` WebGPU extension.
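The checks described above can be sketched as a small capability probe. This is illustrative only: `probeWebGPU`, its injectable `nav` parameter, and the `GpuProbe` shape are hypothetical names, not part of the WebLLM API; the underlying calls (`navigator.gpu.requestAdapter()`, `adapter.features.has("shader-f16")`) are the standard WebGPU surface.

```typescript
// Sketch: probe for WebGPU availability and the shader-f16 feature that
// f16-quantized models require. Names here are illustrative, not library API.
interface GpuProbe {
  supported: boolean;
  shaderF16: boolean;
}

async function probeWebGPU(nav: any = (globalThis as any).navigator): Promise<GpuProbe> {
  if (!nav || !("gpu" in nav)) {
    // WebGPU API not exposed at all (e.g. older browser, Node.js).
    return { supported: false, shaderF16: false };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    // API present but no usable GPU adapter.
    return { supported: false, shaderF16: false };
  }
  // f16-quantized models additionally need the shader-f16 adapter feature.
  return { supported: true, shaderF16: adapter.features.has("shader-f16") };
}
```

In a page you would call `probeWebGPU()` with no arguments before attempting to create an engine, and fall back to a q4f32 model (or show an error) based on the result.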
Usage
Use this environment for any workflow that loads and runs an MLC-compiled language model in the browser. This is the mandatory prerequisite for all WebLLM implementations including Create_MLC_Engine, Chat_Completions_Create, Embeddings_Create, and all worker-based engine variants.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Browser | WebGPU-enabled browser | Chrome 113+, Edge 113+, or other Chromium-based browsers with WebGPU flag enabled |
| GPU | WebGPU-compatible GPU | VRAM requirements vary by model (879 MB to 6100+ MB) |
| GPU Feature | `shader-f16` extension (for f16 models) | 20+ models require this; launch Chrome Canary with `--enable-dawn-features=allow_unsafe_apis` if missing |
| GPU Buffer | `maxStorageBufferBindingSize` >= 1 GB | Mobile devices with < 1 GB limit are restricted to `-1k` context window model variants |
| Storage | Cache API or IndexedDB | For caching model weights, tokenizer, WASM libraries, and config |
| Context | Web page, Web Worker, or Service Worker | Must have access to `document.URL` or `globalThis.location.origin` |
| Protocol | HTTPS (for Service Workers) | Service Worker API requires secure context |
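The buffer-limit row above amounts to a simple threshold rule, sketched here as a pure function. `needs1kVariant` is an illustrative name, not a library export; the 1 GB constant matches the `1 << 30` default that WebLLM warns against in its own check.

```typescript
// Sketch of the maxStorageBufferBindingSize rule from the requirements table:
// devices below the 1 GB default should fall back to "-1k" context-window
// model variants. Function name is illustrative only.
const ONE_GB = 1 << 30; // default threshold (1,073,741,824 bytes)

function needs1kVariant(maxStorageBufferBindingSize: number): boolean {
  return maxStorageBufferBindingSize < ONE_GB;
}
```

A typical Android device reporting a 128 MB limit would thus be steered to models such as `Llama-3.1-8B-Instruct-q4f16_1-MLC-1k`.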
Dependencies
System Packages
- WebGPU-enabled browser (Chrome 113+, Edge 113+)
- GPU drivers compatible with WebGPU (Vulkan, Metal, or D3D12 backend)
JavaScript Packages
- `@mlc-ai/web-llm` >= 0.2.80
- `@mlc-ai/web-runtime` >= 0.24.0-dev1 (bundled, provides TVM WebGPU runtime)
- `@mlc-ai/web-tokenizers` >= 0.1.6 (bundled, provides tokenizer support)
- `@mlc-ai/web-xgrammar` = 0.1.27 (bundled, provides grammar-constrained decoding)
- `loglevel` >= 1.9.1
Credentials
No API keys or credentials are required for basic WebLLM usage. Model weights are fetched from public HuggingFace repositories:
- `https://huggingface.co/mlc-ai/*` (model weights)
- `https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/` (WASM libraries)
Quick Install
Install via npm:
npm install @mlc-ai/web-llm
Or load from a CDN in an HTML module script:
<script type="module">
import * as webllm from "https://esm.run/@mlc-ai/web-llm";
</script>
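Once installed, a minimal chat round-trip looks roughly like the sketch below. It assumes the OpenAI-style surface this page documents (`CreateMLCEngine` returning an engine exposing `chat.completions.create`); the engine factory is injected as a parameter so the sketch is not tied to a browser, and `chatOnce` is a hypothetical helper, not a library export.

```typescript
// Sketch, assuming WebLLM's documented engine surface. The factory parameter
// stands in for webllm.CreateMLCEngine so the helper itself is environment-free.
type EngineFactory = (model: string) => Promise<any>;

async function chatOnce(
  createEngine: EngineFactory,
  model = "Llama-3.1-8B-Instruct-q4f16_1-MLC",
): Promise<string> {
  // First call downloads weights into the Cache API / IndexedDB; later calls hit cache.
  const engine = await createEngine(model);
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello!" }],
  });
  return reply.choices[0].message.content;
}

// In the browser:
//   chatOnce((m) => webllm.CreateMLCEngine(m)).then(console.log);
```

Injecting the factory also makes the helper trivially testable with a stub engine, which is useful given that real engine creation requires a WebGPU device.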
Code Evidence
WebGPU availability check from `src/engine.ts:324-328`:
const gpuDetectOutput = await tvmjs.detectGPUDevice();
if (gpuDetectOutput == undefined) {
throw new WebGPUNotAvailableError();
}
WebGPU feature requirement check from `src/engine.ts:335-344`:
if (modelRecord.required_features !== undefined) {
for (const feature of modelRecord.required_features) {
if (!gpuDetectOutput.device.features.has(feature)) {
if (feature == "shader-f16") {
throw new ShaderF16SupportError();
}
throw new FeatureSupportError(feature);
}
}
}
maxStorageBufferBindingSize check from `src/engine.ts:1136-1162`:
const maxStorageBufferBindingSize =
gpuDetectOutput.device.limits.maxStorageBufferBindingSize;
const defaultMaxStorageBufferBindingSize = 1 << 30; // 1GB
if (maxStorageBufferBindingSize < defaultMaxStorageBufferBindingSize) {
log.warn(
`WARNING: the current maxStorageBufferBindingSize ` +
`(${computeMB(maxStorageBufferBindingSize)}) ` +
`may only work for a limited number of models, e.g.: \n` +
`- Llama-3.1-8B-Instruct-q4f16_1-MLC-1k \n` +
`- TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k`,
);
}
Service Worker API check from `src/service_worker.ts:222-224`:
if (!("serviceWorker" in navigator)) {
throw new NoServiceWorkerAPIError();
}
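The guard above, combined with the HTTPS requirement from the System Requirements table, can be expressed as one pure predicate. `canUseServiceWorker` is an illustrative name; in a real page you would pass `navigator` and `window.isSecureContext`.

```typescript
// Sketch combining the two Service Worker preconditions documented above:
// the API must exist on navigator AND the page must be a secure context.
function canUseServiceWorker(nav: object | undefined, isSecureContext: boolean): boolean {
  return isSecureContext && !!nav && "serviceWorker" in nav;
}

// In the browser: canUseServiceWorker(navigator, window.isSecureContext)
```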
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `WebGPU is not supported in your current environment` | Browser does not have WebGPU enabled | Use Chrome 113+ or enable WebGPU flag in browser settings. Visit https://webgpureport.org/ to verify |
| `This model requires WebGPU extension shader-f16` | Browser lacks shader-f16 support | Launch Chrome Canary with `--enable-dawn-features=allow_unsafe_apis`, or choose a q4f32 model variant instead |
| `Device was lost...insufficient memory or other GPU constraints` | GPU ran out of VRAM during model loading | Reload with a smaller model or reduce `context_window_size` via `ModelRecord.overrides` |
| `Service worker API is not available` | Not in HTTPS context or browser lacks SW support | Ensure page is served over HTTPS; check browser Service Worker support |
| `Missing model_lib for the model` | WASM library URL not provided | Ensure `model_lib` is set in `ModelRecord` pointing to a valid `.wasm` file |
| `WARNING: the current maxStorageBufferBindingSize...` | GPU buffer limit < 1 GB (mobile devices) | Use models with `-1k` suffix (e.g., `Llama-3.1-8B-Instruct-q4f16_1-MLC-1k`) |
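The shader-f16 remediation in the table (swap to a q4f32 variant) follows directly from MLC's model-id naming, where the quantization scheme appears as a `q4f16_1` / `q4f32_1` suffix segment. A hedged sketch of that fallback, with `fallbackModel` as a hypothetical helper rather than a library API:

```typescript
// Sketch of the table's shader-f16 remediation: if the extension is missing,
// rewrite the quantization segment of the MLC model id from f16 to f32.
// Helper name is illustrative only.
function fallbackModel(modelId: string, hasShaderF16: boolean): string {
  if (hasShaderF16) return modelId;
  // q4f32_1 variants do not need the shader-f16 extension.
  return modelId.replace("q4f16_1", "q4f32_1");
}
```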
Compatibility Notes
- Desktop Chrome/Edge 113+: Full WebGPU support including shader-f16 on most GPUs.
- Mobile Chrome (Android): WebGPU available but `maxStorageBufferBindingSize` often limited to 128-256 MB, restricting model selection to `-1k` variants.
- Safari: WebGPU support is experimental and may not include all required features.
- Firefox: WebGPU support in nightly builds only as of early 2026.
- Service Workers: Require a secure context (HTTPS); not available on `file://` or plain `http://` origins, though `http://localhost` counts as a secure context in most browsers.
- Cache API vs IndexedDB: the Cache API is the default and the better-tested path in WebLLM; IndexedDB is available via `useIndexedDBCache: true` but less stable.
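Opting into IndexedDB caching is a one-flag change on the app config passed at engine creation, per the note above. The fragment below is a sketch: the empty `model_list` is a placeholder (normally the prebuilt list or custom `ModelRecord`s), and only `useIndexedDBCache` is the point being shown.

```typescript
// Sketch: app config selecting IndexedDB instead of the default Cache API.
const appConfig = {
  useIndexedDBCache: true, // default is false, i.e. the Cache API
  model_list: [] as unknown[], // placeholder; usually prebuilt or custom ModelRecords
};

// In the browser (assuming the documented engine factory):
//   webllm.CreateMLCEngine(modelId, { appConfig });
```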
Related Pages
- Implementation:Mlc_ai_Web_llm_Create_MLC_Engine
- Implementation:Mlc_ai_Web_llm_Chat_Completions_Create
- Implementation:Mlc_ai_Web_llm_Embeddings_Create
- Implementation:Mlc_ai_Web_llm_Web_Worker_MLC_Engine_Handler
- Implementation:Mlc_ai_Web_llm_Create_Web_Worker_MLC_Engine
- Implementation:Mlc_ai_Web_llm_Grammar_Matcher_Decoding
- Implementation:Mlc_ai_Web_llm_Prebuilt_App_Config
- Implementation:Mlc_ai_Web_llm_Multi_Model_RAG_Engine