Implementation: mlc-ai/web-llm CreateMLCEngine

From Leeroopedia

Overview

CreateMLCEngine is the async factory function provided by @mlc-ai/web-llm that creates an MLCEngine instance, calls reload() to download and initialize the specified model(s) into WebGPU memory, and returns the fully ready engine. The MLCEngine class implements MLCEngineInterface and provides an OpenAI-compatible API surface including chat.completions, completions, and embeddings.

Description

CreateMLCEngine is a thin factory wrapper that:

  1. Constructs a new MLCEngine instance with the provided engine configuration
  2. Calls engine.reload(modelId, chatOpts) to perform the full model loading pipeline
  3. Returns the initialized engine
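The three steps above can be sketched with stand-in types (FakeEngine and EngineConfig here are illustrative stubs, not the library's actual classes; the real implementation is MLCEngine in src/engine.ts):

```typescript
// Minimal sketch of the factory pattern, using hypothetical stand-in types.
interface ChatOptions { context_window_size?: number }
interface EngineConfig { logLevel?: string }

class FakeEngine {
  loadedModelIds: string[] = [];
  constructor(public config?: EngineConfig) {}
  // Stand-in for the real reload(): just records which models were requested.
  async reload(modelId: string | string[], chatOpts?: ChatOptions | ChatOptions[]): Promise<void> {
    this.loadedModelIds = Array.isArray(modelId) ? modelId : [modelId];
  }
}

// The factory: construct the engine, reload the model(s), return the ready engine.
async function createEngineSketch(
  modelId: string | string[],
  engineConfig?: EngineConfig,
  chatOpts?: ChatOptions | ChatOptions[],
): Promise<FakeEngine> {
  const engine = new FakeEngine(engineConfig);
  await engine.reload(modelId, chatOpts);
  return engine;
}
```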

The MLCEngine constructor sets up:

  • API proxy objects -- engine.chat (containing completions), engine.completions, and engine.embeddings
  • State maps -- loadedModelIdToPipeline, loadedModelIdToChatConfig, loadedModelIdToModelType, and loadedModelIdToLock
  • Configuration -- App config (defaults to prebuiltAppConfig), log level, progress callback, and logit processor registry
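The per-model state maps can be pictured as plain Maps keyed by model ID; this is a hedged sketch (the ModelType labels and value types are illustrative, and the real maps hold pipeline and chat-config objects from the library):

```typescript
// Hypothetical sketch of the engine's per-model bookkeeping: each map is
// keyed by model ID, so one engine can track several loaded models at once.
type ModelType = "LLM" | "embedding"; // illustrative labels, not the library's enum
const loadedModelIdToModelType = new Map<string, ModelType>();
const loadedModelIdToChatConfig = new Map<string, Record<string, unknown>>();

loadedModelIdToModelType.set("Llama-3.2-1B-Instruct-q4f16_1-MLC", "LLM");
loadedModelIdToModelType.set("snowflake-arctic-embed-m-q0f32-MLC-b4", "embedding");
```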

The reload() method performs the heavy lifting:

  1. Unloads all previously loaded models via unload()
  2. Converts single model inputs to arrays (supports loading multiple models)
  3. Validates that all model IDs are unique
  4. Sequentially loads each model via reloadInternal(), which handles WASM download, TVM initialization, WebGPU setup, tokenizer loading, weight transfer, and pipeline creation
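The validation in steps 2 and 3 can be sketched as a standalone check (the function name is hypothetical; the real checks live inside reload() and throw the library's ReloadModelIdNotUniqueError and ReloadArgumentSizeUnmatchedError classes):

```typescript
// Hedged sketch of reload()'s argument validation, as a pure helper.
function assertReloadArgs(modelIds: string[], chatOpts?: unknown[]): void {
  // All model IDs must be unique.
  if (new Set(modelIds).size !== modelIds.length) {
    throw new Error("ReloadModelIdNotUniqueError: duplicate model IDs");
  }
  // A chatOpts array must have one entry per model ID.
  if (chatOpts !== undefined && chatOpts.length !== modelIds.length) {
    throw new Error("ReloadArgumentSizeUnmatchedError: chatOpts/modelId length mismatch");
  }
}
```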

Code Reference

  • Repository: https://github.com/mlc-ai/web-llm
  • File: src/engine.ts
  • Factory function: Lines 90-98
  • MLCEngine constructor: Lines 141-157
  • reload(): Lines 194-237
  • reloadInternal(): Lines 239-410

Type Signature

export async function CreateMLCEngine(
  modelId: string | string[],
  engineConfig?: MLCEngineConfig,
  chatOpts?: ChatOptions | ChatOptions[],
): Promise<MLCEngine>

export interface MLCEngineConfig {
  appConfig?: AppConfig;
  initProgressCallback?: InitProgressCallback;
  logitProcessorRegistry?: Map<string, LogitProcessor>;
  logLevel?: LogLevel;
}

Import

import { CreateMLCEngine, MLCEngine, MLCEngineConfig } from "@mlc-ai/web-llm";

I/O Contract

Direction  Name          Type                         Required  Description
Input      modelId       string | string[]            Yes       Model ID(s) to load; must exist in prebuiltAppConfig or engineConfig.appConfig
Input      engineConfig  MLCEngineConfig              No        Optional config for app settings, progress callback, logit processors, and log level
Input      chatOpts      ChatOptions | ChatOptions[]  No        Optional overrides for mlc-chat-config.json; when an array, its length must match the modelId array
Output     engine        Promise<MLCEngine>           --        Fully initialized engine ready for inference calls

Error conditions:

  • Throws WebGPUNotAvailableError if the browser does not support WebGPU
  • Throws ShaderF16SupportError if the model requires shader-f16 but the device lacks it
  • Throws DeviceLostError if GPU memory is exhausted during loading
  • Throws MissingModelWasmError if model_lib is undefined
  • Throws ReloadModelIdNotUniqueError if duplicate model IDs are provided
  • Throws ReloadArgumentSizeUnmatchedError if chatOpts array length does not match modelId array length
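Because WebGPU support varies by browser, a pre-flight check before calling CreateMLCEngine avoids the first error above entirely. WebGPU is exposed as navigator.gpu in supporting browsers; the helper below is an illustrative sketch (hasWebGPU is not a library function):

```typescript
// Hypothetical pre-flight check: WebGPU-capable browsers expose navigator.gpu.
function hasWebGPU(nav: { gpu?: unknown }): boolean {
  return nav.gpu !== undefined;
}

// Usage in a browser (not run here):
//   if (!hasWebGPU(navigator)) {
//     // Show a fallback UI instead of letting CreateMLCEngine throw
//     // WebGPUNotAvailableError during model loading.
//   } else {
//     const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC");
//   }
```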

Usage Example

import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Basic engine creation with progress reporting
const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (progress) => {
    console.log(`Loading: ${(progress.progress * 100).toFixed(1)}% - ${progress.text}`);
  },
});

// Engine creation with custom context window override
const smallEngine = await CreateMLCEngine(
  "Phi-3.5-mini-instruct-q4f16_1-MLC",
  {
    initProgressCallback: (progress) => {
      document.getElementById("status")!.textContent = progress.text;
    },
  },
  {
    context_window_size: 2048,  // Override to use smaller context window
  },
);

// Loading multiple models into a single engine
const multiEngine = await CreateMLCEngine(
  ["Llama-3.2-1B-Instruct-q4f16_1-MLC", "snowflake-arctic-embed-m-q0f32-MLC-b4"],
  {
    initProgressCallback: (progress) => {
      console.log(progress.text);
    },
  },
);
// multiEngine can now serve both chat and embedding requests

// Alternative: manual construction and reload
const engine2 = new MLCEngine({
  initProgressCallback: (progress) => console.log(progress.text),
});
await engine2.reload("Qwen2.5-1.5B-Instruct-q4f16_1-MLC");
