Implementation: mlc-ai/web-llm CreateMLCEngine

From Leeroopedia

Overview

CreateMLCEngine is the async factory function provided by @mlc-ai/web-llm that creates an MLCEngine instance, calls reload() to download and initialize the specified model(s) into WebGPU memory, and returns the fully ready engine. The MLCEngine class implements MLCEngineInterface and provides an OpenAI-compatible API surface including chat.completions, completions, and embeddings.

Description

CreateMLCEngine is a thin factory wrapper that:

  1. Constructs a new MLCEngine instance with the provided engine configuration
  2. Calls engine.reload(modelId, chatOpts) to perform the full model loading pipeline
  3. Returns the initialized engine
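The three steps above can be sketched with stand-in types (FakeEngine and EngineConfig here are illustrative stubs, not the library's actual classes; the real implementation is MLCEngine in src/engine.ts):

```typescript
// Minimal sketch of the factory pattern, using hypothetical stand-in types.
interface ChatOptions { context_window_size?: number }
interface EngineConfig { logLevel?: string }

class FakeEngine {
  loadedModelIds: string[] = [];
  constructor(public config?: EngineConfig) {}
  // Stand-in for the real reload(): just records which models were requested.
  async reload(modelId: string | string[], chatOpts?: ChatOptions | ChatOptions[]): Promise<void> {
    this.loadedModelIds = Array.isArray(modelId) ? modelId : [modelId];
  }
}

// The factory: construct the engine, reload the model(s), return the ready engine.
async function createEngineSketch(
  modelId: string | string[],
  engineConfig?: EngineConfig,
  chatOpts?: ChatOptions | ChatOptions[],
): Promise<FakeEngine> {
  const engine = new FakeEngine(engineConfig);
  await engine.reload(modelId, chatOpts);
  return engine;
}
```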

The MLCEngine constructor sets up:

  • API proxy objects -- engine.chat (containing completions), engine.completions, and engine.embeddings
  • State maps -- loadedModelIdToPipeline, loadedModelIdToChatConfig, loadedModelIdToModelType, and loadedModelIdToLock
  • Configuration -- App config (defaults to prebuiltAppConfig), log level, progress callback, and logit processor registry
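The per-model state maps can be pictured as plain Maps keyed by model ID; this is a hedged sketch (the ModelType labels and value types are illustrative, and the real maps hold pipeline and chat-config objects from the library):

```typescript
// Hypothetical sketch of the engine's per-model bookkeeping: each map is
// keyed by model ID, so one engine can track several loaded models at once.
type ModelType = "LLM" | "embedding"; // illustrative labels, not the library's enum
const loadedModelIdToModelType = new Map<string, ModelType>();
const loadedModelIdToChatConfig = new Map<string, Record<string, unknown>>();

loadedModelIdToModelType.set("Llama-3.2-1B-Instruct-q4f16_1-MLC", "LLM");
loadedModelIdToModelType.set("snowflake-arctic-embed-m-q0f32-MLC-b4", "embedding");
```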

The reload() method performs the heavy lifting:

  1. Unloads all previously loaded models via unload()
  2. Converts single model inputs to arrays (supports loading multiple models)
  3. Validates that all model IDs are unique
  4. Sequentially loads each model via reloadInternal(), which handles WASM download, TVM initialization, WebGPU setup, tokenizer loading, weight transfer, and pipeline creation
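The validation in steps 2 and 3 can be sketched as a standalone check (the function name is hypothetical; the real checks live inside reload() and throw the library's ReloadModelIdNotUniqueError and ReloadArgumentSizeUnmatchedError classes):

```typescript
// Hedged sketch of reload()'s argument validation, as a pure helper.
function assertReloadArgs(modelIds: string[], chatOpts?: unknown[]): void {
  // All model IDs must be unique.
  if (new Set(modelIds).size !== modelIds.length) {
    throw new Error("ReloadModelIdNotUniqueError: duplicate model IDs");
  }
  // A chatOpts array must have one entry per model ID.
  if (chatOpts !== undefined && chatOpts.length !== modelIds.length) {
    throw new Error("ReloadArgumentSizeUnmatchedError: chatOpts/modelId length mismatch");
  }
}
```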

Code Reference

  • Repository: https://github.com/mlc-ai/web-llm
  • File: src/engine.ts
  • Factory function: Lines 90-98
  • MLCEngine constructor: Lines 141-157
  • reload(): Lines 194-237
  • reloadInternal(): Lines 239-410

Type Signature

export async function CreateMLCEngine(
  modelId: string | string[],
  engineConfig?: MLCEngineConfig,
  chatOpts?: ChatOptions | ChatOptions[],
): Promise<MLCEngine>

export interface MLCEngineConfig {
  appConfig?: AppConfig;
  initProgressCallback?: InitProgressCallback;
  logitProcessorRegistry?: Map<string, LogitProcessor>;
  logLevel?: LogLevel;
}

Import

import { CreateMLCEngine, MLCEngine, MLCEngineConfig } from "@mlc-ai/web-llm";

I/O Contract

Direction  Name          Type                         Required  Description
Input      modelId       string | string[]            Yes       Model ID(s) to load; must exist in prebuiltAppConfig or engineConfig.appConfig
Input      engineConfig  MLCEngineConfig              No        Optional config for app settings, progress callback, logit processors, and log level
Input      chatOpts      ChatOptions | ChatOptions[]  No        Optional overrides for mlc-chat-config.json; when an array, its length must match the modelId array
Output     engine        Promise<MLCEngine>           --        Fully initialized engine ready for inference calls

Error conditions:

  • Throws WebGPUNotAvailableError if the browser does not support WebGPU
  • Throws ShaderF16SupportError if the model requires shader-f16 but the device lacks it
  • Throws DeviceLostError if GPU memory is exhausted during loading
  • Throws MissingModelWasmError if model_lib is undefined
  • Throws ReloadModelIdNotUniqueError if duplicate model IDs are provided
  • Throws ReloadArgumentSizeUnmatchedError if chatOpts array length does not match modelId array length
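Because WebGPU support varies by browser, a pre-flight check before calling CreateMLCEngine avoids the first error above entirely. WebGPU is exposed as navigator.gpu in supporting browsers; the helper below is an illustrative sketch (hasWebGPU is not a library function):

```typescript
// Hypothetical pre-flight check: WebGPU-capable browsers expose navigator.gpu.
function hasWebGPU(nav: { gpu?: unknown }): boolean {
  return nav.gpu !== undefined;
}

// Usage in a browser (not run here):
//   if (!hasWebGPU(navigator)) {
//     // Show a fallback UI instead of letting CreateMLCEngine throw
//     // WebGPUNotAvailableError during model loading.
//   } else {
//     const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC");
//   }
```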

Usage Example

import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Basic engine creation with progress reporting
const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (progress) => {
    console.log(`Loading: ${(progress.progress * 100).toFixed(1)}% - ${progress.text}`);
  },
});

// Engine creation with custom context window override
const smallEngine = await CreateMLCEngine(
  "Phi-3.5-mini-instruct-q4f16_1-MLC",
  {
    initProgressCallback: (progress) => {
      document.getElementById("status")!.textContent = progress.text;
    },
  },
  {
    context_window_size: 2048,  // Override to use smaller context window
  },
);

// Loading multiple models into a single engine
const multiEngine = await CreateMLCEngine(
  ["Llama-3.2-1B-Instruct-q4f16_1-MLC", "snowflake-arctic-embed-m-q0f32-MLC-b4"],
  {
    initProgressCallback: (progress) => {
      console.log(progress.text);
    },
  },
);
// multiEngine can now serve both chat and embedding requests

// Alternative: manual construction and reload
const engine2 = new MLCEngine({
  initProgressCallback: (progress) => console.log(progress.text),
});
await engine2.reload("Qwen2.5-1.5B-Instruct-q4f16_1-MLC");
