Implementation: CreateMLCEngine (@mlc-ai/web-llm)
Overview
`CreateMLCEngine` is the async factory function provided by `@mlc-ai/web-llm` that creates an `MLCEngine` instance, calls `reload()` to download and initialize the specified model(s) into WebGPU memory, and returns the fully ready engine. The `MLCEngine` class implements `MLCEngineInterface` and provides an OpenAI-compatible API surface including `chat.completions`, `completions`, and `embeddings`.
Description
`CreateMLCEngine` is a thin factory wrapper that:
- Constructs a new `MLCEngine` instance with the provided engine configuration
- Calls `engine.reload(modelId, chatOpts)` to perform the full model loading pipeline
- Returns the initialized engine
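The construct-then-reload flow can be sketched in isolation. Everything below is an illustrative stand-in (`EngineLike`, `FakeEngine`, and `createEngine` are invented names), not the library's actual internals:

```typescript
// Illustrative sketch of the factory pattern: construct an engine,
// await reload(), then return the ready instance. FakeEngine and
// createEngine are invented stand-ins, not library code.
interface EngineLike {
  loaded: string[];
  reload(modelId: string | string[]): Promise<void>;
}

class FakeEngine implements EngineLike {
  loaded: string[] = [];
  async reload(modelId: string | string[]): Promise<void> {
    // reload() normalizes a single ID to an array (multi-model support)
    this.loaded = Array.isArray(modelId) ? [...modelId] : [modelId];
  }
}

async function createEngine(modelId: string | string[]): Promise<EngineLike> {
  const engine = new FakeEngine(); // 1. construct
  await engine.reload(modelId);    // 2. download + initialize model(s)
  return engine;                   // 3. hand back a ready engine
}
```

The value of the pattern is that callers never see a half-initialized engine: the returned promise only resolves once loading has finished.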
The `MLCEngine` constructor sets up:
- API proxy objects -- `engine.chat` (containing `completions`), `engine.completions`, and `engine.embeddings`
- State maps -- `loadedModelIdToPipeline`, `loadedModelIdToChatConfig`, `loadedModelIdToModelType`, and `loadedModelIdToLock`
- Configuration -- app config (defaults to `prebuiltAppConfig`), log level, progress callback, and logit processor registry
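A minimal sketch of that per-model bookkeeping, with placeholder value types (the real maps hold the library's internal pipeline and config objects; `EngineStateSketch` and `register` are invented names):

```typescript
// Placeholder sketch of the parallel maps the constructor creates,
// all keyed by loaded model ID. Value types are stand-ins for the
// library's internal pipeline/config types.
class EngineStateSketch {
  loadedModelIdToPipeline = new Map<string, unknown>();
  loadedModelIdToChatConfig = new Map<string, unknown>();
  loadedModelIdToModelType = new Map<string, string>();
  loadedModelIdToLock = new Map<string, Promise<void>>();

  // Registering a model touches every map under the same key, so the
  // maps stay in lockstep per model ID.
  register(modelId: string, pipeline: unknown, chatConfig: unknown, type: string): void {
    this.loadedModelIdToPipeline.set(modelId, pipeline);
    this.loadedModelIdToChatConfig.set(modelId, chatConfig);
    this.loadedModelIdToModelType.set(modelId, type);
    this.loadedModelIdToLock.set(modelId, Promise.resolve());
  }
}
```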
The `reload()` method performs the heavy lifting:
- Unloads all previously loaded models via `unload()`
- Converts single-model inputs to arrays (supports loading multiple models)
- Validates that all model IDs are unique
- Sequentially loads each model via `reloadInternal()`, which handles WASM download, TVM initialization, WebGPU setup, tokenizer loading, weight transfer, and pipeline creation
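The normalization and validation steps above can be sketched as follows; `normalizeAndValidate` is an invented name, and plain `Error`s stand in for the library's dedicated error classes:

```typescript
// Sketch of reload()'s up-front checks: normalize inputs to arrays,
// reject duplicate model IDs, and reject a chatOpts/modelId length
// mismatch. Plain Errors stand in for the library's error classes.
function normalizeAndValidate(
  modelId: string | string[],
  chatOpts?: object | object[],
): { modelIds: string[]; chatOptsList: (object | undefined)[] } {
  const modelIds = Array.isArray(modelId) ? modelId : [modelId];
  const chatOptsList =
    chatOpts === undefined
      ? modelIds.map(() => undefined)
      : Array.isArray(chatOpts)
        ? chatOpts
        : [chatOpts];
  if (new Set(modelIds).size !== modelIds.length) {
    throw new Error("ReloadModelIdNotUniqueError");
  }
  if (chatOptsList.length !== modelIds.length) {
    throw new Error("ReloadArgumentSizeUnmatchedError");
  }
  return { modelIds, chatOptsList };
}
```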
Code Reference
- Repository: https://github.com/mlc-ai/web-llm
- File: `src/engine.ts`
- Factory function: Lines 90-98
- `MLCEngine` constructor: Lines 141-157
- `reload()`: Lines 194-237
- `reloadInternal()`: Lines 239-410
Type Signature
```typescript
export async function CreateMLCEngine(
  modelId: string | string[],
  engineConfig?: MLCEngineConfig,
  chatOpts?: ChatOptions | ChatOptions[],
): Promise<MLCEngine>;

export interface MLCEngineConfig {
  appConfig?: AppConfig;
  initProgressCallback?: InitProgressCallback;
  logitProcessorRegistry?: Map<string, LogitProcessor>;
  logLevel?: LogLevel;
}
```
Import
```typescript
import { CreateMLCEngine, MLCEngine, MLCEngineConfig } from "@mlc-ai/web-llm";
```
I/O Contract
| Direction | Name | Type | Required | Description |
|---|---|---|---|---|
| Input | modelId | string \| string[] | Yes | Model ID(s) to load; must exist in prebuiltAppConfig or engineConfig.appConfig |
| Input | engineConfig | MLCEngineConfig | No | Optional config for app settings, progress callback, logit processors, and log level |
| Input | chatOpts | ChatOptions \| ChatOptions[] | No | Optional overrides for mlc-chat-config.json; array size must match the modelId array |
| Output | engine | Promise\<MLCEngine\> | -- | Fully initialized engine ready for inference calls |
Error conditions:
- Throws `WebGPUNotAvailableError` if the browser does not support WebGPU
- Throws `ShaderF16SupportError` if the model requires `shader-f16` but the device lacks it
- Throws `DeviceLostError` if GPU memory is exhausted during loading
- Throws `MissingModelWasmError` if `model_lib` is undefined
- Throws `ReloadModelIdNotUniqueError` if duplicate model IDs are provided
- Throws `ReloadArgumentSizeUnmatchedError` if the `chatOpts` array length does not match the `modelId` array length
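One way to surface these conditions in a UI is to branch on the error's name. The helper below is hypothetical (`describeLoadError` and the message strings are invented); only the error names come from the list above:

```typescript
// Hypothetical helper mapping the error classes above (matched by
// error name) to user-facing messages. Message text is invented.
function describeLoadError(err: Error): string {
  switch (err.name) {
    case "WebGPUNotAvailableError":
      return "This browser does not support WebGPU.";
    case "ShaderF16SupportError":
      return "This model needs shader-f16, which your GPU lacks.";
    case "DeviceLostError":
      return "The GPU ran out of memory while loading the model.";
    case "MissingModelWasmError":
      return "The model record has no model_lib WASM defined.";
    case "ReloadModelIdNotUniqueError":
      return "The same model ID was passed more than once.";
    case "ReloadArgumentSizeUnmatchedError":
      return "chatOpts must have one entry per model ID.";
    default:
      return `Model load failed: ${err.message}`;
  }
}
```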
Usage Example
```typescript
import { CreateMLCEngine, MLCEngine } from "@mlc-ai/web-llm";

// Basic engine creation with progress reporting
const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (progress) => {
    console.log(`Loading: ${(progress.progress * 100).toFixed(1)}% - ${progress.text}`);
  },
});

// Engine creation with custom context window override
const smallEngine = await CreateMLCEngine(
  "Phi-3.5-mini-instruct-q4f16_1-MLC",
  {
    initProgressCallback: (progress) => {
      document.getElementById("status")!.textContent = progress.text;
    },
  },
  {
    context_window_size: 2048, // Override to use a smaller context window
  },
);

// Loading multiple models into a single engine
const multiEngine = await CreateMLCEngine(
  ["Llama-3.2-1B-Instruct-q4f16_1-MLC", "snowflake-arctic-embed-m-q0f32-MLC-b4"],
  {
    initProgressCallback: (progress) => {
      console.log(progress.text);
    },
  },
);
// multiEngine can now serve both chat and embedding requests

// Alternative: manual construction and reload
const engine2 = new MLCEngine({
  initProgressCallback: (progress) => console.log(progress.text),
});
await engine2.reload("Qwen2.5-1.5B-Instruct-q4f16_1-MLC");
```
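Once the promise resolves, the engine serves OpenAI-style requests. To show the call shape without a GPU, the snippet below stubs the `chat.completions` surface: the request/response shapes mirror the engine's OpenAI-compatible API, while the stub's echo behavior and the `ask` helper are invented for illustration.

```typescript
// Stub of the OpenAI-compatible chat surface a ready engine exposes.
// Shapes mirror the API; the echo implementation is invented.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }
interface ChatRequest { messages: ChatMessage[]; }
interface ChatResponse { choices: { message: ChatMessage }[]; }

const stubEngine = {
  chat: {
    completions: {
      async create(req: ChatRequest): Promise<ChatResponse> {
        // A real MLCEngine would run WebGPU inference here.
        const last = req.messages[req.messages.length - 1];
        return {
          choices: [{ message: { role: "assistant", content: `echo: ${last.content}` } }],
        };
      },
    },
  },
};

async function ask(prompt: string): Promise<string> {
  const reply = await stubEngine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0].message.content;
}
```

With a real engine, the same `engine.chat.completions.create({ messages })` call shape applies; only the backing inference differs.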