
Implementation:Mlc ai Web llm Embedding Model Config

From Leeroopedia

Overview

Embedding_Model_Config documents the concrete data structures and prebuilt configuration entries in web-llm that define embedding models. This covers the ModelRecord interface, the ModelType enum, the AppConfig interface, and the specific embedding model entries registered in prebuiltAppConfig.

Code Reference

ModelType Enum

Defined in src/config.ts at lines 232-236:

export enum ModelType {
  "LLM",
  "embedding",
  "VLM", // vision-language model
}

The numeric values are: ModelType.LLM = 0, ModelType.embedding = 1, ModelType.VLM = 2. When model_type is omitted from a ModelRecord, the engine defaults to ModelType.LLM.
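The enum values and the defaulting behavior can be sketched in isolation. The snippet below uses a local mirror of the enum from src/config.ts; resolveModelType is a hypothetical helper written here to illustrate the default, not part of the web-llm API:

```typescript
// Local mirror of the ModelType enum from src/config.ts
enum ModelType {
  "LLM",       // 0
  "embedding", // 1
  "VLM",       // 2 (vision-language model)
}

// Hypothetical helper: a record with model_type omitted is treated as an LLM
function resolveModelType(modelType?: ModelType): ModelType {
  return modelType ?? ModelType.LLM;
}
```

Note that `??` (rather than `||`) is needed here: `ModelType.LLM` is `0`, a falsy value, so an explicit `LLM` would otherwise be indistinguishable from an omitted field.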

ModelRecord Interface

Defined in src/config.ts at lines 255-265:

export interface ModelRecord {
  model: string;          // HuggingFace URL for model weights
  model_id: string;       // Unique identifier used for loading and API calls
  model_lib: string;      // URL to the compiled WebGPU WASM library
  overrides?: ChatOptions;       // Optional config overrides
  vram_required_MB?: number;     // VRAM requirement in megabytes
  low_resource_required?: boolean;  // Whether it runs on limited devices
  buffer_size_required_bytes?: number;  // Required maxStorageBufferBindingSize
  required_features?: Array<string>;    // GPU features needed (e.g. "shader-f16")
  model_type?: ModelType;        // Model category: LLM, embedding, or VLM
}

AppConfig Interface

Defined in src/config.ts at lines 278-281:

export interface AppConfig {
  model_list: Array<ModelRecord>;
  useIndexedDBCache?: boolean;
}

Prebuilt Embedding Model Entries

Defined in src/config.ts at lines 2241-2282, the prebuilt embedding models are:

  • snowflake-arctic-embed-m-q0f32-MLC-b32: base model snowflake-arctic-embed-m, batch size 32, 1407.51 MB VRAM, context window 512, WASM library snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch32-webgpu.wasm
  • snowflake-arctic-embed-m-q0f32-MLC-b4: base model snowflake-arctic-embed-m, batch size 4, 539.40 MB VRAM, context window 512, WASM library snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch4-webgpu.wasm
  • snowflake-arctic-embed-s-q0f32-MLC-b32: base model snowflake-arctic-embed-s, batch size 32, 1022.82 MB VRAM, context window 512, WASM library snowflake-arctic-embed-s-q0f32-ctx512_cs512_batch32-webgpu.wasm
  • snowflake-arctic-embed-s-q0f32-MLC-b4: base model snowflake-arctic-embed-s, batch size 4, 238.71 MB VRAM, context window 512, WASM library snowflake-arctic-embed-s-q0f32-ctx512_cs512_batch4-webgpu.wasm

Naming convention for model_id:

  • snowflake-arctic-embed -- model family
  • -m or -s -- model size (medium or small)
  • -q0f32 -- quantization (q0 = unquantized, f32 = float32 weights)
  • -MLC -- compiled with the MLC framework
  • -b4 or -b32 -- max batch size
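The convention can be decomposed mechanically. The parser below is an illustrative sketch; parseEmbeddingModelId is a hypothetical helper written for this page, not part of web-llm:

```typescript
interface ParsedModelId {
  family: string;       // e.g. "snowflake-arctic-embed"
  size: string;         // "medium" or "small"
  quantization: string; // e.g. "q0f32"
  batchSize: number;    // e.g. 4 or 32
}

// Hypothetical helper: split an embedding model_id into its naming components.
// Returns null if the id does not follow the convention described above.
function parseEmbeddingModelId(modelId: string): ParsedModelId | null {
  const m = modelId.match(/^(.+)-([ms])-(q\d+f\d+)-MLC-b(\d+)$/);
  if (!m) return null;
  return {
    family: m[1],
    size: m[2] === "m" ? "medium" : "small",
    quantization: m[3],
    batchSize: Number(m[4]),
  };
}
```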

Example entry from source:

{
  model: "https://huggingface.co/mlc-ai/snowflake-arctic-embed-m-q0f32-MLC",
  model_id: "snowflake-arctic-embed-m-q0f32-MLC-b4",
  model_lib:
    modelLibURLPrefix +
    modelVersion +
    "/snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch4-webgpu.wasm",
  vram_required_MB: 539.4,
  model_type: ModelType.embedding,
},

Note that embedding models do not specify overrides, low_resource_required, or required_features. They also do not need shader-f16 since they use f32 precision.
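Because the four prebuilt variants differ mainly in batch size and VRAM cost, a client can pick the largest variant that fits a given VRAM budget. A minimal sketch, with the VRAM figures copied from the table above; pickEmbeddingModel is a hypothetical helper, not part of web-llm:

```typescript
interface EmbeddingVariant {
  model_id: string;
  vram_required_MB: number;
}

// VRAM figures copied from the prebuilt embedding entries above
const embeddingVariants: EmbeddingVariant[] = [
  { model_id: "snowflake-arctic-embed-m-q0f32-MLC-b32", vram_required_MB: 1407.51 },
  { model_id: "snowflake-arctic-embed-m-q0f32-MLC-b4", vram_required_MB: 539.4 },
  { model_id: "snowflake-arctic-embed-s-q0f32-MLC-b32", vram_required_MB: 1022.82 },
  { model_id: "snowflake-arctic-embed-s-q0f32-MLC-b4", vram_required_MB: 238.71 },
];

// Hypothetical helper: return the most capable variant (highest VRAM cost)
// that still fits within the budget, or null if none fits.
function pickEmbeddingModel(budgetMB: number): string | null {
  const fits = embeddingVariants
    .filter((v) => v.vram_required_MB <= budgetMB)
    .sort((a, b) => b.vram_required_MB - a.vram_required_MB);
  return fits.length > 0 ? fits[0].model_id : null;
}
```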

I/O Contract

Import:

import {
  prebuiltAppConfig,
  ModelType,
  ModelRecord,
  AppConfig,
} from "@mlc-ai/web-llm";

Filtering for embedding models:

// Returns: Array<ModelRecord> where each entry has model_type === ModelType.embedding
const embeddingModels: ModelRecord[] = prebuiltAppConfig.model_list.filter(
  (record) => record.model_type === ModelType.embedding,
);

Usage Examples

import {
  CreateMLCEngine,
  prebuiltAppConfig,
  ModelType,
  ModelRecord,
} from "@mlc-ai/web-llm";

// List all embedding models with their properties
const embeddingModels: ModelRecord[] = prebuiltAppConfig.model_list.filter(
  (m) => m.model_type === ModelType.embedding,
);
for (const m of embeddingModels) {
  console.log(`Model: ${m.model_id}`);
  console.log(`  VRAM: ${m.vram_required_MB} MB`);
  console.log(`  Weights: ${m.model}`);
  console.log(`  WASM: ${m.model_lib}`);
}

// Load the small batch-4 model (lowest memory footprint)
// Load the small batch-4 model (lowest memory footprint)
const engine = await CreateMLCEngine("snowflake-arctic-embed-s-q0f32-MLC-b4");

Registering a custom embedding model via a custom AppConfig:

import { CreateMLCEngine, AppConfig, ModelType } from "@mlc-ai/web-llm";

const customAppConfig: AppConfig = {
  model_list: [
    {
      model: "https://huggingface.co/mlc-ai/snowflake-arctic-embed-m-q0f32-MLC",
      model_id: "my-custom-embed-model",
      model_lib:
        "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/" +
        "web-llm-models/v0_2_80/" +
        "snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch4-webgpu.wasm",
      vram_required_MB: 539.4,
      model_type: ModelType.embedding,
    },
  ],
};

const customEngine = await CreateMLCEngine("my-custom-embed-model", {
  appConfig: customAppConfig,
});
