Implementation:Mlc_ai_Web_llm_Embedding_Model_Config
Overview
This page documents the concrete data structures and prebuilt configuration entries in web-llm that define embedding models: the ModelRecord interface, the ModelType enum, the AppConfig interface, and the embedding model entries registered in prebuiltAppConfig.
Code Reference
ModelType Enum
Defined in src/config.ts at lines 232-236:
export enum ModelType {
"LLM",
"embedding",
"VLM", // vision-language model
}
The numeric values are: ModelType.LLM = 0, ModelType.embedding = 1, ModelType.VLM = 2. When model_type is omitted from a ModelRecord, the engine defaults to ModelType.LLM.
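These values can be checked with a standalone sketch that mirrors the enum; the `effectiveType` helper is illustrative (not part of web-llm) and just models the documented default:

```typescript
// Sketch mirroring the ModelType enum from src/config.ts. TypeScript
// assigns numeric values in declaration order, so the members map to
// 0, 1, 2.
enum ModelType {
  "LLM",
  "embedding",
  "VLM", // vision-language model
}

// Illustrative helper modeling the documented default: a record that
// omits model_type is treated as an LLM.
function effectiveType(modelType?: ModelType): ModelType {
  return modelType ?? ModelType.LLM;
}

console.log(ModelType.embedding);      // 1
console.log(effectiveType(undefined)); // 0, i.e. ModelType.LLM
```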
ModelRecord Interface
Defined in src/config.ts at lines 255-265:
export interface ModelRecord {
model: string; // HuggingFace URL for model weights
model_id: string; // Unique identifier used for loading and API calls
model_lib: string; // URL to the compiled WebGPU WASM library
overrides?: ChatOptions; // Optional config overrides
vram_required_MB?: number; // VRAM requirement in megabytes
low_resource_required?: boolean; // Whether it runs on limited devices
buffer_size_required_bytes?: number; // Required maxStorageBufferBindingSize
required_features?: Array<string>; // GPU features needed (e.g. "shader-f16")
model_type?: ModelType; // Model category: LLM, embedding, or VLM
}
AppConfig Interface
Defined in src/config.ts at lines 278-281:
export interface AppConfig {
model_list: Array<ModelRecord>;
useIndexedDBCache?: boolean;
}
Prebuilt Embedding Model Entries
Defined in src/config.ts at lines 2241-2282, the prebuilt embedding models are:
| model_id | Base Model | Batch Size | VRAM (MB) | Context Window | WASM Library |
|---|---|---|---|---|---|
| snowflake-arctic-embed-m-q0f32-MLC-b32 | snowflake-arctic-embed-m | 32 | 1407.51 | 512 | snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch32-webgpu.wasm |
| snowflake-arctic-embed-m-q0f32-MLC-b4 | snowflake-arctic-embed-m | 4 | 539.40 | 512 | snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch4-webgpu.wasm |
| snowflake-arctic-embed-s-q0f32-MLC-b32 | snowflake-arctic-embed-s | 32 | 1022.82 | 512 | snowflake-arctic-embed-s-q0f32-ctx512_cs512_batch32-webgpu.wasm |
| snowflake-arctic-embed-s-q0f32-MLC-b4 | snowflake-arctic-embed-s | 4 | 238.71 | 512 | snowflake-arctic-embed-s-q0f32-ctx512_cs512_batch4-webgpu.wasm |
Naming convention for model_id (e.g. snowflake-arctic-embed-m-q0f32-MLC-b4):
- snowflake-arctic-embed -- model family
- -m or -s -- model size (medium or small)
- -q0f32 -- quantization (q0 = unquantized, f32 = float32 weights)
- -MLC -- compiled with the MLC framework
- -b4 or -b32 -- max batch size
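The convention can be illustrated with a small parser. The interface and field names below are my own labels for the id components, not part of web-llm's API:

```typescript
// Illustrative decomposition of the model_id naming convention.
interface ParsedEmbedModelId {
  family: string;       // e.g. "snowflake-arctic-embed"
  size: string;         // "m" or "s"
  quantization: string; // e.g. "q0f32"
  framework: string;    // "MLC"
  batchSize: number;    // 4 or 32
}

function parseEmbedModelId(modelId: string): ParsedEmbedModelId | null {
  const match = modelId.match(/^(.*)-([ms])-(q\d+f\d+)-(MLC)-b(\d+)$/);
  if (match === null) return null;
  return {
    family: match[1],
    size: match[2],
    quantization: match[3],
    framework: match[4],
    batchSize: Number(match[5]),
  };
}

const parsed = parseEmbedModelId("snowflake-arctic-embed-m-q0f32-MLC-b4");
console.log(parsed?.family);    // "snowflake-arctic-embed"
console.log(parsed?.batchSize); // 4
```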
Example entry from source:
{
model: "https://huggingface.co/mlc-ai/snowflake-arctic-embed-m-q0f32-MLC",
model_id: "snowflake-arctic-embed-m-q0f32-MLC-b4",
model_lib:
modelLibURLPrefix +
modelVersion +
"/snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch4-webgpu.wasm",
vram_required_MB: 539.4,
model_type: ModelType.embedding,
},
Note that embedding models do not specify overrides, low_resource_required, or required_features. They also do not need shader-f16 since they use f32 precision.
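As a sketch of how vram_required_MB might drive model selection, the helper below picks the lowest-VRAM entry. In real code the array would come from filtering prebuiltAppConfig.model_list; two rows of the table above are inlined here to keep the example self-contained:

```typescript
// Minimal stand-in for the relevant ModelRecord fields.
interface ModelRecordLike {
  model_id: string;
  vram_required_MB?: number;
}

// Two entries mirrored from the prebuilt embedding model table.
const embeddingEntries: ModelRecordLike[] = [
  { model_id: "snowflake-arctic-embed-m-q0f32-MLC-b4", vram_required_MB: 539.4 },
  { model_id: "snowflake-arctic-embed-s-q0f32-MLC-b4", vram_required_MB: 238.71 },
];

// Pick the entry with the smallest VRAM requirement; entries without a
// stated requirement are ranked last.
function smallestVram(records: ModelRecordLike[]): ModelRecordLike {
  return records.reduce((best, r) =>
    (r.vram_required_MB ?? Infinity) < (best.vram_required_MB ?? Infinity)
      ? r
      : best,
  );
}

console.log(smallestVram(embeddingEntries).model_id);
// "snowflake-arctic-embed-s-q0f32-MLC-b4"
```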
I/O Contract
Import:
import {
prebuiltAppConfig,
ModelType,
ModelRecord,
AppConfig,
} from "@mlc-ai/web-llm";
Filtering for embedding models:
// Returns: Array<ModelRecord> where each entry has model_type === ModelType.embedding
const embeddingModels: ModelRecord[] = prebuiltAppConfig.model_list.filter(
(record) => record.model_type === ModelType.embedding,
);
Usage Examples
import {
CreateMLCEngine,
prebuiltAppConfig,
ModelType,
ModelRecord,
} from "@mlc-ai/web-llm";
// List all embedding models with their properties
const embeddingModels: ModelRecord[] = prebuiltAppConfig.model_list.filter(
(m) => m.model_type === ModelType.embedding,
);
for (const m of embeddingModels) {
console.log(`Model: ${m.model_id}`);
console.log(` VRAM: ${m.vram_required_MB} MB`);
console.log(` Weights: ${m.model}`);
console.log(` WASM: ${m.model_lib}`);
}
// Load the small batch-4 model (lowest memory footprint)
const engine = await CreateMLCEngine("snowflake-arctic-embed-s-q0f32-MLC-b4");
import { CreateMLCEngine, AppConfig, ModelType } from "@mlc-ai/web-llm";
// Register a custom embedding model via a custom AppConfig
const customAppConfig: AppConfig = {
model_list: [
{
model: "https://huggingface.co/mlc-ai/snowflake-arctic-embed-m-q0f32-MLC",
model_id: "my-custom-embed-model",
model_lib:
"https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/" +
"web-llm-models/v0_2_80/" +
"snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch4-webgpu.wasm",
vram_required_MB: 539.4,
model_type: ModelType.embedding,
},
],
};
const engine = await CreateMLCEngine("my-custom-embed-model", {
appConfig: customAppConfig,
});
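Since the -b4/-b32 suffix caps how many inputs fit in one batch, a caller might chunk large input lists accordingly. This chunking helper is an assumption about client-side usage, not a documented web-llm requirement:

```typescript
// Illustrative helper (assumption, not web-llm API): split inputs into
// chunks no larger than the model's max batch size.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Five inputs against a batch-4 model yield two batches: 4 + 1.
console.log(chunk(["a", "b", "c", "d", "e"], 4).length); // 2
```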
Related Pages
- Principle:Mlc_ai_Web_llm_Embedding_Model_Selection -- guidance on choosing among the prebuilt embedding models
- Implementation:Mlc_ai_Web_llm_Embeddings_Create -- the API for generating embeddings once a model is loaded
- Implementation:Mlc_ai_Web_llm_Multi_Model_RAG_Engine -- loading embedding models alongside LLMs for RAG