
Implementation:Mlc ai Web llm Embedding Model Config

From Leeroopedia

Overview

Embedding_Model_Config documents the concrete data structures and prebuilt configuration entries in web-llm that define embedding models. This covers the ModelRecord interface, the ModelType enum, the AppConfig interface, and the specific embedding model entries registered in prebuiltAppConfig.

Code Reference

ModelType Enum

Defined in src/config.ts at lines 232-236:

export enum ModelType {
  "LLM",
  "embedding",
  "VLM", // vision-language model
}

The numeric values are: ModelType.LLM = 0, ModelType.embedding = 1, ModelType.VLM = 2. When model_type is omitted from a ModelRecord, the engine defaults to ModelType.LLM.
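The enum values and the defaulting behavior can be sketched in isolation. The snippet below uses a local mirror of the enum from src/config.ts; resolveModelType is a hypothetical helper written here to illustrate the default, not part of the web-llm API:

```typescript
// Local mirror of the ModelType enum from src/config.ts
enum ModelType {
  "LLM",       // 0
  "embedding", // 1
  "VLM",       // 2 (vision-language model)
}

// Hypothetical helper: a record with model_type omitted is treated as an LLM
function resolveModelType(modelType?: ModelType): ModelType {
  return modelType ?? ModelType.LLM;
}
```

Note that `??` (rather than `||`) is needed here: `ModelType.LLM` is `0`, a falsy value, so an explicit `LLM` would otherwise be indistinguishable from an omitted field.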

ModelRecord Interface

Defined in src/config.ts at lines 255-265:

export interface ModelRecord {
  model: string;          // HuggingFace URL for model weights
  model_id: string;       // Unique identifier used for loading and API calls
  model_lib: string;      // URL to the compiled WebGPU WASM library
  overrides?: ChatOptions;       // Optional config overrides
  vram_required_MB?: number;     // VRAM requirement in megabytes
  low_resource_required?: boolean;  // Whether it runs on limited devices
  buffer_size_required_bytes?: number;  // Required maxStorageBufferBindingSize
  required_features?: Array<string>;    // GPU features needed (e.g. "shader-f16")
  model_type?: ModelType;        // Model category: LLM, embedding, or VLM
}

AppConfig Interface

Defined in src/config.ts at lines 278-281:

export interface AppConfig {
  model_list: Array<ModelRecord>;
  useIndexedDBCache?: boolean;
}

Prebuilt Embedding Model Entries

Defined in src/config.ts at lines 2241-2282, the prebuilt embedding models are:

  • snowflake-arctic-embed-m-q0f32-MLC-b32: base model snowflake-arctic-embed-m, batch size 32, 1407.51 MB VRAM, context window 512, WASM library snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch32-webgpu.wasm
  • snowflake-arctic-embed-m-q0f32-MLC-b4: base model snowflake-arctic-embed-m, batch size 4, 539.40 MB VRAM, context window 512, WASM library snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch4-webgpu.wasm
  • snowflake-arctic-embed-s-q0f32-MLC-b32: base model snowflake-arctic-embed-s, batch size 32, 1022.82 MB VRAM, context window 512, WASM library snowflake-arctic-embed-s-q0f32-ctx512_cs512_batch32-webgpu.wasm
  • snowflake-arctic-embed-s-q0f32-MLC-b4: base model snowflake-arctic-embed-s, batch size 4, 238.71 MB VRAM, context window 512, WASM library snowflake-arctic-embed-s-q0f32-ctx512_cs512_batch4-webgpu.wasm

Naming convention for model_id:

  • snowflake-arctic-embed -- model family
  • -m or -s -- model size (medium or small)
  • -q0f32 -- quantization (q0 = unquantized, f32 = float32 weights)
  • -MLC -- compiled with the MLC framework
  • -b4 or -b32 -- max batch size
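The convention can be decomposed mechanically. The parser below is an illustrative sketch; parseEmbeddingModelId is a hypothetical helper written for this page, not part of web-llm:

```typescript
interface ParsedModelId {
  family: string;       // e.g. "snowflake-arctic-embed"
  size: string;         // "medium" or "small"
  quantization: string; // e.g. "q0f32"
  batchSize: number;    // e.g. 4 or 32
}

// Hypothetical helper: split an embedding model_id into its naming components.
// Returns null if the id does not follow the convention described above.
function parseEmbeddingModelId(modelId: string): ParsedModelId | null {
  const m = modelId.match(/^(.+)-([ms])-(q\d+f\d+)-MLC-b(\d+)$/);
  if (!m) return null;
  return {
    family: m[1],
    size: m[2] === "m" ? "medium" : "small",
    quantization: m[3],
    batchSize: Number(m[4]),
  };
}
```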

Example entry from source:

{
  model: "https://huggingface.co/mlc-ai/snowflake-arctic-embed-m-q0f32-MLC",
  model_id: "snowflake-arctic-embed-m-q0f32-MLC-b4",
  model_lib:
    modelLibURLPrefix +
    modelVersion +
    "/snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch4-webgpu.wasm",
  vram_required_MB: 539.4,
  model_type: ModelType.embedding,
},

Note that embedding models do not specify overrides, low_resource_required, or required_features. They also do not need shader-f16 since they use f32 precision.
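Because the four prebuilt variants differ mainly in batch size and VRAM cost, a client can pick the largest variant that fits a given VRAM budget. A minimal sketch, with the VRAM figures copied from the table above; pickEmbeddingModel is a hypothetical helper, not part of web-llm:

```typescript
interface EmbeddingVariant {
  model_id: string;
  vram_required_MB: number;
}

// VRAM figures copied from the prebuilt embedding entries above
const embeddingVariants: EmbeddingVariant[] = [
  { model_id: "snowflake-arctic-embed-m-q0f32-MLC-b32", vram_required_MB: 1407.51 },
  { model_id: "snowflake-arctic-embed-m-q0f32-MLC-b4", vram_required_MB: 539.4 },
  { model_id: "snowflake-arctic-embed-s-q0f32-MLC-b32", vram_required_MB: 1022.82 },
  { model_id: "snowflake-arctic-embed-s-q0f32-MLC-b4", vram_required_MB: 238.71 },
];

// Hypothetical helper: return the most capable variant (highest VRAM cost)
// that still fits within the budget, or null if none fits.
function pickEmbeddingModel(budgetMB: number): string | null {
  const fits = embeddingVariants
    .filter((v) => v.vram_required_MB <= budgetMB)
    .sort((a, b) => b.vram_required_MB - a.vram_required_MB);
  return fits.length > 0 ? fits[0].model_id : null;
}
```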

I/O Contract

Import:

import {
  prebuiltAppConfig,
  ModelType,
  ModelRecord,
  AppConfig,
} from "@mlc-ai/web-llm";

Filtering for embedding models:

// Returns: Array<ModelRecord> where each entry has model_type === ModelType.embedding
const embeddingModels: ModelRecord[] = prebuiltAppConfig.model_list.filter(
  (record) => record.model_type === ModelType.embedding,
);

Usage Examples

import {
  CreateMLCEngine,
  prebuiltAppConfig,
  ModelType,
  ModelRecord,
} from "@mlc-ai/web-llm";

// List all embedding models with their properties
const embeddingModels: ModelRecord[] = prebuiltAppConfig.model_list.filter(
  (m) => m.model_type === ModelType.embedding,
);
for (const m of embeddingModels) {
  console.log(`Model: ${m.model_id}`);
  console.log(`  VRAM: ${m.vram_required_MB} MB`);
  console.log(`  Weights: ${m.model}`);
  console.log(`  WASM: ${m.model_lib}`);
}

// Load the small batch-4 model (lowest memory footprint)
// Load the small batch-4 model (lowest memory footprint)
const engine = await CreateMLCEngine("snowflake-arctic-embed-s-q0f32-MLC-b4");

Registering a custom embedding model via a custom AppConfig:

import { CreateMLCEngine, AppConfig, ModelType } from "@mlc-ai/web-llm";

const customAppConfig: AppConfig = {
  model_list: [
    {
      model: "https://huggingface.co/mlc-ai/snowflake-arctic-embed-m-q0f32-MLC",
      model_id: "my-custom-embed-model",
      model_lib:
        "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/" +
        "web-llm-models/v0_2_80/" +
        "snowflake-arctic-embed-m-q0f32-ctx512_cs512_batch4-webgpu.wasm",
      vram_required_MB: 539.4,
      model_type: ModelType.embedding,
    },
  ],
};

const customEngine = await CreateMLCEngine("my-custom-embed-model", {
  appConfig: customAppConfig,
});
