
Implementation:Mlc ai Web llm Get Message Model Routing

From Leeroopedia


Overview

This page documents the key functions that route requests to the correct model when multiple models are loaded in a single engine. It is a Pattern Doc covering getModelIdToUse() (the core routing function) together with getMessage() and runtimeStatsText(), which are representative of the model-routed helper methods that behave identically across direct-engine and Web Worker proxy deployments.

Description

The multi-model routing implementation spans two source files:

1. getModelIdToUse() in src/support.ts (L225-254): The core routing function used by all inference APIs. It resolves which loaded model should handle a given request based on the loaded model list and the optional model field in the request.

2. Model-routed proxy methods in src/web_worker.ts (L565-594): The WebWorkerMLCEngine proxy methods getMessage() and runtimeStatsText() that forward model-specific queries to the worker. These are representative of the pattern: they package the optional modelId into a WorkerRequest and delegate resolution to the engine.

The routing function is also used in the proxy's chatCompletion(), completion(), and embedding() methods to determine selectedModelId before sending requests to the worker. This ensures the proxy and worker agree on which model handles each request.
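The proxy-side pattern described above can be sketched as follows. This is an illustrative stand-in, not the actual web-llm source: the names forwardChatCompletion, loadedModelIds, and postToWorker are assumptions introduced for demonstration, and the inline resolution logic mirrors the rules of getModelIdToUse() shown in the Code Reference section.

```typescript
// Hypothetical sketch of the proxy-side pattern: resolve the model id locally,
// then forward the resolved id to the worker so both sides agree on the target.
// All names here are illustrative, not the library's real identifiers.
function forwardChatCompletion(
  loadedModelIds: string[],
  requestModel: string | undefined,
  postToWorker: (msg: { kind: string; model: string }) => void,
): string {
  // Same resolution rule as getModelIdToUse(): error on empty or ambiguous input.
  if (loadedModelIds.length === 0) {
    throw new Error("No model loaded in engine");
  }
  const selectedModelId =
    requestModel ?? (loadedModelIds.length === 1 ? loadedModelIds[0] : undefined);
  if (selectedModelId === undefined || !loadedModelIds.includes(selectedModelId)) {
    throw new Error("Specify a loaded model when multiple models are loaded");
  }
  postToWorker({ kind: "chatCompletion", model: selectedModelId });
  return selectedModelId;
}
```

Resolving before posting means the worker never has to re-derive an ambiguous choice; it receives an id that is already validated against the loaded model list.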

Code Reference

Source 1: src/support.ts, Lines 225-254

/**
 * Return the model to use given the loaded modelIds and requestModel. Throws error when unclear
 * which model to load.
 * @param loadedModelIds Models currently loaded in the engine.
 * @param requestModel Model the user specified to load via the request. Required when multiple
 *   models are loaded
 * @param requestName The type of request or API to load the model for. Needed for error throwing.
 */
export function getModelIdToUse(
  loadedModelIds: string[],
  requestModel: string | undefined | null,
  requestName: string,
): string {
  let selectedModelId: string;
  if (loadedModelIds.length === 0) {
    throw new ModelNotLoadedError(requestName);
  }
  if (requestModel) {
    // If specified model
    if (loadedModelIds.indexOf(requestModel) === -1) {
      throw new SpecifiedModelNotFoundError(
        loadedModelIds,
        requestModel,
        requestName,
      );
    } else {
      selectedModelId = requestModel;
    }
  } else {
    // If not specified
    if (loadedModelIds.length > 1) {
      throw new UnclearModelToUseError(loadedModelIds, requestName);
    } else {
      selectedModelId = loadedModelIds[0];
    }
  }
  return selectedModelId;
}
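Under these rules the routing decision reduces to a small matrix over (number of loaded models, whether a model was requested). The standalone sketch below makes that matrix checkable; it is a simplified re-implementation for demonstration that throws plain Error objects rather than the library's typed error classes (ModelNotLoadedError, SpecifiedModelNotFoundError, UnclearModelToUseError).

```typescript
// Simplified re-implementation of the routing rule for demonstration only.
// The real getModelIdToUse() throws typed errors; this sketch uses plain Error.
function resolveModelId(
  loadedModelIds: string[],
  requestModel: string | undefined | null,
  requestName: string,
): string {
  if (loadedModelIds.length === 0) {
    // No model loaded at all: nothing can serve the request.
    throw new Error(`No model loaded; cannot serve ${requestName}`);
  }
  if (requestModel) {
    // A model was requested: it must be among the loaded ones.
    if (!loadedModelIds.includes(requestModel)) {
      throw new Error(
        `Model ${requestModel} not loaded for ${requestName}; loaded: ${loadedModelIds.join(",")}`,
      );
    }
    return requestModel;
  }
  // No model requested: only unambiguous if exactly one model is loaded.
  if (loadedModelIds.length > 1) {
    throw new Error(
      `Multiple models loaded; specify the model in ${requestName}: ${loadedModelIds.join(",")}`,
    );
  }
  return loadedModelIds[0];
}
```

The key design point is that omitting the model is only legal when the answer is unambiguous; every other case fails loudly instead of guessing.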

Source 2: src/web_worker.ts, Lines 565-594 (proxy-side routed methods)

// WebWorkerMLCEngine proxy methods that forward model-specific queries

async getMessage(modelId?: string): Promise<string> {
  const msg: WorkerRequest = {
    kind: "getMessage",
    uuid: crypto.randomUUID(),
    content: {
      modelId: modelId,
    },
  };
  return await this.getPromise<string>(msg);
}

async runtimeStatsText(modelId?: string): Promise<string> {
  const msg: WorkerRequest = {
    kind: "runtimeStatsText",
    uuid: crypto.randomUUID(),
    content: {
      modelId: modelId,
    },
  };
  return await this.getPromise<string>(msg);
}

Worker-side handling (for reference):

// In WebWorkerMLCEngineHandler.onmessage()

case "getMessage": {
  this.handleTask(msg.uuid, async () => {
    const params = msg.content as GetMessageParams;
    const res = await this.engine.getMessage(params.modelId);
    onComplete?.(res);
    return res;
  });
  return;
}

case "runtimeStatsText": {
  this.handleTask(msg.uuid, async () => {
    const params = msg.content as RuntimeStatsTextParams;
    const res = await this.engine.runtimeStatsText(params.modelId);
    onComplete?.(res);
    return res;
  });
  return;
}

I/O Contract

getModelIdToUse()

Inputs:

Parameter        Type                        Description
loadedModelIds   string[]                    IDs of all currently loaded models
requestModel     string | undefined | null   The model specified by the user in the request
requestName      string                      Name of the requesting API (for error messages)

Output: string -- The resolved model ID to use.

Error Conditions:

Error                         Condition
ModelNotLoadedError           loadedModelIds is empty
SpecifiedModelNotFoundError   requestModel is specified but not found among loaded models
UnclearModelToUseError        Multiple models are loaded but requestModel is not specified

getMessage() / runtimeStatsText()

Input: Optional modelId: string. Required when multiple models are loaded.

Output: Promise<string> -- The current message or runtime statistics text for the specified (or only) model.

Message Parameters:

// GetMessageParams
interface GetMessageParams {
  modelId?: string;
}

// RuntimeStatsTextParams
interface RuntimeStatsTextParams {
  modelId?: string;
}

Import

// getModelIdToUse is an internal utility, not directly imported by users.
// It is used internally by MLCEngine, WebWorkerMLCEngine, and ServiceWorkerMLCEngine.

// Users interact with model routing through the engine API:
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateWebWorkerMLCEngine(worker, ["model-a", "model-b"]);

// Specify model in requests
const msg = await engine.getMessage("model-a");
const stats = await engine.runtimeStatsText("model-b");

Usage Examples

Querying per-model state with multiple models loaded:

const engine = await CreateWebWorkerMLCEngine(
  worker,
  ["Llama-3.1-8B-Instruct-q4f16_1-MLC", "snowflake-arctic-embed-s-q0f32-MLC"]
);

// Chat with the LLM
await engine.chat.completions.create({
  messages: [{ role: "user", content: "What is 2+2?" }],
  model: "Llama-3.1-8B-Instruct-q4f16_1-MLC",
});

// Get the generated message for the LLM
const message = await engine.getMessage("Llama-3.1-8B-Instruct-q4f16_1-MLC");
console.log("LLM response:", message);

// Get runtime stats for the LLM
const stats = await engine.runtimeStatsText("Llama-3.1-8B-Instruct-q4f16_1-MLC");
console.log("LLM stats:", stats);

Single model (modelId optional):

const engine = await CreateWebWorkerMLCEngine(
  worker,
  "Llama-3.1-8B-Instruct-q4f16_1-MLC"
);

// No need to specify model when only one is loaded
const message = await engine.getMessage();
const stats = await engine.runtimeStatsText();

Resetting chat for a specific model:

const engine = await CreateWebWorkerMLCEngine(
  worker,
  ["model-a", "model-b"]
);

// Reset only model-a's chat state, keep statistics
await engine.resetChat(true, "model-a");

// Reset model-b's chat state and statistics
await engine.resetChat(false, "model-b");

Error handling when model is not specified with multiple models:

const engine = await CreateWebWorkerMLCEngine(
  worker,
  ["model-a", "model-b"]
);

try {
  // This will throw UnclearModelToUseError because multiple models are loaded
  const message = await engine.getMessage();
} catch (e) {
  console.error(e.message);
  // "Multiple models are loaded in engine. Please specify the model in getMessage.
  //  Currently loaded models are: model-a,model-b"
}

Related Pages

Principle:Mlc_ai_Web_llm_Multi_Model_Routing
