
Heuristic: MLC-AI WebLLM Tokenizer JSON Preference

From Leeroopedia
Knowledge Sources
Domains LLMs, Debugging, Structured_Output
Last Updated 2026-02-14 22:00 GMT

Overview

Best practice: ensure model repositories include `tokenizer.json` rather than relying on the fallback `tokenizer.model`, which may omit token mappings critical for grammar-constrained decoding.

Description

WebLLM loads tokenizers in a priority order: `tokenizer.json` first, then `tokenizer.model` as fallback. The `tokenizer.json` format (HuggingFace Tokenizers library) preserves the complete token vocabulary including added tokens, special tokens, and all token-to-ID mappings. The `tokenizer.model` format (SentencePiece) may miss tokens from `added_tokens.json` and `tokenizer_config.json`, which can cause subtle issues in grammar-constrained decoding (JSON mode) where the GrammarMatcher needs the exact token table.
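This priority order can be sketched as a small standalone helper (the function name `selectTokenizerFile` is illustrative, not part of WebLLM's API):

```typescript
// Mirrors WebLLM's loading priority: prefer tokenizer.json,
// fall back to tokenizer.model, otherwise report no tokenizer.
// `selectTokenizerFile` is a hypothetical helper for illustration.
function selectTokenizerFile(
  tokenizerFiles: string[],
): "tokenizer.json" | "tokenizer.model" | null {
  if (tokenizerFiles.includes("tokenizer.json")) {
    return "tokenizer.json"; // complete token table, preferred
  }
  if (tokenizerFiles.includes("tokenizer.model")) {
    return "tokenizer.model"; // SentencePiece fallback, may miss added tokens
  }
  return null;
}
```

In WebLLM itself this decision is made against the `tokenizer_files` list in `mlc-chat-config.json`, as shown in the excerpt under Reasoning below.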

Usage

Use this heuristic when models produce incorrect JSON output in structured generation mode, when special tokens are not recognized, or when compiling custom models for WebLLM. Also relevant when the warning `"Using tokenizer.model since we cannot locate tokenizer.json"` appears in logs.

The Insight (Rule of Thumb)

  • Action: Ensure the model repository on HuggingFace includes a `tokenizer.json` file alongside or instead of `tokenizer.model`.
  • Value: Complete token table ensures grammar matcher operates correctly for JSON/structured output.
  • Trade-off: None when `tokenizer.json` is available. If only `tokenizer.model` exists, recompile the model with MLC to generate `tokenizer.json`.
  • Detection: Set `logLevel: "INFO"` to see which tokenizer format is loaded.
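A pre-flight check over a parsed `mlc-chat-config.json` can surface both problems before loading a model. This is a sketch; `auditChatConfig` and the config subset it inspects are assumptions for illustration, not a WebLLM API:

```typescript
// Hypothetical audit of a parsed mlc-chat-config.json.
// Returns human-readable warnings; an empty array means the config
// looks safe for grammar-constrained (JSON mode) decoding.
interface ChatConfigSubset {
  tokenizer_files?: string[];
  tokenizer_info?: { token_postproc_method: string };
  token_table_postproc_method?: string;
}

function auditChatConfig(config: ChatConfigSubset): string[] {
  const warnings: string[] = [];
  const files = config.tokenizer_files ?? [];
  if (!files.includes("tokenizer.json")) {
    warnings.push(
      "No tokenizer.json: WebLLM will fall back to tokenizer.model " +
        "and may miss added/special tokens.",
    );
  }
  if (config.tokenizer_info === undefined) {
    if (config.token_table_postproc_method !== undefined) {
      warnings.push(
        "token_table_postproc_method is deprecated; prefer tokenizer_info.",
      );
    } else {
      warnings.push(
        "No tokenizer_info: token_postproc_method defaults to 'raw'.",
      );
    }
  }
  return warnings;
}
```

Running such a check against a candidate model repo before deployment catches both the missing-`tokenizer.json` case and the missing-`tokenizer_info` case, which otherwise only show up as log warnings at runtime.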

Reasoning

The GrammarMatcher in xgrammar needs the full token vocabulary to correctly determine which tokens are valid at each decoding step. If the token table is incomplete (missing special tokens or added tokens), the grammar matcher may allow tokens it should block or block tokens it should allow, leading to malformed JSON output or unexpected generation behavior. The `tokenizer.json` format is the canonical source of truth from the HuggingFace ecosystem.

Additionally, the `token_postproc_method` configuration affects grammar accuracy. WebLLM resolves it through a fallback chain: (1) the `tokenizer_info` field (preferred), (2) the deprecated `token_table_postproc_method` field, (3) a default of `raw` with a logged warning. When neither field is present, the `raw` default can degrade JSON-mode accuracy.

Tokenizer loading fallback from `src/cache_util.ts:132-148`:

if (config.tokenizer_files.includes("tokenizer.json")) {
  const url = new URL("tokenizer.json", baseUrl).href;
  const model = await modelCache.fetchWithCache(url, "arraybuffer");
  return Tokenizer.fromJSON(model);
} else if (config.tokenizer_files.includes("tokenizer.model")) {
  logger(
    "Using `tokenizer.model` since we cannot locate `tokenizer.json`.\n" +
      "It is recommended to use `tokenizer.json` to ensure all token " +
      "mappings are included, since currently, files like " +
      "`added_tokens.json`, `tokenizer_config.json` are ignored.\n" +
      "Consider converting `tokenizer.model` to `tokenizer.json` by " +
      "compiling the model with MLC again.",
  );
  const url = new URL("tokenizer.model", baseUrl).href;
  const model = await modelCache.fetchWithCache(url, "arraybuffer");
  return Tokenizer.fromSentencePiece(model);
}

Token postproc method fallback from `src/llm_chat.ts:173-192`:

if (config.tokenizer_info !== undefined) {
  this.token_postproc_method = config.tokenizer_info.token_postproc_method;
  this.prepend_space_in_encode =
    config.tokenizer_info.prepend_space_in_encode;
} else if (config.token_table_postproc_method !== undefined) {
  this.token_postproc_method = config.token_table_postproc_method;
  this.prepend_space_in_encode = false;
} else {
  log.warn(
    "Cannot find `tokenizer_info` or `token_table_postproc_method` " +
      "in `mlc-chat-config.json`, using default token_postproc_method `raw`.\n" +
      "This field is only used for json mode.",
  );
  this.token_postproc_method = "raw";
  this.prepend_space_in_encode = false;
}
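Restated as a standalone pure function (names and shapes here are assumptions for illustration, not WebLLM's internals), the fallback chain resolves like this:

```typescript
// Hypothetical pure restatement of the fallback chain above,
// useful for checking a config object offline.
interface TokenizerSettings {
  tokenPostprocMethod: string;
  prependSpaceInEncode: boolean;
  usedRawDefault: boolean; // true when neither config field was present
}

function resolveTokenPostproc(config: {
  tokenizer_info?: {
    token_postproc_method: string;
    prepend_space_in_encode: boolean;
  };
  token_table_postproc_method?: string;
}): TokenizerSettings {
  if (config.tokenizer_info !== undefined) {
    // Preferred path: tokenizer_info carries both settings.
    return {
      tokenPostprocMethod: config.tokenizer_info.token_postproc_method,
      prependSpaceInEncode: config.tokenizer_info.prepend_space_in_encode,
      usedRawDefault: false,
    };
  }
  if (config.token_table_postproc_method !== undefined) {
    // Deprecated path: method only, prepend_space defaults to false.
    return {
      tokenPostprocMethod: config.token_table_postproc_method,
      prependSpaceInEncode: false,
      usedRawDefault: false,
    };
  }
  // Neither field present: default to "raw", which only matters for JSON mode.
  return {
    tokenPostprocMethod: "raw",
    prependSpaceInEncode: false,
    usedRawDefault: true,
  };
}
```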
