Principle:Mlc ai Web llm Function Calling Model Selection

From Leeroopedia

Overview

Function Calling Model Selection is the practice of choosing a language model that can reliably emit structured tool calls. web-llm validates model compatibility explicitly: function calling is permitted only for models on a curated allowlist.

Description

Not all language models reliably generate structured tool call outputs. Function calling requires models specifically trained or fine-tuned for this capability. In web-llm, two model families are validated for function calling:

  • Hermes-2-Pro -- NousResearch Hermes-2-Pro fine-tunes (Llama-3-8B and Mistral-7B variants). These use a system prompt format that wraps tool definitions in <tools></tools> XML tags, and they output JSON-formatted function calls.
  • Hermes-3 -- Hermes-3 fine-tunes of Llama-3.1, which also support the Hermes function calling prompt format.

These models use a specific system prompt format that:

  1. Identifies the model as a function calling AI model
  2. Provides tool definitions wrapped in <tools></tools> XML tags
  3. Specifies the JSON schema for function call output
  4. Instructs the model to return JSON objects with name and arguments fields
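
The four elements above can be sketched as a small prompt builder. This is a minimal illustration, not the prompt web-llm actually sends: the `ToolDefinition` type, the `buildHermesStyleSystemPrompt` helper, and the exact wording are assumptions. Only the `<tools></tools>` wrapping, the schema, and the `name`/`arguments` fields come from this page.

```typescript
// Hypothetical tool shape, mirroring the `tools` entries used in the examples on this page.
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters?: Record<string, unknown>;
  };
}

// Schema each function call is expected to follow (name + arguments).
const FUNCTION_CALL_SCHEMA = {
  properties: {
    arguments: { title: "Arguments", type: "object" },
    name: { title: "Name", type: "string" },
  },
  required: ["arguments", "name"],
  title: "FunctionCall",
  type: "object",
};

// Illustrative wording only; the real engine prompt may differ.
function buildHermesStyleSystemPrompt(tools: ToolDefinition[]): string {
  const toolJson = tools.map((tool) => JSON.stringify(tool)).join(" ");
  return [
    // 1. Identify the model as a function calling model.
    "You are a function calling AI model.",
    // 2. Provide the tool definitions inside <tools></tools> tags.
    `Here are the available tools: <tools> ${toolJson} </tools>`,
    // 3. Specify the JSON schema for each call.
    `Use this JSON schema for each tool call: ${JSON.stringify(FUNCTION_CALL_SCHEMA)}`,
    // 4. Ask for JSON objects with name and arguments fields.
    'Return each call as a JSON object with "name" and "arguments" fields.',
  ].join("\n");
}
```

With the engine's automatic tool handling this prompt is built for you; a helper like this is only relevant when formatting prompts manually.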

When a user provides tools in a chat completion request, the engine checks the current model against the functionCallingModelIds list. If the model is not in this list, an UnsupportedModelIdError is thrown.

Usage

Use this principle when building tool-use applications. Always select a model from the functionCallingModelIds list for reliable function calling.

Selection criteria:

  • Hermes-2-Pro-Llama-3-8B -- Best general-purpose choice; available in q4f16_1 (smaller) and q4f32_1 (higher quality) quantizations
  • Hermes-2-Pro-Mistral-7B -- Alternative base model; available in q4f16_1 quantization
  • Hermes-3-Llama-3.1-8B -- Newer Hermes generation on the Llama-3.1 base; available in q4f32_1 and q4f16_1 quantizations

Quantization trade-offs:

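  • q4f16_1 -- 4-bit weights with 16-bit float computation: smaller model size and faster loading, at slightly lower precision
  • q4f32_1 -- 4-bit weights with 32-bit float computation: larger model size, with higher precision for computation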
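
The availability listed above can be captured in a small lookup table. The sketch below is hypothetical: `resolveModelId` is not part of web-llm, and the `<family>-<quantization>-MLC` naming pattern is inferred from the example identifier "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC" used on this page.

```typescript
type Quantization = "q4f16_1" | "q4f32_1";
type ModelFamily =
  | "Hermes-2-Pro-Llama-3-8B"
  | "Hermes-2-Pro-Mistral-7B"
  | "Hermes-3-Llama-3.1-8B";

// Quantizations available per family, as listed in the selection criteria.
const AVAILABLE_QUANTIZATIONS: Record<ModelFamily, Quantization[]> = {
  "Hermes-2-Pro-Llama-3-8B": ["q4f16_1", "q4f32_1"],
  "Hermes-2-Pro-Mistral-7B": ["q4f16_1"],
  "Hermes-3-Llama-3.1-8B": ["q4f32_1", "q4f16_1"],
};

// Returns the model ID for the requested combination, or null if that
// quantization is not listed for the family.
function resolveModelId(family: ModelFamily, quantization: Quantization): string | null {
  if (!AVAILABLE_QUANTIZATIONS[family].includes(quantization)) {
    return null;
  }
  // Assumed naming pattern: <family>-<quantization>-MLC
  return `${family}-${quantization}-MLC`;
}
```

Returning null instead of silently falling back keeps the precision trade-off an explicit decision for the caller.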

Important: Other models (e.g., Llama-3.1-8B-Instruct) may support function calling through manual system prompt engineering (as shown in the manual function calling example), but they are not validated by the engine's automatic tool handling pipeline and require the user to manage prompt formatting and output parsing themselves.
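
For the manual route, output parsing might look like the sketch below. It assumes the model was prompted to reply with a single JSON object, possibly surrounded by extra text; `extractJsonObject` is a hypothetical helper, not a web-llm API.

```typescript
// Pulls the outermost {...} span out of a free-form reply and parses it.
// Returns null when no parseable JSON object is present.
function extractJsonObject(reply: string): Record<string, unknown> | null {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return null; // no candidate object in the text
  }
  try {
    const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return null;
    }
    return parsed as Record<string, unknown>;
  } catch {
    // Unconstrained models can emit malformed JSON; treat it as "no call".
    return null;
  }
}
```

Because nothing constrains the output here, callers should expect null regularly and decide whether to retry or fall back to a plain text answer.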

Theoretical Basis

Function calling model selection is grounded in the observation that structured output generation is a specialized capability. Standard language models are trained to produce free-form text, which may coincidentally resemble structured formats but lacks reliability guarantees.

Models trained for function calling undergo specific alignment:

  1. Format adherence -- The model learns to output valid JSON conforming to a schema rather than free text.
  2. Tool selection reasoning -- The model learns to map user intent to the most appropriate tool from a provided set.
  3. Argument extraction -- The model learns to extract relevant values from natural language and map them to typed function parameters.

The Hermes-2-Pro models follow the format documented at the NousResearch Hermes-2-Pro repository, where the system prompt uses a pydantic-style JSON schema to define the expected function call format:

{
  "properties": {
    "arguments": {"title": "Arguments", "type": "object"},
    "name": {"title": "Name", "type": "string"}
  },
  "required": ["arguments", "name"],
  "title": "FunctionCall",
  "type": "object"
}

The engine enforces this at the grammar level by setting response_format to json_object with the schema, ensuring the model cannot produce output that deviates from the expected structure.
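
On the application side, the same schema can be mirrored as a runtime type guard. This is a sketch under stated assumptions: `FunctionCall` and `isFunctionCall` are names chosen for illustration, and the check covers only the two required fields from the schema above.

```typescript
// Mirrors the FunctionCall schema: both fields are required.
interface FunctionCall {
  name: string;
  arguments: Record<string, unknown>;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// True only if `value` has a string `name` and an object `arguments`.
function isFunctionCall(value: unknown): value is FunctionCall {
  if (!isPlainObject(value)) {
    return false;
  }
  return typeof value["name"] === "string" && isPlainObject(value["arguments"]);
}
```

With grammar-level enforcement this guard should always pass; it is mainly useful as a defensive check, or when parsing output from a model that is not constrained by the engine.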

I/O Contract

Input:

  • A model identifier string (e.g., "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC") selected by the user when creating an engine.

Validation:

  • When request.tools is neither undefined nor null, the engine checks: functionCallingModelIds.includes(currentModelId).
  • If the check fails, UnsupportedModelIdError is thrown listing supported models.
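
The validation step can be approximated as follows. This is a standalone sketch, not the engine's source: the allowlist contents are composed from the model names and quantizations listed on this page, and `UnsupportedModelIdSketchError` and `checkFunctionCallingSupport` are hypothetical stand-ins for the library's own error type and internal check.

```typescript
// Assumed allowlist, built from the models and quantizations named on this page.
const functionCallingModelIdsSketch: readonly string[] = [
  "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC",
  "Hermes-2-Pro-Llama-3-8B-q4f32_1-MLC",
  "Hermes-2-Pro-Mistral-7B-q4f16_1-MLC",
  "Hermes-3-Llama-3.1-8B-q4f32_1-MLC",
  "Hermes-3-Llama-3.1-8B-q4f16_1-MLC",
];

// Hypothetical stand-in for the library's UnsupportedModelIdError.
class UnsupportedModelIdSketchError extends Error {
  constructor(currentModelId: string, supported: readonly string[]) {
    super(
      `${currentModelId} is not supported for function calling. ` +
        `Supported models: ${supported.join(", ")}`,
    );
    this.name = "UnsupportedModelIdSketchError";
  }
}

// Runs the check only when tools were actually provided.
function checkFunctionCallingSupport(
  currentModelId: string,
  tools?: unknown[] | null,
): void {
  if (tools === undefined || tools === null) {
    return; // plain chat request: any model is acceptable
  }
  if (!functionCallingModelIdsSketch.includes(currentModelId)) {
    throw new UnsupportedModelIdSketchError(currentModelId, functionCallingModelIdsSketch);
  }
}
```

Note that the check is tied to the request, not to engine creation: an unsupported model fails only once a request actually carries tools.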

Output:

  • A properly configured engine capable of processing tool definitions and producing structured tool call responses.
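
Reading the structured tool calls from a response might look like the sketch below. It assumes an OpenAI-style response shape in which each tool call carries its arguments as a JSON string; the `ChatCompletionSketch` and `ToolCallSketch` types and the `readToolCalls` helper are illustrative and model only the fields used here.

```typescript
// Assumed response shape; only the fields this sketch reads are declared.
interface ToolCallSketch {
  function: { name: string; arguments: string };
}

interface ChatCompletionSketch {
  choices: {
    message: { content: string | null; tool_calls?: ToolCallSketch[] };
  }[];
}

interface DecodedToolCall {
  name: string;
  args: unknown;
}

// Decodes the tool calls of the first choice; returns [] for a plain text reply.
function readToolCalls(reply: ChatCompletionSketch): DecodedToolCall[] {
  const calls = reply.choices[0]?.message.tool_calls ?? [];
  return calls.map((call) => ({
    name: call.function.name,
    // Arguments are assumed to arrive as a JSON-encoded string.
    args: JSON.parse(call.function.arguments) as unknown,
  }));
}
```

Keeping `args` typed as unknown forces the caller to validate the arguments against the tool's own parameter schema before invoking anything.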

Usage Examples

Selecting a function calling model:

import * as webllm from "@mlc-ai/web-llm";

// Select a model from the validated function calling list
const selectedModel = "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC";

const engine = await webllm.CreateMLCEngine(selectedModel, {
  initProgressCallback: (report) => {
    console.log("Loading:", report.text);
  },
});

Checking if a model supports function calling:

import { functionCallingModelIds } from "@mlc-ai/web-llm";

const modelId = "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC";

if (functionCallingModelIds.includes(modelId)) {
  console.log("Model supports function calling");
} else {
  console.log("Model does NOT support function calling");
}

Error when using unsupported model with tools:

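import * as webllm from "@mlc-ai/web-llm";

// Engine creation itself succeeds: the model loads normally,
// but it is not in functionCallingModelIds
const engine = await webllm.CreateMLCEngine(
  "Llama-3.1-8B-Instruct-q4f16_1-MLC",
);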

const request: webllm.ChatCompletionRequest = {
  messages: [{ role: "user", content: "What is the weather?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get weather data",
        parameters: { type: "object", properties: {} },
      },
    },
  ],
};

// Throws: UnsupportedModelIdError listing functionCallingModelIds
const reply = await engine.chat.completions.create(request);
