Principle:Mlc ai Web llm Function Calling Model Selection
Overview
Function Calling Model Selection is the practice of choosing a language model that reliably emits structured tool calls. web-llm validates model compatibility explicitly and allows function calling only with models from a curated allowlist.
Description
Not all language models reliably generate structured tool call outputs. Function calling requires models specifically trained or fine-tuned for this capability. In web-llm, two model families are validated for function calling:
- Hermes-2-Pro -- Models from the NousResearch Hermes-2-Pro family (Llama-3-8B and Mistral-7B variants). These use an XML-tagged system prompt format with <tools></tools> tags and output JSON-formatted function calls.
- Hermes-3 -- Models from the Hermes-3-Llama-3.1 family, which also support the Hermes function calling prompt format.
These models use a specific system prompt format that:
- Identifies the model as a function calling AI model
- Provides tool definitions wrapped in <tools></tools> XML tags
- Specifies the JSON schema for function call output
- Instructs the model to return JSON objects with name and arguments fields
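The four elements listed above can be sketched as a small prompt builder. This is a minimal illustration only: buildHermesStyleSystemPrompt and the ToolDefinition type are hypothetical names, and the wording is not the exact prompt that web-llm sends. The schema constant is the FunctionCall schema quoted under Theoretical Basis.

```typescript
// Hypothetical tool shape, mirroring the OpenAI-style `tools` entries used on this page.
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters?: Record<string, unknown>;
  };
}

// The FunctionCall schema quoted in the Theoretical Basis section.
const functionCallSchema = {
  properties: {
    arguments: { title: "Arguments", type: "object" },
    name: { title: "Name", type: "string" },
  },
  required: ["arguments", "name"],
  title: "FunctionCall",
  type: "object",
};

// Hypothetical helper: assembles the four parts of a Hermes-style system prompt.
// The sentences are illustrative, not the library's actual prompt text.
function buildHermesStyleSystemPrompt(tools: ToolDefinition[]): string {
  return [
    // 1. Identify the model as a function calling AI model.
    "You are a function calling AI model.",
    // 2. Provide tool definitions wrapped in <tools></tools> XML tags.
    `Available tools: <tools>${JSON.stringify(tools)}</tools>`,
    // 3. Specify the JSON schema for function call output.
    `Use this JSON schema for each call: ${JSON.stringify(functionCallSchema)}`,
    // 4. Instruct the model to return objects with name and arguments fields.
    'Return a JSON object with "name" and "arguments" fields for each call.',
  ].join("\n");
}

const examplePrompt = buildHermesStyleSystemPrompt([
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get weather data",
      parameters: { type: "object", properties: {} },
    },
  },
]);
```

With the engine's automatic tool handling this prompt is produced for you; a builder like this matters only when experimenting with the format by hand.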
When a user provides tools in a chat completion request, the engine checks the current model against the functionCallingModelIds list. If the model is not in this list, an UnsupportedModelIdError is thrown.
Usage
Use this principle when building tool-use applications. Always select a model from the functionCallingModelIds list for reliable function calling.
Selection criteria:
- Hermes-2-Pro-Llama-3-8B -- Best general-purpose choice; available in q4f16_1 (smaller) and q4f32_1 (higher quality) quantizations
- Hermes-2-Pro-Mistral-7B -- Alternative base model; available in q4f16_1 quantization
- Hermes-3-Llama-3.1-8B -- Newer model family; available in q4f32_1 and q4f16_1 quantizations
Quantization trade-offs:
- q4f16_1 -- Smaller model size, faster loading, slightly lower precision
- q4f32_1 -- Larger model size, higher precision for computation
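The selection criteria and quantization trade-offs can be combined into a small picker. This is a sketch under stated assumptions: pickModelId is a hypothetical helper, and the ID list is composed from the families and quantizations named above using the "<family>-<quantization>-MLC" pattern, so it should be checked against the real functionCallingModelIds export before use.

```typescript
type Quantization = "q4f16_1" | "q4f32_1";

// Assumed IDs, built from the families and quantizations listed above.
// In an application, pass the functionCallingModelIds export instead.
const assumedAllowlist: string[] = [
  "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC",
  "Hermes-2-Pro-Llama-3-8B-q4f32_1-MLC",
  "Hermes-2-Pro-Mistral-7B-q4f16_1-MLC",
  "Hermes-3-Llama-3.1-8B-q4f32_1-MLC",
  "Hermes-3-Llama-3.1-8B-q4f16_1-MLC",
];

// Hypothetical helper: prefer the requested quantization, otherwise fall back
// to any allowlisted build of the same family; undefined if none exists.
function pickModelId(
  allowlist: string[],
  family: string,
  preferred: Quantization,
): string | undefined {
  const candidates = allowlist.filter((id) => id.startsWith(`${family}-`));
  const exact = candidates.find((id) => id === `${family}-${preferred}-MLC`);
  return exact ?? candidates[0];
}

// Requested quantization exists for this family.
const llamaChoice = pickModelId(assumedAllowlist, "Hermes-2-Pro-Llama-3-8B", "q4f32_1");
// Only q4f16_1 is listed for the Mistral variant, so the picker falls back to it.
const mistralChoice = pickModelId(assumedAllowlist, "Hermes-2-Pro-Mistral-7B", "q4f32_1");
// Not a validated function calling family, so nothing is returned.
const unsupportedChoice = pickModelId(assumedAllowlist, "Llama-3.1-8B-Instruct", "q4f16_1");
```

Falling back within the same family keeps the prompt format unchanged while letting the application trade precision for download size.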
Important: Other models (e.g., Llama-3.1-8B-Instruct) may support function calling through manual system prompt engineering (as shown in the manual function calling example), but they are not validated by the engine's automatic tool handling pipeline and require the user to manage prompt formatting and output parsing themselves.
Theoretical Basis
Function calling model selection is grounded in the observation that structured output generation is a specialized capability. Standard language models are trained to produce free-form text, which may coincidentally resemble structured formats but lacks reliability guarantees.
Models trained for function calling undergo specific alignment:
- Format adherence -- The model learns to output valid JSON conforming to a schema rather than free text.
- Tool selection reasoning -- The model learns to map user intent to the most appropriate tool from a provided set.
- Argument extraction -- The model learns to extract relevant values from natural language and map them to typed function parameters.
The Hermes-2-Pro models follow the format documented at the NousResearch Hermes-2-Pro repository, where the system prompt uses a pydantic-style JSON schema to define the expected function call format:
{
  "properties": {
    "arguments": {"title": "Arguments", "type": "object"},
    "name": {"title": "Name", "type": "string"}
  },
  "required": ["arguments", "name"],
  "title": "FunctionCall",
  "type": "object"
}
The engine enforces this at the grammar level by setting response_format to json_object with the schema, ensuring the model cannot produce output that deviates from the expected structure.
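On the application side, the same schema can be mirrored as a runtime check. The sketch below is an assumption-laden illustration, not part of web-llm: FunctionCall, isFunctionCall, and parseFunctionCall are hypothetical names. It may be useful as a defensive check, or when function calling is done through manual prompting, where no grammar-level enforcement applies.

```typescript
// Shape described by the FunctionCall schema above.
interface FunctionCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Hypothetical type guard: both required fields must be present,
// name must be a string, and arguments must be a plain (non-array) object.
function isFunctionCall(value: unknown): value is FunctionCall {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  const args = candidate.arguments;
  return (
    typeof candidate.name === "string" &&
    typeof args === "object" &&
    args !== null &&
    !Array.isArray(args)
  );
}

// Hypothetical parser: returns null for invalid JSON or a mismatched shape.
function parseFunctionCall(raw: string): FunctionCall | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isFunctionCall(parsed) ? parsed : null;
  } catch {
    return null;
  }
}

const validCall = parseFunctionCall(
  '{"name": "get_weather", "arguments": {"city": "Paris"}}',
);
const missingArguments = parseFunctionCall('{"name": "get_weather"}');
const notJson = parseFunctionCall("The weather is sunny.");
```

Returning null instead of throwing lets the caller decide whether to retry the request or surface an error.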
I/O Contract
Input:
- A model identifier string (e.g., "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC") selected by the user when creating an engine.
Validation:
- When request.tools is not undefined or null, the engine checks: functionCallingModelIds.includes(currentModelId).
- If the check fails, UnsupportedModelIdError is thrown listing supported models.
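The validation rule can be sketched as a standalone guard. This is an approximation of the behaviour described above, not the engine's source: assertToolsAllowed and UnsupportedModelIdSketchError are hypothetical stand-ins, and the allowlist is passed in as a parameter so the example runs without the library.

```typescript
// Local stand-in for the library's UnsupportedModelIdError; the message is illustrative.
class UnsupportedModelIdSketchError extends Error {
  constructor(modelId: string, supported: string[]) {
    super(
      `${modelId} is not supported for function calling. ` +
        `Supported models: ${supported.join(", ")}`,
    );
    this.name = "UnsupportedModelIdSketchError";
  }
}

// Hypothetical guard: the allowlist is consulted only when tools are provided.
function assertToolsAllowed(
  currentModelId: string,
  tools: unknown[] | null | undefined,
  allowlist: string[],
): void {
  if (tools === undefined || tools === null) {
    return; // Plain chat requests work with any model.
  }
  if (!allowlist.includes(currentModelId)) {
    throw new UnsupportedModelIdSketchError(currentModelId, allowlist);
  }
}

// Example allowlist entry taken from the Input section above.
const sketchAllowlist = ["Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC"];
const sketchTools = [{ type: "function", function: { name: "get_weather" } }];

// Passes: no tools in the request, so the model is not checked.
assertToolsAllowed("Llama-3.1-8B-Instruct-q4f16_1-MLC", undefined, sketchAllowlist);
// Passes: tools are present and the model is allowlisted.
assertToolsAllowed("Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC", sketchTools, sketchAllowlist);
```

Because the check runs per request, a non-allowlisted model can still be loaded and used for ordinary chat; only requests that carry tools are rejected.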
Output:
- A properly configured engine capable of processing tool definitions and producing structured tool call responses.
Usage Examples
Selecting a function calling model:
import * as webllm from "@mlc-ai/web-llm";

// Select a model from the validated function calling list
const selectedModel = "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC";
const engine = await webllm.CreateMLCEngine(selectedModel, {
  initProgressCallback: (report) => {
    console.log("Loading:", report.text);
  },
});
Checking if a model supports function calling:
import { functionCallingModelIds } from "@mlc-ai/web-llm";

const modelId = "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC";
if (functionCallingModelIds.includes(modelId)) {
  console.log("Model supports function calling");
} else {
  console.log("Model does NOT support function calling");
}
Error when using unsupported model with tools:
import * as webllm from "@mlc-ai/web-llm";

// Loading succeeds: the allowlist is checked only when a request includes tools
const engine = await webllm.CreateMLCEngine(
  "Llama-3.1-8B-Instruct-q4f16_1-MLC",
);
const request: webllm.ChatCompletionRequest = {
  messages: [{ role: "user", content: "What is the weather?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get weather data",
        parameters: { type: "object", properties: {} },
      },
    },
  ],
};
// Throws UnsupportedModelIdError listing functionCallingModelIds
const reply = await engine.chat.completions.create(request);
Related Pages
- Implementation:Mlc_ai_Web_llm_Function_Calling_Model_Ids
- Mlc_ai_Web_llm_Tool_Definition -- Declaring tools for the selected model to invoke
- Mlc_ai_Web_llm_Tool_Choice_Configuration -- Controlling tool invocation behavior
- Mlc_ai_Web_llm_Tool_Call_Extraction -- Parsing tool calls from validated model output