Principle:Mlc ai Web llm Tool Call Extraction
Overview
Tool Call Extraction is the process of parsing structured function call information from language model output text into typed tool call objects. After the model generates a response containing tool calls, the raw JSON output must be validated, parsed, and formatted into the standardized tool_calls array used by the OpenAI-compatible API.
Description
When a function-calling-capable model generates a response with tools provided, it outputs a JSON array of objects, each containing a name (the function to call) and arguments (the parameters to pass). The tool call extraction process transforms this raw text into typed ChatCompletionMessageToolCall objects (for non-streaming) or ChatCompletionChunk.Choice.Delta.ToolCall objects (for streaming).
The extraction pipeline performs three critical steps:
- JSON parsing -- The raw output message string is parsed as JSON. If parsing fails, a ToolCallOutputParseError is thrown with the original message and underlying error.
- Type validation -- The parsed result is verified to be an array. If not, a ToolCallOutputInvalidTypeError is thrown.
- Field validation and formatting -- Each element in the array is checked for the required name and arguments fields. If either is missing, a ToolCallOutputMissingFieldsError is thrown. The arguments object is re-serialized to a JSON string (matching the OpenAI convention where arguments are a string, not an object).
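The three steps above can be sketched as follows. This is a minimal illustration, not the library's source: the function name parseToolCalls, the ParsedToolCall shape, and the constructor signatures of the three error classes are assumptions made for this example. Only the error names come from the description above.

```typescript
// Local stand-ins for the error types named above; constructor shapes are assumed.
class ToolCallOutputParseError extends Error {
  rawOutput: string;
  underlying: unknown;
  constructor(rawOutput: string, underlying: unknown) {
    super(`Tool call output is not valid JSON: ${rawOutput}`);
    this.name = "ToolCallOutputParseError";
    this.rawOutput = rawOutput;
    this.underlying = underlying;
  }
}

class ToolCallOutputInvalidTypeError extends Error {
  constructor() {
    super("Tool call output must be a JSON array");
    this.name = "ToolCallOutputInvalidTypeError";
  }
}

class ToolCallOutputMissingFieldsError extends Error {
  constructor() {
    super("Tool call element is missing name or arguments");
    this.name = "ToolCallOutputMissingFieldsError";
  }
}

// Hypothetical intermediate shape: arguments is already a JSON string.
interface ParsedToolCall {
  name: string;
  arguments: string;
}

function parseToolCalls(rawOutput: string): ParsedToolCall[] {
  // Step 1: JSON parsing
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawOutput);
  } catch (err) {
    throw new ToolCallOutputParseError(rawOutput, err);
  }

  // Step 2: type validation
  if (!Array.isArray(parsed)) {
    throw new ToolCallOutputInvalidTypeError();
  }

  // Step 3: field validation and formatting
  return parsed.map((element: unknown): ParsedToolCall => {
    const candidate = element as { name?: unknown; arguments?: unknown } | null;
    if (
      candidate === null ||
      candidate === undefined ||
      candidate.name === undefined ||
      candidate.arguments === undefined
    ) {
      throw new ToolCallOutputMissingFieldsError();
    }
    return {
      name: String(candidate.name),
      // Re-serialize the arguments object to a JSON string.
      arguments: JSON.stringify(candidate.arguments),
    };
  });
}
```

Each step fails fast, so a caller can tell from the error name which stage rejected the output.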
After extraction, the engine sets finish_reason to "tool_calls" (instead of "stop") to signal that the response contains tool invocations rather than text content.
Usage
Tool call extraction happens automatically inside the engine's chatCompletion method. Application code does not call the extraction function directly. The parsed tool calls are available in the response:
Non-streaming:
const reply = await engine.chat.completions.create(request);
const toolCalls = reply.choices[0].message.tool_calls;
// Array<ChatCompletionMessageToolCall> | undefined
Streaming:
const stream = await engine.chat.completions.create({ ...request, stream: true });
let lastChunk;
for await (const chunk of stream) {
  lastChunk = chunk;
}
// lastChunk is undefined if the stream yielded no chunks, so guard the access
const toolCalls = lastChunk?.choices[0]?.delta.tool_calls;
// Array<ChatCompletionChunk.Choice.Delta.ToolCall> | undefined
When extraction occurs:
- Only when request.tools is defined and not null
- Only when finish_reason is "stop" (successful completion). If generation terminated due to "length" or "abort", tool calls are not extracted (incomplete output cannot be reliably parsed).
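The two conditions above amount to a simple guard. The helper below is a hypothetical sketch of that check (the name shouldExtractToolCalls and the FinishReason alias are not taken from the library):

```typescript
// Finish reasons mentioned on this page; the alias itself is an assumption.
type FinishReason = "stop" | "length" | "abort" | "tool_calls";

// Hypothetical guard: returns true only when tools were supplied and
// generation completed normally.
function shouldExtractToolCalls(
  tools: ReadonlyArray<unknown> | null | undefined,
  finishReason: FinishReason,
): boolean {
  const toolsProvided = tools !== undefined && tools !== null;
  return toolsProvided && finishReason === "stop";
}
```

Truncated ("length") or cancelled ("abort") output is skipped because a partial JSON array would fail at the parsing step anyway.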
Error handling: Applications should be prepared for extraction errors. If the model produces malformed JSON (unlikely with grammar-constrained output but possible), the errors propagate as specific typed exceptions.
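One way to prepare for these failures is to wrap the completion call and branch on the error. The sketch below matches on error.name, under the assumption that each error sets its name to its class name. Both classifyExtractionError and createOrNull are hypothetical helpers, and the create callback stands in for a call such as engine.chat.completions.create(request).

```typescript
type ExtractionFailure = "parse" | "invalid_type" | "missing_fields";

// Maps a thrown value to one of the three extraction failures, or null if it
// is some other error. Assumes error.name equals the class name.
function classifyExtractionError(err: unknown): ExtractionFailure | null {
  if (!(err instanceof Error)) return null;
  switch (err.name) {
    case "ToolCallOutputParseError":
      return "parse";
    case "ToolCallOutputInvalidTypeError":
      return "invalid_type";
    case "ToolCallOutputMissingFieldsError":
      return "missing_fields";
    default:
      return null;
  }
}

// Hypothetical wrapper: resolves to null when extraction fails and rethrows
// anything that is not an extraction error.
async function createOrNull<T>(create: () => Promise<T>): Promise<T | null> {
  try {
    return await create();
  } catch (err) {
    const failure = classifyExtractionError(err);
    if (failure === null) throw err; // unrelated errors are not swallowed
    console.warn(`Tool call extraction failed (${failure})`);
    return null;
  }
}
```

Whether to retry, fall back to plain text, or surface the failure to the user depends on the application; the wrapper only separates extraction errors from everything else.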
Theoretical Basis
Tool call extraction bridges the gap between the model's text generation capability and the application's need for structured data. The process relies on several key design decisions:
- JSON as interchange format -- Function calls use JSON because it is well-defined, widely supported, and can be grammar-constrained during generation. The model's output is constrained by the officialHermes2FunctionCallSchemaArray JSON schema.
- String-typed arguments -- Following the OpenAI convention, arguments is stored as a JSON string rather than a parsed object. This keeps the response shape compatible with OpenAI clients and defers deserialization to the application, which knows the expected parameter types. Because the arguments object is parsed and then re-serialized, the string is equivalent to the model's output but not necessarily byte-identical to it.
- ID assignment -- In non-streaming mode each tool call receives an id field (the array index as a string). In streaming mode it receives an index field (the array index as a number) instead. This identifier is used to correlate tool results back to specific calls in multi-tool scenarios.
- Streaming vs. non-streaming -- The extraction function is overloaded to produce different output types based on the streaming context:
  - Non-streaming: ChatCompletionMessageToolCall with id: string
  - Streaming: ChatCompletionChunk.Choice.Delta.ToolCall with index: number
The extraction runs only when the model stops normally (finish_reason === "stop"). This prevents parsing truncated JSON from length-limited or aborted generations.
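The ID assignment and the streaming/non-streaming split can be sketched with TypeScript overloads. The interfaces below are simplified local shapes written for this example, not the library's type definitions, and formatToolCalls is a hypothetical name.

```typescript
// Simplified local shapes, assumed for illustration.
interface FunctionPayload {
  name: string;
  arguments: string; // JSON string
}
interface MessageToolCall {
  id: string;
  type: "function";
  function: FunctionPayload;
}
interface DeltaToolCall {
  index: number;
  type: "function";
  function: FunctionPayload;
}

// Overloads: the literal value of isStreaming selects the return type.
function formatToolCalls(calls: FunctionPayload[], isStreaming: true): DeltaToolCall[];
function formatToolCalls(calls: FunctionPayload[], isStreaming: false): MessageToolCall[];
function formatToolCalls(
  calls: FunctionPayload[],
  isStreaming: boolean,
): Array<DeltaToolCall | MessageToolCall> {
  if (isStreaming) {
    // Streaming: the array position is exposed as a numeric index.
    return calls.map((fn, i): DeltaToolCall => ({ index: i, type: "function", function: fn }));
  }
  // Non-streaming: the array position is exposed as a string id.
  return calls.map((fn, i): MessageToolCall => ({ id: String(i), type: "function", function: fn }));
}
```

With overloads, a caller that passes a literal true or false gets the precise element type back without a cast.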
I/O Contract
Input:
- Raw model output string (expected to be a valid JSON array of {"name": string, "arguments": object} objects)
- Boolean flag indicating streaming or non-streaming mode
Output (non-streaming):
Array<ChatCompletionMessageToolCall> where each element has:
- id: string -- Array index as string (e.g., "0", "1")
- function.name: string -- The function name
- function.arguments: string -- JSON-serialized arguments
- type: "function" -- Literal discriminator
Output (streaming):
Array<ChatCompletionChunk.Choice.Delta.ToolCall> where each element has:
- index: number -- Array index as number
- function.name: string -- The function name
- function.arguments: string -- JSON-serialized arguments
- type: "function" -- Literal discriminator
Error conditions:
| Error Type | Condition | Description |
|---|---|---|
| ToolCallOutputParseError | JSON.parse fails | Output is not valid JSON |
| ToolCallOutputInvalidTypeError | Parsed result is not an array | Output is valid JSON but not an array |
| ToolCallOutputMissingFieldsError | Element missing name or arguments | Array element lacks required fields |
Usage Examples
Accessing tool calls from a non-streaming response:
import * as webllm from "@mlc-ai/web-llm";
// Assumes `engine` is an already-initialized engine and `tools` is the array of tool definitions.
const reply = await engine.chat.completions.create({
  stream: false,
  messages: [
    { role: "user", content: "What is the weather in Pittsburgh and Tokyo?" },
  ],
  tools: tools,
  tool_choice: "auto",
});
if (reply.choices[0].finish_reason === "tool_calls") {
  const toolCalls = reply.choices[0].message.tool_calls!;
  for (const call of toolCalls) {
    console.log(`Tool call ID: ${call.id}`);
    console.log(`Function: ${call.function.name}`);
    console.log(`Arguments: ${call.function.arguments}`);
    // Arguments is a JSON string, parse it:
    const args = JSON.parse(call.function.arguments);
    console.log(`Parsed location: ${args.location}`);
  }
}
Accessing tool calls from a streaming response:
const stream = await engine.chat.completions.create({
  stream: true,
  stream_options: { include_usage: true },
  messages: [
    { role: "user", content: "What is the weather in Pittsburgh and Tokyo?" },
  ],
  tools: tools,
  tool_choice: "auto",
});
let lastChunk: webllm.ChatCompletionChunk | undefined;
for await (const chunk of stream) {
  if (!chunk.usage) {
    lastChunk = chunk;
  }
}
// The last non-usage chunk contains tool_calls in its delta
if (lastChunk) {
  const delta = lastChunk.choices[0].delta;
  if (delta.tool_calls) {
    for (const call of delta.tool_calls) {
      console.log(`Index: ${call.index}`);
      console.log(`Function: ${call.function?.name}`);
      console.log(`Arguments: ${call.function?.arguments}`);
    }
  }
}
What the model output looks like before extraction:
[
  {
    "name": "get_current_weather",
    "arguments": {
      "location": "Pittsburgh, PA",
      "unit": "celsius"
    }
  },
  {
    "name": "get_current_weather",
    "arguments": {
      "location": "Tokyo, Japan",
      "unit": "celsius"
    }
  }
]
This gets transformed into ChatCompletionMessageToolCall objects where arguments becomes a JSON string.
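To make the transformation concrete, the snippet below builds the non-streaming result for the first element of the output above. The object is assembled by hand from the I/O contract on this page rather than captured from a real engine run, and the variable names are illustrative.

```typescript
// First element of the raw model output, with arguments still an object.
const rawArguments = { location: "Pittsburgh, PA", unit: "celsius" };

// Shape after extraction in non-streaming mode, per the I/O contract.
const extractedCall = {
  id: "0", // array index as a string
  type: "function" as const,
  function: {
    name: "get_current_weather",
    arguments: JSON.stringify(rawArguments), // now a JSON string
  },
};

console.log(extractedCall.function.arguments);
// prints {"location":"Pittsburgh, PA","unit":"celsius"}

// The application parses the string back before invoking its own function.
const parsedArguments = JSON.parse(extractedCall.function.arguments) as typeof rawArguments;
console.log(parsedArguments.location);
// prints Pittsburgh, PA
```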
Related Pages
- Implementation:Mlc_ai_Web_llm_Get_Tool_Call_From_Output
- Mlc_ai_Web_llm_Tool_Definition -- Tool definitions that drive the model's output
- Mlc_ai_Web_llm_Tool_Choice_Configuration -- Controls whether extraction occurs
- Mlc_ai_Web_llm_Function_Calling_Model_Selection -- Model must be validated for reliable extraction
- Mlc_ai_Web_llm_Tool_Execution_Loop -- What happens after tool calls are extracted