Principle:Mlc ai Web llm Tool Execution Loop
Overview
Tool Execution Loop is the pattern for executing LLM-requested tool calls and feeding results back into the conversation for multi-turn tool use. This is a user-implemented pattern (not built into the engine) where the application mediates between the model and external functions across multiple conversation turns.
Description
The tool execution loop is a conversational pattern that enables iterative interaction between a language model and external systems. It follows this cycle:
- Model returns tool_calls -- The chat completion response has finish_reason: "tool_calls", and message.tool_calls contains an array of ChatCompletionMessageToolCall objects.
- Application executes each function -- The developer's code dispatches each tool call to the corresponding function implementation using function.name and the parsed function.arguments (a JSON string).
- Results are formatted as tool messages -- Each function result is wrapped in a ChatCompletionToolMessageParam with role: "tool", the result as content, and the matching tool_call_id from the original tool call.
- Conversation continues -- The messages array (including the assistant's tool call message and the tool result messages) is sent back to the model for the next turn. The model can then synthesize a natural language response or make additional tool calls.
This loop can repeat multiple times in a single conversation, enabling complex multi-step workflows where the model orchestrates multiple API calls, each building on the results of previous calls.
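One pass through the four steps above can be condensed into a single function. This is a minimal sketch, assuming simplified local types in place of WebLLM's ChatCompletionMessageToolCall and ChatCompletionMessageParam; the names advance, Message, Choice and the synchronous run callback are hypothetical and exist only for illustration.

```typescript
// Local stand-ins for the OpenAI-style shapes (assumed; the real types live in
// @mlc-ai/web-llm as ChatCompletionMessageToolCall / ChatCompletionMessageParam).
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; content: string; tool_call_id: string };
interface Choice {
  finish_reason: "stop" | "tool_calls" | "length";
  message: { content: string | null; tool_calls?: ToolCall[] };
}

// Hypothetical helper: runs one pass of the cycle and returns the history to
// send on the next turn, or null when the model did not ask for tools.
function advance(
  history: Message[],
  choice: Choice,
  run: (name: string, args: Record<string, unknown>) => string,
): Message[] | null {
  const calls = choice.message.tool_calls;
  // Step 1: only a "tool_calls" finish carries work for the application.
  if (choice.finish_reason !== "tool_calls" || !calls) return null;
  // The assistant's tool-call message is recorded before any results.
  const next: Message[] = [...history, { role: "assistant", content: null, tool_calls: calls }];
  for (const call of calls) {
    // Step 2: dispatch by function name with the parsed arguments.
    const result = run(call.function.name, JSON.parse(call.function.arguments));
    // Step 3: wrap the result as a tool message tied to the originating call id.
    next.push({ role: "tool", content: result, tool_call_id: call.id });
  }
  // Step 4: the caller sends `next` back to the model for the next turn.
  return next;
}
```

Returning a new array instead of mutating the input keeps each turn's history inspectable, which is convenient when debugging multi-step runs.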
Usage
Implement this pattern when your application needs multi-turn tool use. The key requirements are:
Message formatting rules:
- After receiving a tool call response, append the assistant's message (with tool_calls) to the messages array.
- For each tool call result, create a ChatCompletionToolMessageParam with:
  - role: "tool"
  - content: string -- The result of the function execution (must be a string; serialize objects with JSON.stringify)
  - tool_call_id: string -- Must match the id field of the corresponding ChatCompletionMessageToolCall
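The content and tool_call_id rules can be captured in one small helper. A minimal sketch, assuming a local copy of the tool message shape; toToolMessage is a hypothetical name, and the "null" fallback for values that JSON.stringify cannot serialize (such as undefined) is a choice made here, not something the rules above prescribe.

```typescript
// Local stand-in for ChatCompletionToolMessageParam (assumed shape).
interface ToolMessage {
  role: "tool";
  content: string;
  tool_call_id: string;
}

// Hypothetical helper: wraps any function result as a tool message.
// Strings pass through unchanged; everything else goes through JSON.stringify
// so that `content` is always a string.
function toToolMessage(toolCallId: string, result: unknown): ToolMessage {
  let content: string;
  if (typeof result === "string") {
    content = result;
  } else {
    // JSON.stringify returns undefined for undefined and functions; use "null" then.
    const serialized: string | undefined = JSON.stringify(result);
    content = serialized ?? "null";
  }
  return { role: "tool", content, tool_call_id: toolCallId };
}
```

In a loop, the first argument is always toolCall.id of the call being answered, which is what ties each result back to its request.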
Loop termination:
- The loop ends when the model returns finish_reason: "stop" instead of "tool_calls", indicating it has enough information to produce a final text response.
- Applications should also enforce a maximum iteration limit to prevent infinite loops.
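Both termination rules can be expressed as a small decision function. A minimal sketch: nextAction, LoopAction and the "give_up" outcome are hypothetical names, and the finish reasons other than "stop" and "tool_calls" are included as an assumption. The design choice here is to skip executing tools when no further model call is allowed to read their results.

```typescript
// Finish reasons as in OpenAI-style chat completions; values beyond "stop" and
// "tool_calls" are included as an assumption for completeness.
type FinishReason = "stop" | "tool_calls" | "length" | "abort";
type LoopAction = "execute_tools" | "finish" | "give_up";

// Hypothetical policy for what to do after model call number `iteration` (0-based).
function nextAction(reason: FinishReason, iteration: number, maxIterations: number): LoopAction {
  // "stop" (or any other non-tool finish) means there is nothing left to execute.
  if (reason !== "tool_calls") return "finish";
  // Executing tools only pays off if another model call is still allowed
  // to read their results; otherwise stop here instead of looping forever.
  return iteration + 1 < maxIterations ? "execute_tools" : "give_up";
}

// Example: a model that keeps requesting tools is cut off after 3 calls.
const trace: LoopAction[] = [];
for (let i = 0; i < 3; i++) {
  const action = nextAction("tool_calls", i, 3);
  trace.push(action);
  if (action !== "execute_tools") break;
}
// trace is ["execute_tools", "execute_tools", "give_up"]
```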
Error handling:
- If a function execution fails, the error should be communicated back to the model as the tool message content so the model can adapt its response.
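- If the model produces invalid tool calls (e.g., calling a non-existent function or passing arguments that are not valid JSON), the application should handle this gracefully, for example by returning an error message as the tool result instead of throwing.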
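Both error-handling rules can live in one dispatcher that never throws. A minimal sketch, assuming a Map-based registry of synchronous functions; executeSafely, ToolFn and the {"error": ...} JSON shape are hypothetical conventions chosen here, not part of WebLLM. A Map is used instead of a plain object so that a model-supplied name like "toString" cannot resolve to an inherited property.

```typescript
// Minimal local stand-in for ChatCompletionMessageToolCall (assumed shape).
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}
type ToolFn = (args: Record<string, unknown>) => unknown;

// Hypothetical dispatcher: every failure mode becomes a JSON error string
// that can be sent back to the model as the tool message content.
function executeSafely(call: ToolCall, registry: Map<string, ToolFn>): string {
  const fn = registry.get(call.function.name);
  if (!fn) {
    // Invalid tool call: the model named a function that does not exist.
    return JSON.stringify({ error: `Unknown function: ${call.function.name}` });
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(call.function.arguments);
  } catch {
    parsed = undefined; // malformed JSON is reported just below
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return JSON.stringify({ error: "Arguments must be a JSON object" });
  }
  try {
    const result = fn(parsed as Record<string, unknown>);
    if (typeof result === "string") return result;
    const serialized: string | undefined = JSON.stringify(result);
    return serialized ?? "null";
  } catch (e) {
    // Execution failure: pass the message to the model so it can adapt.
    return JSON.stringify({ error: e instanceof Error ? e.message : String(e) });
  }
}
```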
Theoretical Basis
The tool execution loop implements the ReAct (Reasoning + Acting) pattern adapted for structured function calling:
- Reasoning -- The model analyzes the user query and available tools to determine which function(s) to call and with what arguments.
- Acting -- The application executes the chosen function(s) and returns results.
- Observation -- The model receives the function results as tool messages and reasons about them.
- Iteration -- The model may make additional tool calls or produce a final response.
This pattern is fundamental to building agentic applications because it allows the model to:
- Decompose complex queries into multiple function calls (e.g., first look up a stock symbol, then fetch its fundamentals)
- Chain function calls where the output of one call informs the input of the next
- Recover from errors by examining error messages and retrying with different arguments
- Synthesize results from multiple tool calls into a coherent natural language response
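To make decomposition and chaining concrete without loading a model, the sketch below replaces the LLM with a hypothetical scripted function and uses stub tools with fixed data. Everything here (scriptedModel, lookup_symbol, get_fundamentals, runChain) is illustrative and assumed; a real model decides the calls itself. What it does show faithfully is the data flow: the symbol returned by the first call becomes the argument of the second, and the final answer is synthesized from the tool messages in the history.

```typescript
// Local stand-ins (assumed shapes, simplified from the OpenAI-style types).
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; content: string; tool_call_id: string };
interface Reply {
  finish_reason: "stop" | "tool_calls";
  content: string | null;
  tool_calls?: ToolCall[];
}

// Hypothetical scripted "model": it reasons over the tool results already in
// the history (its observations) to pick the next step.
function scriptedModel(history: Message[]): Reply {
  const observations = history.filter(
    (m): m is Extract<Message, { role: "tool" }> => m.role === "tool",
  );
  if (observations.length === 0) {
    // Decompose: first resolve the company name to a ticker symbol.
    const args = JSON.stringify({ company: "Tesla" });
    return {
      finish_reason: "tool_calls",
      content: null,
      tool_calls: [{ id: "call_1", function: { name: "lookup_symbol", arguments: args } }],
    };
  }
  if (observations.length === 1) {
    // Chain: the symbol returned by the first call becomes the next argument.
    const { symbol } = JSON.parse(observations[0].content);
    const args = JSON.stringify({ symbol });
    return {
      finish_reason: "tool_calls",
      content: null,
      tool_calls: [{ id: "call_2", function: { name: "get_fundamentals", arguments: args } }],
    };
  }
  // Synthesize: combine the observations into a final answer.
  const data = JSON.parse(observations[1].content);
  return { finish_reason: "stop", content: `${data.symbol} is in the ${data.sector} sector.` };
}

// Stub tools returning fixed data (hypothetical names, no real API calls).
const stubTools = new Map<string, (args: Record<string, unknown>) => unknown>([
  ["lookup_symbol", (args) => ({ symbol: args.company === "Tesla" ? "TSLA" : "UNKNOWN" })],
  ["get_fundamentals", (args) => ({ symbol: args.symbol, sector: "Consumer Cyclical" })],
]);

function runChain(question: string, maxIterations = 5): { answer: string; history: Message[] } {
  const history: Message[] = [{ role: "user", content: question }];
  for (let i = 0; i < maxIterations; i++) {
    const reply = scriptedModel(history); // reasoning
    const calls = reply.tool_calls;
    if (reply.finish_reason === "stop" || !calls) {
      return { answer: reply.content ?? "", history };
    }
    history.push({ role: "assistant", content: null, tool_calls: calls });
    for (const call of calls) {
      const fn = stubTools.get(call.function.name); // acting
      const result = fn ? fn(JSON.parse(call.function.arguments)) : { error: "Unknown function" };
      // Observation: appended so the next reasoning step can see it.
      history.push({ role: "tool", content: JSON.stringify(result), tool_call_id: call.id });
    }
  }
  return { answer: "", history };
}
```

The resulting history is [user, assistant, tool, assistant, tool], the multi-round shape listed under Conversation flow.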
The conversation history serves as the model's working memory, providing full context of previous actions and results.
I/O Contract
Loop input (per iteration):
- messages: Array<ChatCompletionMessageParam> -- The full conversation history, including prior tool calls and results
- tools: Array<ChatCompletionTool> -- Available tool definitions (same across iterations)
- tool_choice: ChatCompletionToolChoiceOption -- Typically "auto" for loop iterations
Loop output (per iteration):
- Either finish_reason: "tool_calls" with message.tool_calls (continue the loop), or finish_reason: "stop" with message.content (loop complete)
Tool result message structure:
interface ChatCompletionToolMessageParam {
  content: string; // Function result as string
  role: "tool"; // Literal "tool"
  tool_call_id: string; // Must match ChatCompletionMessageToolCall.id
}
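TypeScript only enforces this structure at compile time. When messages come from somewhere untyped (for example a restored conversation), a runtime guard can check the same three fields. A minimal sketch: isToolMessage is a hypothetical helper, and the interface is re-declared locally so the block runs on its own.

```typescript
// Local copy of the structure above so the sketch is self-contained.
interface ChatCompletionToolMessageParam {
  content: string;
  role: "tool";
  tool_call_id: string;
}

// Hypothetical runtime guard: true only if all three fields have the
// required types (content must already be a string, not an object).
function isToolMessage(value: unknown): value is ChatCompletionToolMessageParam {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return v.role === "tool" && typeof v.content === "string" && typeof v.tool_call_id === "string";
}
```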
Conversation flow:
Turn 1: [system, user] -> assistant (tool_calls)
Turn 2: [system, user, assistant(tool_calls), tool(result1), tool(result2)] -> assistant (content)
Or for multi-round tool use:
Turn 1: [system, user] -> assistant (tool_calls)
Turn 2: [system, user, assistant, tool] -> assistant (tool_calls again)
Turn 3: [system, user, assistant, tool, assistant, tool] -> assistant (content)
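The flows above imply an ordering invariant that can be checked before each request: every assistant message with tool_calls is followed by exactly one tool message per call id before anything else. A minimal sketch with hypothetical names (checkToolFlow) and local types reduced to the fields the check needs; whether a given engine rejects a history that violates this is not assumed here.

```typescript
// Local stand-in types (assumed shapes; only the fields the check needs).
interface ToolCall {
  id: string;
}
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; content: string; tool_call_id: string };

// Hypothetical pre-flight check for a history about to be sent to the model.
// Returns the problems found (an empty array means the flow is consistent).
function checkToolFlow(messages: Message[]): string[] {
  const problems: string[] = [];
  let pending = new Set<string>();
  for (let i = 0; i < messages.length; i++) {
    const m = messages[i];
    if (m.role === "tool") {
      // Set.delete returns false when the id was not awaiting a result
      // (unknown id, or a second answer to the same call).
      if (!pending.delete(m.tool_call_id)) {
        problems.push(`message ${i}: tool_call_id "${m.tool_call_id}" matches no pending call`);
      }
      continue;
    }
    // Any other message closes the current round; unanswered calls are errors.
    pending.forEach((id) => problems.push(`call "${id}" has no result before message ${i}`));
    pending = new Set<string>(m.role === "assistant" ? (m.tool_calls ?? []).map((c) => c.id) : []);
  }
  pending.forEach((id) => problems.push(`call "${id}" has no result`));
  return problems;
}
```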
Usage Examples
End-to-end single round of tool execution with OpenAI-style tools:
import * as webllm from "@mlc-ai/web-llm";
// 1. Define tools
const tools: Array<webllm.ChatCompletionTool> = [
  {
    type: "function",
    function: {
      name: "get_current_weather",
      description: "Get the current weather in a given location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA",
          },
          unit: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["location"],
      },
    },
  },
];
// 2. Implement the actual function
function get_current_weather(location: string, unit: string = "celsius"): string {
  // In a real app, this would call a weather API
  return JSON.stringify({ location, temperature: 22.5, unit });
}
// 3. Create engine
const engine = await webllm.CreateMLCEngine(
  "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC",
);
// 4. Initial request
const messages: webllm.ChatCompletionMessageParam[] = [
  { role: "user", content: "What is the weather in Pittsburgh?" },
];
const reply = await engine.chat.completions.create({
  stream: false,
  messages: messages,
  tools: tools,
  tool_choice: "auto",
});
// 5. Check if model wants to call tools
if (reply.choices[0].finish_reason === "tool_calls") {
  const toolCalls = reply.choices[0].message.tool_calls!;
  // 5a. Add assistant message with tool_calls to history
  messages.push({
    role: "assistant",
    content: null,
    tool_calls: toolCalls,
  });
  // 5b. Execute each tool call and add results
  for (const toolCall of toolCalls) {
    let result: string;
    try {
      const args = JSON.parse(toolCall.function.arguments);
      if (toolCall.function.name === "get_current_weather") {
        result = get_current_weather(args.location, args.unit);
      } else {
        result = JSON.stringify({ error: "Unknown function" });
      }
    } catch (e) {
      // Malformed JSON arguments: report the error to the model instead of throwing
      result = JSON.stringify({ error: String(e) });
    }
    messages.push({
      role: "tool",
      content: result,
      tool_call_id: toolCall.id,
    });
  }
  // 5c. Get final response with tool results
  // (if the model asks for tools again, another round is needed; see the generic loop below)
  const finalReply = await engine.chat.completions.create({
    stream: false,
    messages: messages,
    tools: tools,
    tool_choice: "auto",
  });
  console.log(finalReply.choices[0].message.content);
  // e.g. "The current weather in Pittsburgh is 22.5 degrees Celsius."
}
Multi-turn tool use (Hermes-2 manual style):
// From examples/function-calling/function-calling-manual
// Assumes `engine` was created with a Hermes-2-Pro model and that `system_prompt`
// holds the function-calling system prompt defined in that example.
const messages: webllm.ChatCompletionMessageParam[] = [
  { role: "system", content: system_prompt },
  { role: "user", content: "Fetch the stock fundamentals data for Tesla (TSLA)" },
];
// Turn 1: Model generates tool call (no `tools` field, so the call arrives as text in message.content)
const reply1 = await engine.chat.completions.create({
  stream: false,
  messages: messages,
});
messages.push({ role: "assistant", content: reply1.choices[0].message.content });
// Turn 2: Execute function and provide result (hardcoded here for illustration)
const toolResponse = JSON.stringify({
  symbol: "TSLA",
  company_name: "Tesla, Inc.",
  sector: "Consumer Cyclical",
  market_cap: 611384164352,
});
// Manual mode produces no structured tool call ids, so a placeholder id is used
messages.push({ role: "tool", content: toolResponse, tool_call_id: "0" });
// Turn 3: Model synthesizes natural language response
const reply2 = await engine.chat.completions.create({
  stream: false,
  messages: messages,
});
console.log(reply2.choices[0].message.content);
// Natural language summary of Tesla's stock fundamentals
// Turn 4: User asks for another stock -- loop continues
messages.push({ role: "assistant", content: reply2.choices[0].message.content });
messages.push({
  role: "user",
  content: "Now do another one with NVIDIA, symbol being NVDA.",
});
const reply3 = await engine.chat.completions.create({
  stream: false,
  messages: messages,
});
// Model generates another tool call for NVDA
Generic tool execution loop with max iterations:
async function runToolLoop(
  engine: webllm.MLCEngineInterface,
  initialMessages: webllm.ChatCompletionMessageParam[],
  tools: Array<webllm.ChatCompletionTool>,
  executeTool: (name: string, args: Record<string, unknown>) => Promise<string>,
  maxIterations: number = 5,
): Promise<string> {
  const messages = [...initialMessages];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await engine.chat.completions.create({
      stream: false,
      messages,
      tools,
      tool_choice: "auto",
    });
    const choice = reply.choices[0];
    const toolCalls = choice.message.tool_calls;
    // Any finish other than a tool call ("stop", "length", ...) ends the loop;
    // re-sending the same messages would not make progress.
    if (choice.finish_reason !== "tool_calls" || !toolCalls) {
      return choice.message.content ?? "";
    }
    messages.push({
      role: "assistant",
      content: null,
      tool_calls: toolCalls,
    });
    for (const toolCall of toolCalls) {
      let result: string;
      try {
        const args = JSON.parse(toolCall.function.arguments);
        result = await executeTool(toolCall.function.name, args);
      } catch (e) {
        // Report malformed arguments or tool failures back to the model
        result = JSON.stringify({ error: String(e) });
      }
      messages.push({
        role: "tool",
        content: result,
        tool_call_id: toolCall.id,
      });
    }
  }
  return "Max iterations reached without final response.";
}
Related Pages
- Implementation:Mlc_ai_Web_llm_Tool_Execution_Pattern
- Mlc_ai_Web_llm_Tool_Definition -- Defining the tools available in the loop
- Mlc_ai_Web_llm_Tool_Choice_Configuration -- Controlling tool invocation per iteration
- Mlc_ai_Web_llm_Tool_Call_Extraction -- How tool calls are parsed from model output
- Mlc_ai_Web_llm_Function_Calling_Model_Selection -- Required for reliable loop operation