Principle:Mlc ai Web llm Tool Execution Loop

Overview

Tool Execution Loop is the pattern for executing LLM-requested tool calls and feeding results back into the conversation for multi-turn tool use. This is a user-implemented pattern (not built into the engine) where the application mediates between the model and external functions across multiple conversation turns.

Description

The tool execution loop is a conversational pattern that enables iterative interaction between a language model and external systems. It follows this cycle:

1. Model returns tool_calls -- The chat completion response has finish_reason: "tool_calls" and message.tool_calls contains an array of ChatCompletionMessageToolCall objects.
2. Application executes each function -- The developer's code dispatches each tool call to the corresponding function implementation using function.name and the parsed function.arguments.
3. Results are formatted as tool messages -- Each function result is wrapped in a ChatCompletionToolMessageParam with role: "tool", the result as content, and the matching tool_call_id from the original tool call.
4. Conversation continues -- The messages array (including the assistant's tool call message and the tool result messages) is sent back to the model for the next turn. The model can then synthesize a natural language response or make additional tool calls.

This loop can repeat multiple times in a single conversation, enabling complex multi-step workflows where the model orchestrates multiple API calls, each building on the results of previous calls.
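Steps 2 to 4 of the cycle can be sketched as a pure function over the message history, leaving out the model call itself. This is a minimal illustration, not WebLLM code: the types are simplified local stand-ins for the OpenAI-style message shapes (not imported from @mlc-ai/web-llm), and appendToolRound and its synchronous execute callback are hypothetical names.

```typescript
// Simplified local stand-ins for the OpenAI-style types (assumption: not the WebLLM definitions)
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; content: string; tool_call_id: string };

// Hypothetical helper: given the tool calls from one assistant turn, record that
// turn, run each call, and return the history to send on the next request.
function appendToolRound(
  history: Message[],
  toolCalls: ToolCall[],
  execute: (name: string, args: unknown) => string, // hypothetical synchronous executor
): Message[] {
  // The assistant's tool-call message must precede its results
  const next: Message[] = [...history, { role: "assistant", content: null, tool_calls: toolCalls }];
  for (const call of toolCalls) {
    const args: unknown = JSON.parse(call.function.arguments);
    // One tool message per call, linked back through tool_call_id
    next.push({ role: "tool", content: execute(call.function.name, args), tool_call_id: call.id });
  }
  return next;
}

// Usage with a stubbed executor returning placeholder data
const round = appendToolRound(
  [{ role: "user", content: "What is the weather in Pittsburgh?" }],
  [
    {
      id: "call_0",
      type: "function",
      function: { name: "get_current_weather", arguments: '{"location":"Pittsburgh, PA"}' },
    },
  ],
  (name, args) => JSON.stringify({ name, args, temperature: 22.5 }),
);
console.log(round.map((m) => m.role)); // [ 'user', 'assistant', 'tool' ]
```

Returning a new array instead of mutating the input keeps each round easy to inspect; the engine call that produces toolCalls sits outside this function.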

Usage

Implement this pattern when your application needs multi-turn tool use. The key requirements are:

Message formatting rules:

After receiving a tool call response, append the assistant's message (with tool_calls) to the messages array.
For each tool call result, create a ChatCompletionToolMessageParam with:
- role: "tool"
- content: string -- The result of the function execution (must be a string; serialize objects with JSON.stringify)
- tool_call_id: string -- Must match the id field of the corresponding ChatCompletionMessageToolCall
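The formatting rules above fit in a small helper. This is a sketch under assumptions: toToolMessage is a hypothetical name, and the interfaces are simplified local stand-ins for ChatCompletionMessageToolCall and ChatCompletionToolMessageParam rather than the library's own types.

```typescript
// Simplified stand-ins for the WebLLM/OpenAI types (assumption)
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
interface ToolMessage {
  role: "tool";
  content: string;
  tool_call_id: string;
}

// Hypothetical helper: wrap any function result as a tool message
function toToolMessage(call: ToolCall, result: unknown): ToolMessage {
  // content must be a string: pass strings through, serialize everything else
  // (undefined is mapped to null so JSON.stringify yields the string "null")
  const content = typeof result === "string" ? result : JSON.stringify(result ?? null);
  // tool_call_id must echo the id of the call being answered
  return { role: "tool", content, tool_call_id: call.id };
}

const call: ToolCall = {
  id: "call_42",
  type: "function",
  function: { name: "get_current_weather", arguments: '{"location":"Pittsburgh, PA"}' },
};
console.log(toToolMessage(call, { temperature: 22.5, unit: "celsius" }));
```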

Loop termination:

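The loop ends when the model returns finish_reason: "stop" instead of "tool_calls", indicating it has enough information to produce a final text response. Any other finish reason (for example "length", when the output hits the token limit) should also end the loop, since re-sending the same history would not make progress.
Applications should also cap the number of iterations to prevent infinite loops.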
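The termination rules can be expressed as a small decision function. A minimal sketch, assuming the engine may report "length" or "abort" in addition to "stop" and "tool_calls"; nextStep and the LoopDecision labels are hypothetical names chosen for illustration.

```typescript
// Finish reasons this sketch handles (assumption: "length" and "abort" may also occur)
type FinishReason = "stop" | "length" | "tool_calls" | "abort";
type LoopDecision = "execute_tools" | "done" | "limit_reached";

// Hypothetical helper: decide what to do after the model call at `iteration` (0-based)
function nextStep(finishReason: FinishReason, iteration: number, maxIterations: number): LoopDecision {
  if (finishReason !== "tool_calls") {
    // "stop" is a final answer; "length" or "abort" also end the loop
    return "done";
  }
  // Executing tools only makes sense if another model call is still allowed
  return iteration + 1 >= maxIterations ? "limit_reached" : "execute_tools";
}

console.log(nextStep("tool_calls", 0, 5)); // execute_tools
console.log(nextStep("stop", 1, 5)); // done
console.log(nextStep("tool_calls", 4, 5)); // limit_reached
```

Checking the budget before executing tools avoids running functions whose results would never be shown to the model.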

Error handling:

If a function execution fails, the error should be communicated back to the model as the tool message content so the model can adapt its response.
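If the model produces invalid tool calls (e.g., calling a non-existent function or passing arguments that are not valid JSON), the application should not crash; it can instead return a descriptive error as that call's tool message so the model has a chance to correct itself.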
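Both error-handling rules can be combined in one executor that never throws. This is a sketch, not a WebLLM API: safeExecute, the registry shape, and the JSON error format { "error": ... } are assumptions for illustration.

```typescript
// Simplified stand-in for ChatCompletionMessageToolCall (assumption)
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
type ToolFn = (args: Record<string, unknown>) => string;

// Hypothetical executor: every failure becomes a JSON error string the model can read
function safeExecute(call: ToolCall, registry: Record<string, ToolFn>): string {
  // Invalid tool call: the function was never defined
  if (!Object.prototype.hasOwnProperty.call(registry, call.function.name)) {
    return JSON.stringify({ error: `Unknown function: ${call.function.name}` });
  }
  // Invalid tool call: arguments are not parseable
  let args: Record<string, unknown>;
  try {
    args = JSON.parse(call.function.arguments);
  } catch {
    return JSON.stringify({ error: "Arguments are not valid JSON" });
  }
  // Failed execution: report the message instead of throwing
  try {
    return registry[call.function.name](args);
  } catch (err) {
    return JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
  }
}

// Hypothetical registry with one placeholder tool
const registry: Record<string, ToolFn> = {
  get_current_weather: (args) => {
    if (typeof args.location !== "string") throw new Error("location is required");
    return JSON.stringify({ location: args.location, temperature: 22.5 });
  },
};

const mk = (name: string, args: string): ToolCall => ({
  id: "call_0",
  type: "function",
  function: { name, arguments: args },
});
console.log(safeExecute(mk("get_stock_price", "{}"), registry)); // {"error":"Unknown function: get_stock_price"}
console.log(safeExecute(mk("get_current_weather", "{location:"), registry)); // {"error":"Arguments are not valid JSON"}
console.log(safeExecute(mk("get_current_weather", "{}"), registry)); // {"error":"location is required"}
```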

Theoretical Basis

The tool execution loop implements the ReAct (Reasoning + Acting) pattern adapted for structured function calling:

Reasoning -- The model analyzes the user query and available tools to determine which function(s) to call and with what arguments.
Acting -- The application executes the chosen function(s) and returns results.
Observation -- The model receives the function results as tool messages and reasons about them.
Iteration -- The model may make additional tool calls or produce a final response.

This pattern is fundamental to building agentic applications because it allows the model to:

Decompose complex queries into multiple function calls (e.g., first look up a stock symbol, then fetch its fundamentals)
Chain function calls where the output of one call informs the input of the next
Recover from errors by examining error messages and retrying with different arguments
Synthesize results from multiple tool calls into a coherent natural language response

The conversation history serves as the model's working memory, providing full context of previous actions and results.
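Chaining and the "history as working memory" idea can be made concrete with a scripted stand-in for the model. In this sketch the hypothetical scriptedModel decides only from the tool results already in the history, so the symbol returned by a first call (lookup_symbol) becomes the argument of a second call (get_fundamentals). Both tool names, their placeholder data, and the local types are assumptions; a real loop would call the engine instead.

```typescript
// Simplified local types (assumption; not the WebLLM type definitions)
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
type ToolMsg = { role: "tool"; content: string; tool_call_id: string };
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | ToolMsg;
type Reply = {
  finish_reason: "stop" | "tool_calls";
  message: { content: string | null; tool_calls?: ToolCall[] };
};

const call = (id: string, name: string, args: object): ToolCall => ({
  id,
  type: "function",
  function: { name, arguments: JSON.stringify(args) },
});

// Scripted stand-in for the model: its only memory is the message history
function scriptedModel(history: Message[]): Reply {
  const results = history
    .filter((m): m is ToolMsg => m.role === "tool")
    .map((m) => JSON.parse(m.content));
  if (results.length === 0) {
    return { finish_reason: "tool_calls", message: { content: null, tool_calls: [call("call_1", "lookup_symbol", { company: "Tesla" })] } };
  }
  if (results.length === 1) {
    // Chaining: the first result supplies the next call's argument
    return { finish_reason: "tool_calls", message: { content: null, tool_calls: [call("call_2", "get_fundamentals", { symbol: results[0].symbol })] } };
  }
  const f = results[1];
  return { finish_reason: "stop", message: { content: `${f.company_name} (${f.symbol}) has a market cap of ${f.market_cap}.` } };
}

// Hypothetical tools returning placeholder data
const tools: Record<string, (args: Record<string, unknown>) => unknown> = {
  lookup_symbol: (args) => ({ symbol: args.company === "Tesla" ? "TSLA" : "UNKNOWN" }),
  get_fundamentals: (args) => ({ symbol: args.symbol, company_name: "Tesla, Inc.", market_cap: 611384164352 }),
};

function run(question: string, maxIterations = 5): { answer: string; history: Message[] } {
  const history: Message[] = [{ role: "user", content: question }];
  for (let i = 0; i < maxIterations; i++) {
    const reply = scriptedModel(history);
    const calls = reply.message.tool_calls;
    if (reply.finish_reason !== "tool_calls" || !calls) {
      return { answer: reply.message.content ?? "", history };
    }
    history.push({ role: "assistant", content: null, tool_calls: calls });
    for (const c of calls) {
      const result = tools[c.function.name](JSON.parse(c.function.arguments));
      history.push({ role: "tool", content: JSON.stringify(result), tool_call_id: c.id });
    }
  }
  return { answer: "Max iterations reached without final response.", history };
}

console.log(run("Fetch the stock fundamentals for Tesla").answer);
// Tesla, Inc. (TSLA) has a market cap of 611384164352.
```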

I/O Contract

Loop input (per iteration):

messages: Array<ChatCompletionMessageParam> -- The full conversation history including prior tool calls and results
tools: Array<ChatCompletionTool> -- Available tool definitions (same across iterations)
tool_choice: ChatCompletionToolChoiceOption -- Typically "auto" for loop iterations

Loop output (per iteration):

Either finish_reason: "tool_calls" with message.tool_calls (continue loop), or
finish_reason: "stop" with message.content (loop complete)

Tool result message structure:

interface ChatCompletionToolMessageParam {
  content: string;        // Function result as string
  role: "tool";           // Literal "tool"
  tool_call_id: string;   // Must match ChatCompletionMessageToolCall.id
}

Conversation flow:

Turn 1: [system, user] -> assistant (tool_calls)
Turn 2: [system, user, assistant(tool_calls), tool(result1), tool(result2)] -> assistant (content)

Or for multi-round tool use:

Turn 1: [system, user] -> assistant (tool_calls)
Turn 2: [system, user, assistant, tool] -> assistant (tool_calls again)
Turn 3: [system, user, assistant, tool, assistant, tool] -> assistant (content)
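Before sending a history back to the model it can help to check that it follows these flows: every tool message answers a pending call from the preceding assistant turn, and no call is left unanswered. The checker below is a hypothetical sketch over simplified local types, not a WebLLM function.

```typescript
// Simplified local types (assumption)
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; content: string; tool_call_id: string };

// Returns a list of problems; an empty list means the flow looks valid
function checkFlow(messages: Message[]): string[] {
  const problems: string[] = [];
  let pending = new Set<string>(); // tool_call ids still awaiting a result
  messages.forEach((m, i) => {
    if (m.role === "tool") {
      if (!pending.has(m.tool_call_id)) {
        problems.push(`message ${i}: tool result for unknown or already answered id "${m.tool_call_id}"`);
      }
      pending.delete(m.tool_call_id);
      return;
    }
    // Any non-tool message closes the previous tool round
    if (pending.size > 0) {
      problems.push(`message ${i}: ${pending.size} tool call(s) left unanswered`);
    }
    pending = new Set(m.role === "assistant" ? (m.tool_calls ?? []).map((c) => c.id) : []);
  });
  if (pending.size > 0) {
    problems.push(`end: ${pending.size} tool call(s) left unanswered`);
  }
  return problems;
}

// The Turn 2 flow above, with two parallel tool calls (placeholder data)
const ok: Message[] = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the weather in Pittsburgh and Boston?" },
  {
    role: "assistant",
    content: null,
    tool_calls: [
      { id: "a", type: "function", function: { name: "get_current_weather", arguments: '{"location":"Pittsburgh, PA"}' } },
      { id: "b", type: "function", function: { name: "get_current_weather", arguments: '{"location":"Boston, MA"}' } },
    ],
  },
  { role: "tool", content: '{"temperature":22.5}', tool_call_id: "a" },
  { role: "tool", content: '{"temperature":18}', tool_call_id: "b" },
];
console.log(checkFlow(ok)); // []
console.log(checkFlow(ok.slice(0, 4))); // reports the missing result for id "b"
```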

Usage Examples

One full tool round trip (tool call, execution, final answer) with OpenAI-style tools:

import * as webllm from "@mlc-ai/web-llm";

// 1. Define tools
const tools: Array<webllm.ChatCompletionTool> = [
  {
    type: "function",
    function: {
      name: "get_current_weather",
      description: "Get the current weather in a given location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA",
          },
          unit: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["location"],
      },
    },
  },
];

// 2. Implement the actual function
function get_current_weather(location: string, unit: string = "celsius"): string {
  // In a real app, this would call a weather API
  return JSON.stringify({ location, temperature: 22.5, unit });
}

// 3. Create engine
const engine = await webllm.CreateMLCEngine(
  "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC",
);

// 4. Initial request
const messages: webllm.ChatCompletionMessageParam[] = [
  { role: "user", content: "What is the weather in Pittsburgh?" },
];

const reply = await engine.chat.completions.create({
  stream: false,
  messages: messages,
  tools: tools,
  tool_choice: "auto",
});

// 5. Check if model wants to call tools
if (reply.choices[0].finish_reason === "tool_calls") {
  const toolCalls = reply.choices[0].message.tool_calls!;

  // 5a. Add assistant message with tool_calls to history
  messages.push({
    role: "assistant",
    content: null,
    tool_calls: toolCalls,
  });

  // 5b. Execute each tool call and add results
  for (const toolCall of toolCalls) {
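    let result: string;
    try {
      const args = JSON.parse(toolCall.function.arguments);
      if (toolCall.function.name === "get_current_weather") {
        result = get_current_weather(args.location, args.unit);
      } else {
        result = JSON.stringify({ error: "Unknown function" });
      }
    } catch (err) {
      // Malformed arguments are reported back to the model instead of crashing
      result = JSON.stringify({ error: String(err) });
    }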

    messages.push({
      role: "tool",
      content: result,
      tool_call_id: toolCall.id,
    });
  }

  // 5c. Get final response with tool results
  const finalReply = await engine.chat.completions.create({
    stream: false,
    messages: messages,
    tools: tools,
    tool_choice: "auto",
  });

  console.log(finalReply.choices[0].message.content);
  // e.g. "The current weather in Pittsburgh is 22.5 degrees Celsius." (exact wording varies)
}

Multi-turn tool use (Hermes-2 manual style):

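// Adapted from examples/function-calling/function-calling-manual.
// Assumes `engine` was created as above and `system_prompt` holds the Hermes-2-Pro
// function-calling system prompt (available tools and the expected tool-call format).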
const messages: webllm.ChatCompletionMessageParam[] = [
  { role: "system", content: system_prompt },
  { role: "user", content: "Fetch the stock fundamentals data for Tesla (TSLA)" },
];

// Turn 1: Model generates tool call
const reply1 = await engine.chat.completions.create({
  stream: false,
  messages: messages,
});
messages.push({ role: "assistant", content: reply1.choices[0].message.content });

// Turn 2: Execute the requested function (result hardcoded here) and provide it
const toolResponse = JSON.stringify({
  symbol: "TSLA",
  company_name: "Tesla, Inc.",
  sector: "Consumer Cyclical",
  market_cap: 611384164352,
});
// Manual mode yields no generated call id, so a placeholder id is used
messages.push({ role: "tool", content: toolResponse, tool_call_id: "0" });

// Turn 3: Model synthesizes natural language response
const reply2 = await engine.chat.completions.create({
  stream: false,
  messages: messages,
});
console.log(reply2.choices[0].message.content);
// Natural language summary of Tesla's stock fundamentals

// Turn 4: User asks for another stock -- loop continues
messages.push({ role: "assistant", content: reply2.choices[0].message.content });
messages.push({
  role: "user",
  content: "Now do another one with NVIDIA, symbol being NVDA.",
});

const reply3 = await engine.chat.completions.create({
  stream: false,
  messages: messages,
});
// The model is expected to emit another tool call, this time for NVDA
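In this manual style the tool call arrives as text in message.content, so the application has to extract it before executing the function in Turn 2. The sketch below assumes the Hermes-2-Pro convention of wrapping each call in <tool_call>...</tool_call> tags around a JSON object with "name" and "arguments" fields; the exact output depends on the system prompt, and parseHermesToolCalls and get_stock_fundamentals are hypothetical names.

```typescript
interface ParsedCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Hypothetical parser: extract every <tool_call>{...}</tool_call> block from raw model text
function parseHermesToolCalls(text: string): ParsedCall[] {
  const calls: ParsedCall[] = [];
  const pattern = /<tool_call>([\s\S]*?)<\/tool_call>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    try {
      const obj = JSON.parse(match[1]);
      // Keep only objects that at least name a function
      if (typeof obj?.name === "string") {
        calls.push({ name: obj.name, arguments: obj.arguments ?? {} });
      }
    } catch {
      // Skip malformed JSON; a real app might report it back to the model
    }
  }
  return calls;
}

const raw =
  '<tool_call>\n{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}\n</tool_call>';
console.log(parseHermesToolCalls(raw));
```

In the flow above, the parsed name and arguments would decide which function produces toolResponse.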

Generic tool execution loop with max iterations:

async function runToolLoop(
  engine: webllm.MLCEngineInterface,
  initialMessages: webllm.ChatCompletionMessageParam[],
  tools: Array<webllm.ChatCompletionTool>,
  executeTool: (name: string, args: Record<string, unknown>) => Promise<string>,
  maxIterations: number = 5,
): Promise<string> {
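  // Copy so the caller's array is not mutated
  const messages: webllm.ChatCompletionMessageParam[] = [...initialMessages];

  for (let i = 0; i < maxIterations; i++) {
    const reply = await engine.chat.completions.create({
      stream: false,
      messages,
      tools,
      tool_choice: "auto",
    });

    const choice = reply.choices[0];
    const toolCalls = choice.message.tool_calls;

    // Any finish reason other than "tool_calls" ("stop", "length", ...) ends the
    // loop; re-sending the same history would only repeat the same request
    if (choice.finish_reason !== "tool_calls" || !toolCalls || toolCalls.length === 0) {
      return choice.message.content ?? "";
    }

    // Record the assistant's tool-call turn before its results
    messages.push({
      role: "assistant",
      content: null,
      tool_calls: toolCalls,
    });

    for (const toolCall of toolCalls) {
      let result: string;
      try {
        const args = JSON.parse(toolCall.function.arguments);
        result = await executeTool(toolCall.function.name, args);
      } catch (err) {
        // Malformed arguments or a failing tool: report the error to the model
        result = JSON.stringify({ error: String(err) });
      }
      messages.push({
        role: "tool",
        content: result,
        tool_call_id: toolCall.id,
      });
    }
  }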

  return "Max iterations reached without final response.";
}
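The executeTool callback that runToolLoop expects can be built from a plain name-to-function map. This is a sketch under assumptions: makeExecuteTool and the registry shape are hypothetical, and results are serialized with JSON.stringify so the tool message content is always a string.

```typescript
// A tool implementation may return a value or a Promise of one
type ToolImpl = (args: Record<string, unknown>) => unknown;

// Hypothetical factory producing an executeTool callback for runToolLoop
function makeExecuteTool(
  registry: Record<string, ToolImpl>,
): (name: string, args: Record<string, unknown>) => Promise<string> {
  return async (name, args) => {
    const impl = Object.prototype.hasOwnProperty.call(registry, name) ? registry[name] : undefined;
    if (!impl) {
      return JSON.stringify({ error: `Unknown function: ${name}` });
    }
    try {
      const result = await impl(args);
      // Tool message content must be a string
      return typeof result === "string" ? result : JSON.stringify(result ?? null);
    } catch (err) {
      return JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
    }
  };
}

// Usage with placeholder tools; the result would be passed as runToolLoop's executeTool
const executeTool = makeExecuteTool({
  get_current_weather: (args) => ({ location: args.location, temperature: 22.5 }),
  add: (args) => Number(args.a) + Number(args.b),
});
executeTool("add", { a: 1, b: 2 }).then((r) => console.log(r));
```

Catching errors inside the callback keeps runToolLoop's own try/catch as a second line of defense rather than the only one.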

Related Pages

Implementation:Mlc_ai_Web_llm_Tool_Execution_Pattern
Mlc_ai_Web_llm_Tool_Definition -- Defining the tools available in the loop
Mlc_ai_Web_llm_Tool_Choice_Configuration -- Controlling tool invocation per iteration
Mlc_ai_Web_llm_Tool_Call_Extraction -- How tool calls are parsed from model output
Mlc_ai_Web_llm_Function_Calling_Model_Selection -- Required for reliable loop operation
