Principle:Mlc ai Web llm Tool Execution Loop

From Leeroopedia

Overview

Tool Execution Loop is the pattern for executing LLM-requested tool calls and feeding results back into the conversation for multi-turn tool use. This is a user-implemented pattern (not built into the engine) where the application mediates between the model and external functions across multiple conversation turns.

Description

The tool execution loop is a conversational pattern that enables iterative interaction between a language model and external systems. It follows this cycle:

  1. Model returns tool_calls -- The chat completion response has finish_reason: "tool_calls" and message.tool_calls contains an array of ChatCompletionMessageToolCall objects.
  2. Application executes each function -- The developer's code dispatches each tool call to the corresponding function implementation using function.name and the parsed function.arguments.
  3. Results are formatted as tool messages -- Each function result is wrapped in a ChatCompletionToolMessageParam with role: "tool", the result as content, and the matching tool_call_id from the original tool call.
  4. Conversation continues -- The messages array (including the assistant's tool call message, and the tool result messages) is sent back to the model for the next turn. The model can then synthesize a natural language response, or make additional tool calls.

This loop can repeat multiple times in a single conversation, enabling complex multi-step workflows where the model orchestrates multiple API calls, each building on the results of previous calls.

Usage

Implement this pattern when your application needs multi-turn tool use. The key requirements are:

Message formatting rules:

  • After receiving a tool call response, append the assistant's message (with tool_calls) to the messages array.
  • For each tool call result, create a ChatCompletionToolMessageParam with:
    • role: "tool"
    • content: string -- The result of the function execution (must be a string; serialize objects with JSON.stringify)
    • tool_call_id: string -- Must match the id field of the corresponding ChatCompletionMessageToolCall
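
The formatting rules above can be sketched as a small helper. This is a minimal sketch rather than part of WebLLM: `ToolCallLike`, `ToolMessage`, and `toToolMessage` are hypothetical local names that mirror only the fields described above.

```typescript
// Local stand-ins (assumptions) mirroring only the fields named in the rules above.
interface ToolCallLike {
  id: string;
  function: { name: string; arguments: string };
}

interface ToolMessage {
  role: "tool";
  content: string;
  tool_call_id: string;
}

// Hypothetical helper: wraps a function result in a tool message.
// Strings pass through unchanged; anything else is serialized with JSON.stringify.
function toToolMessage(toolCall: ToolCallLike, result: unknown): ToolMessage {
  let content: string;
  if (typeof result === "string") {
    content = result;
  } else {
    // JSON.stringify returns undefined for values such as undefined or functions.
    const serialized: string | undefined = JSON.stringify(result);
    content = serialized ?? "null";
  }
  return { role: "tool", content, tool_call_id: toolCall.id };
}
```

Copying `tool_call_id` from the tool call itself, rather than tracking ids separately, keeps each result paired with the call that requested it.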

Loop termination:

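  • The loop ends when the model returns finish_reason: "stop" instead of "tool_calls", indicating it has enough information to produce a final text response.
  • Any other finish reason (for example "length" in OpenAI-style APIs, when the token limit is reached) should also be treated as terminal; sending the same unchanged history again would only repeat the request.
  • Applications should also implement a maximum iteration limit to prevent infinite loops.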

Error handling:

  • If a function execution fails, the error should be communicated back to the model as the tool message content so the model can adapt its response.
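  • If the model produces invalid tool calls (e.g., calling a non-existent function or emitting arguments that are not valid JSON), the application should not throw. It should either return an error description as the tool message content or stop the loop, so the conversation history stays well-formed.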
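
The error-handling rules above can be sketched as a wrapper that always produces a string for the tool message. This is an illustrative sketch: `ToolRegistry` and `safeExecute` are hypothetical names, the `{ error: ... }` payload shape is an assumption, and the wrapper is synchronous for brevity (an async variant would await the tool call inside the same try block).

```typescript
// Hypothetical registry type: maps a tool name to its implementation.
type ToolRegistry = Record<string, (args: Record<string, unknown>) => string>;

// Hypothetical helper: runs one tool call and always returns a string, so
// failures reach the model as tool message content instead of throwing.
function safeExecute(
  registry: ToolRegistry,
  name: string,
  rawArguments: string,
): string {
  // Own-property check so inherited names like "toString" count as unknown.
  const fn = Object.prototype.hasOwnProperty.call(registry, name)
    ? registry[name]
    : undefined;
  if (typeof fn !== "function") {
    return JSON.stringify({ error: `Unknown function: ${name}` });
  }

  let args: Record<string, unknown>;
  try {
    args = JSON.parse(rawArguments);
  } catch {
    return JSON.stringify({ error: `Invalid JSON arguments for ${name}` });
  }

  try {
    return fn(args);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return JSON.stringify({ error: message });
  }
}
```

The returned string can be used directly as the content of the tool message, which lets the model see the failure and adjust its next call.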

Theoretical Basis

The tool execution loop implements the ReAct (Reasoning + Acting) pattern adapted for structured function calling:

  1. Reasoning -- The model analyzes the user query and available tools to determine which function(s) to call and with what arguments.
  2. Acting -- The application executes the chosen function(s) and returns results.
  3. Observation -- The model receives the function results as tool messages and reasons about them.
  4. Iteration -- The model may make additional tool calls or produce a final response.

This pattern is fundamental to building agentic applications because it allows the model to:

  • Decompose complex queries into multiple function calls (e.g., first look up a stock symbol, then fetch its fundamentals)
  • Chain function calls where the output of one call informs the input of the next
  • Recover from errors by examining error messages and retrying with different arguments
  • Synthesize results from multiple tool calls into a coherent natural language response

The conversation history serves as the model's working memory, providing full context of previous actions and results.

I/O Contract

Loop input (per iteration):

  • messages: Array<ChatCompletionMessageParam> -- The full conversation history including prior tool calls and results
  • tools: Array<ChatCompletionTool> -- Available tool definitions (same across iterations)
  • tool_choice: ChatCompletionToolChoiceOption -- Typically "auto" for loop iterations

Loop output (per iteration):

  • Either finish_reason: "tool_calls" with message.tool_calls (continue loop), or
  • finish_reason: "stop" with message.content (loop complete)
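
The two possible outputs can be modeled as a small decision function. This is a sketch under assumptions: `ChoiceLike`, `LoopStep`, and `nextStep` are hypothetical local names, and treating every finish reason other than "tool_calls" as terminal is a design choice of this sketch rather than documented engine behavior.

```typescript
// Local stand-in (assumption) for the parts of a completion choice used here.
interface ChoiceLike {
  finish_reason: string | null;
  message: { content: string | null; tool_calls?: Array<{ id: string }> };
}

type LoopStep =
  | { kind: "continue"; toolCallIds: string[] }
  | { kind: "done"; text: string };

// Hypothetical helper: decides whether the loop continues or is complete.
// Only "tool_calls" with a non-empty tool_calls array continues the loop.
function nextStep(choice: ChoiceLike): LoopStep {
  const calls = choice.message.tool_calls ?? [];
  if (choice.finish_reason === "tool_calls" && calls.length > 0) {
    return { kind: "continue", toolCallIds: calls.map((c) => c.id) };
  }
  return { kind: "done", text: choice.message.content ?? "" };
}
```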

Tool result message structure:

interface ChatCompletionToolMessageParam {
  content: string;        // Function result as string
  role: "tool";           // Literal "tool"
  tool_call_id: string;   // Must match ChatCompletionMessageToolCall.id
}

Conversation flow:

Turn 1: [system, user] -> assistant (tool_calls)
Turn 2: [system, user, assistant(tool_calls), tool(result1), tool(result2)] -> assistant (content)

Or for multi-round tool use:

Turn 1: [system, user] -> assistant (tool_calls)
Turn 2: [system, user, assistant, tool] -> assistant (tool_calls again)
Turn 3: [system, user, assistant, tool, assistant, tool] -> assistant (content)
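
The ordering shown in these flows can be checked before each request. The sketch below is a hypothetical debugging aid, not a WebLLM feature: `MessageLike` and `findUnmatchedToolMessages` are illustrative names. It applies to the OpenAI-style flow, where assistant messages carry tool_calls, and not to the manual Hermes-2 style.

```typescript
// Local stand-in (assumption) covering only the fields this check reads.
interface MessageLike {
  role: "system" | "user" | "assistant" | "tool";
  tool_calls?: Array<{ id: string }>;
  tool_call_id?: string;
}

// Hypothetical helper: returns the tool_call_id of every tool message that is
// not preceded by an assistant message whose tool_calls contains that id.
function findUnmatchedToolMessages(messages: MessageLike[]): string[] {
  const seen = new Set<string>();
  const unmatched: string[] = [];
  for (const m of messages) {
    if (m.role === "assistant") {
      for (const call of m.tool_calls ?? []) {
        seen.add(call.id);
      }
    } else if (m.role === "tool") {
      const id = m.tool_call_id ?? "";
      if (!seen.has(id)) {
        unmatched.push(id);
      }
    }
  }
  return unmatched;
}
```

An empty result means every tool message follows the assistant message that requested it, matching the Turn 2 and Turn 3 layouts above.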

Usage Examples

Complete tool execution loop with OpenAI-style tools:

import * as webllm from "@mlc-ai/web-llm";

// 1. Define tools
const tools: Array<webllm.ChatCompletionTool> = [
  {
    type: "function",
    function: {
      name: "get_current_weather",
      description: "Get the current weather in a given location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA",
          },
          unit: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["location"],
      },
    },
  },
];

// 2. Implement the actual function
function get_current_weather(location: string, unit: string = "celsius"): string {
  // In a real app, this would call a weather API
  return JSON.stringify({ location, temperature: 22.5, unit });
}

// 3. Create engine
const engine = await webllm.CreateMLCEngine(
  "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC",
);

// 4. Initial request
const messages: webllm.ChatCompletionMessageParam[] = [
  { role: "user", content: "What is the weather in Pittsburgh?" },
];

const reply = await engine.chat.completions.create({
  stream: false,
  messages: messages,
  tools: tools,
  tool_choice: "auto",
});

// 5. Check if model wants to call tools
if (reply.choices[0].finish_reason === "tool_calls") {
  const toolCalls = reply.choices[0].message.tool_calls!;

  // 5a. Add assistant message with tool_calls to history
  messages.push({
    role: "assistant",
    content: null,
    tool_calls: toolCalls,
  });

  // 5b. Execute each tool call and add results
  for (const toolCall of toolCalls) {
    const args = JSON.parse(toolCall.function.arguments);
    let result: string;

    if (toolCall.function.name === "get_current_weather") {
      result = get_current_weather(args.location, args.unit);
    } else {
      result = JSON.stringify({ error: "Unknown function" });
    }

    messages.push({
      role: "tool",
      content: result,
      tool_call_id: toolCall.id,
    });
  }

  // 5c. Get final response with tool results
  const finalReply = await engine.chat.completions.create({
    stream: false,
    messages: messages,
    tools: tools,
    tool_choice: "auto",
  });

  console.log(finalReply.choices[0].message.content);
  // Example output (model-dependent): "The current weather in Pittsburgh is 22.5 degrees Celsius."
}

Multi-turn tool use (Hermes-2 manual style):

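// From examples/function-calling/function-calling-manual
// `engine` is created as in the previous example; `system_prompt` is assumed
// to be defined elsewhere and to describe the available functions to the model.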
const messages: webllm.ChatCompletionMessageParam[] = [
  { role: "system", content: system_prompt },
  { role: "user", content: "Fetch the stock fundamentals data for Tesla (TSLA)" },
];

// Turn 1: Model generates tool call
const reply1 = await engine.chat.completions.create({
  stream: false,
  messages: messages,
});
messages.push({ role: "assistant", content: reply1.choices[0].message.content });

// Turn 2: Execute function and provide result
const toolResponse = JSON.stringify({
  symbol: "TSLA",
  company_name: "Tesla, Inc.",
  sector: "Consumer Cyclical",
  market_cap: 611384164352,
});
messages.push({ role: "tool", content: toolResponse, tool_call_id: "0" });

// Turn 3: Model synthesizes natural language response
const reply2 = await engine.chat.completions.create({
  stream: false,
  messages: messages,
});
console.log(reply2.choices[0].message.content);
// Natural language summary of Tesla's stock fundamentals

// Turn 4: User asks for another stock -- loop continues
messages.push({ role: "assistant", content: reply2.choices[0].message.content });
messages.push({
  role: "user",
  content: "Now do another one with NVIDIA, symbol being NVDA.",
});

const reply3 = await engine.chat.completions.create({
  stream: false,
  messages: messages,
});
// Model generates another tool call for NVDA

Generic tool execution loop with max iterations:

async function runToolLoop(
  engine: webllm.MLCEngineInterface,
  initialMessages: webllm.ChatCompletionMessageParam[],
  tools: Array<webllm.ChatCompletionTool>,
  executeTool: (name: string, args: Record<string, unknown>) => Promise<string>,
  maxIterations: number = 5,
): Promise<string> {
  const messages = [...initialMessages];

  for (let i = 0; i < maxIterations; i++) {
    const reply = await engine.chat.completions.create({
      stream: false,
      messages,
      tools,
      tool_choice: "auto",
    });

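    const choice = reply.choices[0];
    const toolCalls = choice.message.tool_calls;

    // Any finish reason other than "tool_calls" ends the loop; continuing
    // would resend an unchanged history until maxIterations is exhausted.
    if (choice.finish_reason !== "tool_calls" || !toolCalls?.length) {
      return choice.message.content ?? "";
    }

    // Record the assistant's tool call message before the tool results.
    messages.push({ role: "assistant", content: null, tool_calls: toolCalls });

    for (const toolCall of toolCalls) {
      let result: string;
      try {
        const args = JSON.parse(toolCall.function.arguments);
        result = await executeTool(toolCall.function.name, args);
      } catch (err) {
        // Report parse and execution failures to the model as tool content.
        const message = err instanceof Error ? err.message : String(err);
        result = JSON.stringify({ error: message });
      }
      messages.push({ role: "tool", content: result, tool_call_id: toolCall.id });
    }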
  }

  return "Max iterations reached without final response.";
}
