Principle:Microsoft Autogen Tool Execution Loop

From Leeroopedia
Knowledge Sources
Domains Tool Use, Agent Execution, LLM Agents, Iterative Reasoning
Last Updated 2026-02-11 00:00 GMT

Overview

The tool execution loop is the iterative process within an LLM agent where the model generates tool call requests, the framework executes those tools, the results are fed back to the model, and the cycle repeats until the model produces a final text response or an iteration limit is reached.

Description

When an LLM agent is equipped with tools, its response generation follows a fundamentally different pattern from a simple prompt-to-text flow. Instead of a single inference call, the agent enters a loop that interleaves model inference with tool execution.

The loop proceeds as follows:

  1. Initial inference: The agent sends the conversation history (including system messages, user messages, and tool schemas) to the LLM. The LLM either responds with text (task is complete) or with one or more tool call requests.
  2. Tool call dispatch: If the LLM requests tool calls, the agent extracts each function call (name + arguments), dispatches them to the appropriate tool implementations via the workbench, and collects the results.
  3. Result injection: The tool execution results are added to the conversation context as function result messages, giving the LLM visibility into what the tools returned.
  4. Next inference: The agent calls the LLM again with the updated context. The LLM can now reason about the tool results and either produce a final text response or request additional tool calls.
  5. Termination: The loop terminates when the LLM produces a text response (no tool calls) or the maximum iteration count is reached.
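The five steps above can be sketched as a minimal, framework-free loop. The message shapes, the tool registry, and the stub model below are illustrative stand-ins, not AutoGen's actual types:

```python
def run_tool_loop(messages, tools, model, max_iterations=5):
    """Interleave model inference with tool execution until text is produced."""
    context = list(messages)
    for _ in range(max_iterations):
        reply = model(context)                      # steps 1/4: inference
        if isinstance(reply, str):                  # text response: task done
            return reply
        # step 2: dispatch each requested call (name, arguments) to a tool
        results = [tools[name](**args) for name, args in reply]
        # step 3: inject results so the next inference can observe them
        context.append({"role": "tool", "content": results})
    return "[max iterations reached]"               # step 5: bounded exit

def fake_model(context):
    # Stub LLM: request one tool call, then answer once results appear.
    if context and context[-1]["role"] == "tool":
        return f"Sum: {context[-1]['content'][0]}"
    return [("add", {"a": 2, "b": 3})]

answer = run_tool_loop([{"role": "user", "content": "add 2 and 3"}],
                       {"add": lambda a, b: a + b}, fake_model)
```

The first round yields a tool call request, the second observes the result and returns text, so the loop terminates after two inference calls.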

Several important behaviors govern the loop:

  • Parallel tool execution: When the LLM requests multiple tool calls in a single response, all calls are executed concurrently using async gather. This improves latency when tools are I/O-bound.
  • Streaming support: Tool executions can emit streaming events (for sub-agent tools that produce incremental output). The loop forwards these events to the caller as they arrive.
  • Handoff detection: After tool execution, the loop checks whether any executed tool represents a handoff to another agent. If so, the loop terminates with a handoff response instead of continuing.
  • Post-loop processing: After the loop ends (either by text response or iteration exhaustion), the agent either reflects on the tool results (sending them back to the LLM for summarization) or formats a summary using a template string.
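The parallel-execution behavior can be illustrated with `asyncio.gather`, which is the standard way to await several coroutines concurrently. The tool names and delays below are illustrative, not AutoGen's:

```python
import asyncio

async def slow_lookup(name, delay):
    await asyncio.sleep(delay)          # stands in for I/O-bound tool work
    return f"{name}:done"

async def execute_calls(calls):
    # gather runs all calls concurrently, so total latency is roughly
    # that of the slowest call rather than the sum of all calls.
    return await asyncio.gather(*(slow_lookup(n, d) for n, d in calls))

results = asyncio.run(execute_calls([("search", 0.02), ("fetch", 0.01)]))
```

`gather` also preserves input order in its result list, which keeps tool results aligned with the tool call requests they answer.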

Usage

The tool execution loop is relevant when:

  • An agent needs to perform multi-step reasoning that involves gathering information from tools before producing a final answer.
  • You need to control how many rounds of tool use an agent can perform (via max_tool_iterations).
  • You want to understand or debug the sequence of LLM calls and tool executions that an agent performs.
  • You are building agents that use tools iteratively (e.g., search, refine query, search again).
  • You need to process the streaming output of tool-augmented agents for real-time UIs.
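As a hedged configuration sketch, the knobs above map onto AutoGen's `AssistantAgent` roughly as follows; the parameter names reflect `autogen-agentchat` as the author understands it, and `model_client` and `search_tool` are placeholders you would supply yourself:

```python
from autogen_agentchat.agents import AssistantAgent

agent = AssistantAgent(
    name="researcher",
    model_client=model_client,        # e.g. an OpenAI-compatible chat client
    tools=[search_tool],              # plain callables or tool objects
    max_tool_iterations=3,            # hard ceiling on loop rounds
    reflect_on_tool_use=True,         # extra LLM call to synthesize results
)
```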

Theoretical Basis

The tool execution loop implements a bounded ReAct loop (Reason-Act-Observe):

FUNCTION tool_execution_loop(messages, tools, model, max_iterations):
    context = build_initial_context(messages)
    inner_events = []

    FOR iteration IN range(max_iterations):
        # REASON: Ask the LLM what to do
        model_result = call_llm(context, tool_schemas=tools.list_schemas())

        # Check if the LLM produced a text response (done reasoning)
        IF model_result.content is text:
            RETURN Response(text, inner_events)

        # ACT: Execute the tool calls
        tool_calls = model_result.content  # List of FunctionCall
        EMIT ToolCallRequestEvent(tool_calls)
        inner_events.append(ToolCallRequestEvent(tool_calls))

        # Execute all tool calls concurrently
        results = PARALLEL_EXECUTE(tools.call(call) FOR call IN tool_calls)
        EMIT ToolCallExecutionEvent(results)
        inner_events.append(ToolCallExecutionEvent(results))

        # Check for handoff
        IF any result is a handoff:
            RETURN HandoffResponse(target, context)

        # OBSERVE: Add results to context for next iteration
        context.add(FunctionExecutionResultMessage(results))

    # Loop exhausts after max_iterations; fall through to summary/reflection

    # Post-loop: either reflect or summarize
    IF reflect_on_tool_use:
        reflection = call_llm(context, no_tools=True)
        RETURN Response(reflection, inner_events)
    ELSE:
        summary = format_tool_results(results, summary_format)
        RETURN Response(summary, inner_events)

The bounded nature of the loop is essential for preventing runaway tool-calling behavior. Without iteration limits, a model could enter an infinite loop of tool calls. The max_iterations parameter provides a hard ceiling.
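The hard ceiling can be demonstrated with a deliberately pathological model that never stops requesting tool calls; this is an illustrative sketch, not AutoGen code:

```python
def greedy_model(context):
    return [("noop", {})]               # always asks for another tool call

def bounded_loop(model, tools, max_iterations):
    """Count tool calls made before the iteration ceiling cuts the loop off."""
    calls_made = 0
    context = []
    for _ in range(max_iterations):
        reply = model(context)
        if isinstance(reply, str):      # would terminate normally here
            return reply, calls_made
        for name, args in reply:
            tools[name](**args)
            calls_made += 1
        context.append(("tool", None))
    return "[max iterations reached]", calls_made

text, n = bounded_loop(greedy_model, {"noop": lambda: None}, max_iterations=3)
```

Without the `range(max_iterations)` bound, this model would call tools forever; with it, execution stops after exactly three rounds.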

The reflection step after the loop implements a form of self-evaluation. By asking the LLM to review tool outputs without tool schemas, the model is forced to synthesize the information into a coherent response rather than attempting more tool calls.

The distinction between reflection and summary formatting offers a trade-off between quality and cost. Reflection produces higher-quality responses (the LLM interprets the results) but requires an additional inference call. Summary formatting is cheaper (no LLM call) but produces mechanical output.
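The cheap summary-formatting path amounts to filling a template string with each tool result. The placeholder names below mirror a `"{tool_name}: {result}"`-style format string and are illustrative, not AutoGen's exact defaults:

```python
summary_format = "{tool_name}({arguments}) -> {result}"

def format_tool_results(results, fmt):
    """Render each tool result through the template; no LLM call involved."""
    return "\n".join(
        fmt.format(tool_name=r["tool_name"],
                   arguments=r["arguments"],
                   result=r["result"])
        for r in results)

summary = format_tool_results(
    [{"tool_name": "search", "arguments": "q=autogen", "result": "3 hits"}],
    summary_format)
```

The mechanical output ("search(q=autogen) -> 3 hits") is faithful but uninterpreted, which is exactly the trade-off against the extra inference call that reflection spends.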

Related Pages

Implemented By
