Principle: Microsoft AutoGen Tool Execution Loop
| Knowledge Sources | |
|---|---|
| Domains | Tool Use, Agent Execution, LLM Agents, Iterative Reasoning |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
The tool execution loop is the iterative process within an LLM agent where the model generates tool call requests, the framework executes those tools, the results are fed back to the model, and the cycle repeats until the model produces a final text response or an iteration limit is reached.
Description
When an LLM agent is equipped with tools, its response generation follows a fundamentally different pattern from a simple prompt-to-text flow. Instead of a single inference call, the agent enters a loop that interleaves model inference with tool execution.
The loop proceeds as follows:
- Initial inference: The agent sends the conversation history (including system messages, user messages, and tool schemas) to the LLM. The LLM either responds with text (task is complete) or with one or more tool call requests.
- Tool call dispatch: If the LLM requests tool calls, the agent extracts each function call (name + arguments), dispatches them to the appropriate tool implementations via the workbench, and collects the results.
- Result injection: The tool execution results are added to the conversation context as function result messages, giving the LLM visibility into what the tools returned.
- Next inference: The agent calls the LLM again with the updated context. The LLM can now reason about the tool results and either produce a final text response or request additional tool calls.
- Termination: The loop terminates when the LLM produces a text response (no tool calls) or the maximum iteration count is reached.
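The five steps above can be sketched in plain Python. This is a framework-agnostic illustration, not AutoGen's internal code; `ToolCall`, `call_llm`, and the message dictionaries are illustrative names:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    arguments: dict

def tool_execution_loop(prompt, call_llm, tools, max_iterations=5):
    """Minimal sketch of the loop. call_llm returns either a string
    (final text response) or a list of ToolCall requests."""
    context = [{"role": "user", "content": prompt}]  # initial context
    for _ in range(max_iterations):
        result = call_llm(context)          # inference step
        if isinstance(result, str):         # text response: task complete
            return result
        for call in result:                 # tool call dispatch
            output = tools[call.name](**call.arguments)
            context.append({"role": "tool", "name": call.name,
                            "content": str(output)})  # result injection
    return "Max iterations reached"         # termination by iteration limit
```

A stub model that requests one tool call and then answers exercises the full cycle: inference, dispatch, result injection, second inference, termination.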
Several important behaviors govern the loop:
- Parallel tool execution: When the LLM requests multiple tool calls in a single response, all calls are executed concurrently (via asyncio.gather). This improves latency when tools are I/O-bound.
- Streaming support: Tool executions can emit streaming events (for sub-agent tools that produce incremental output). The loop forwards these events to the caller as they arrive.
- Handoff detection: After tool execution, the loop checks whether any executed tool represents a handoff to another agent. If so, the loop terminates with a handoff response instead of continuing.
- Post-loop processing: After the loop ends (either by text response or iteration exhaustion), the agent either reflects on the tool results (sending them back to the LLM for summarization) or formats a summary using a template string.
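The parallel-execution behavior can be illustrated with asyncio.gather. This is a generic sketch, not AutoGen's internal dispatch code; `run_tool` stands in for a real tool implementation:

```python
import asyncio

async def run_tool(name, delay):
    # Simulate an I/O-bound tool (e.g. an HTTP request).
    await asyncio.sleep(delay)
    return f"{name}: done"

async def execute_tool_calls(calls):
    # All requested tool calls run concurrently, so total latency is
    # bounded by the slowest call rather than the sum of all calls.
    return await asyncio.gather(*(run_tool(n, d) for n, d in calls))

results = asyncio.run(execute_tool_calls([("search", 0.05), ("fetch", 0.05)]))
```

gather preserves request order in its result list, which lets the framework pair each result with the function call that produced it.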
Usage
The tool execution loop is relevant when:
- An agent needs to perform multi-step reasoning that involves gathering information from tools before producing a final answer.
- You need to control how many rounds of tool use an agent can perform (via max_tool_iterations).
- You want to understand or debug the sequence of LLM calls and tool executions that an agent performs.
- You are building agents that use tools iteratively (e.g., search, refine query, search again).
- You need to process the streaming output of tool-augmented agents for real-time UIs.
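The effect of the iteration cap (exposed as max_tool_iterations on AutoGen's AssistantAgent) can be demonstrated with a stub model that never stops requesting tools. The helper below is an illustrative sketch, not the AutoGen API:

```python
def looping_model(context):
    # A pathological model that always requests another tool call.
    return [("noop", {})]

def run_with_cap(model, tools, max_tool_iterations):
    context, rounds = [], 0
    for _ in range(max_tool_iterations):
        result = model(context)
        if isinstance(result, str):      # text response would end the loop
            return result, rounds
        for name, args in result:        # execute requested tool calls
            context.append(tools[name](**args))
        rounds += 1
    # Iterations exhausted: fall through to a summary instead of looping forever.
    return "summary of tool results", rounds

answer, rounds = run_with_cap(looping_model, {"noop": lambda: "ok"},
                              max_tool_iterations=3)
```

Without the cap, this model would call `noop` indefinitely; with it, the agent performs exactly three rounds and then produces a post-loop summary.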
Theoretical Basis
The tool execution loop implements a bounded ReAct loop (Reason-Act-Observe):
FUNCTION tool_execution_loop(messages, tools, model, max_iterations):
context = build_initial_context(messages)
inner_events = []
FOR iteration IN range(max_iterations):
# REASON: Ask the LLM what to do
model_result = call_llm(context, tool_schemas=tools.list_schemas())
# Check if the LLM produced a text response (done reasoning)
IF model_result.content is text:
RETURN Response(model_result.content, inner_events)
# ACT: Execute the tool calls
tool_calls = model_result.content # List of FunctionCall
EMIT ToolCallRequestEvent(tool_calls)
inner_events.append(ToolCallRequestEvent(tool_calls))
# Execute all tool calls concurrently
results = PARALLEL_EXECUTE(tools.call(call) FOR call IN tool_calls)
EMIT ToolCallExecutionEvent(results)
inner_events.append(ToolCallExecutionEvent(results))
# Check for handoff
IF any result is a handoff:
RETURN HandoffResponse(target, context)
# OBSERVE: Add results to context for next iteration
context.add(FunctionExecutionResultMessage(results))
# If this is the last iteration, break to summary/reflection
IF iteration == max_iterations - 1:
BREAK
# Post-loop: either reflect or summarize
IF reflect_on_tool_use:
reflection = call_llm(context, no_tools=True)
RETURN Response(reflection, inner_events)
ELSE:
summary = format_tool_results(results, summary_format)
RETURN Response(summary, inner_events)
The bounded nature of the loop is essential for preventing runaway tool-calling behavior. Without iteration limits, a model could enter an infinite loop of tool calls. The max_iterations parameter provides a hard ceiling.
The reflection step after the loop implements a form of self-evaluation. By asking the LLM to review tool outputs without tool schemas, the model is forced to synthesize the information into a coherent response rather than attempting more tool calls.
The distinction between reflection and summary formatting offers a trade-off between quality and cost. Reflection produces higher-quality responses (the LLM interprets the results) but requires an additional inference call. Summary formatting is cheaper (no LLM call) but produces mechanical output.
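The two post-loop paths can be sketched side by side. `finalize`, its parameters, and the `{name}`/`{result}` placeholders are illustrative, assuming a `call_llm` helper whose `tool_schemas=None` disables tool calling:

```python
def finalize(context, tool_results, call_llm, reflect_on_tool_use, summary_format):
    if reflect_on_tool_use:
        # Quality path: one extra inference call. With no tool schemas
        # offered, the model must synthesize a text answer.
        return call_llm(context, tool_schemas=None)
    # Cheap path: mechanical template formatting, no LLM call.
    return "\n".join(summary_format.format(name=name, result=result)
                     for name, result in tool_results)
```

The same tool results thus yield either a model-written synthesis (at the cost of one inference) or a deterministic, template-driven summary.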