Principle:Microsoft Agent framework Agent Execution
| Property | Value |
|---|---|
| Principle Name | Agent Execution |
| SDK | Microsoft Agent Framework |
| Repository | Microsoft Agent Framework |
| Source Reference | python/packages/core/agent_framework/_agents.py:L827-868
|
| Import | from agent_framework import Agent
|
| Domains | Agent_Architecture, NLP |
Overview
Agent Execution is a pattern for sending messages to an AI agent and receiving responses, supporting both synchronous and streaming execution modes.
Description
Agent Execution is the core interaction pattern where a user query is sent to an agent, which processes it through the LLM, automatically invokes any required tools, and returns a response. The pattern supports multi-turn conversations via threads, allowing the agent to maintain conversational context across successive invocations.
The execution model provides two primary modes:
- Non-streaming (default): The caller awaits a single
AgentResponseobject containing the complete response text, message history, and any structured output value. - Streaming: When
stream=Trueis passed, the method returns aResponseStreamthat yields incrementalAgentResponseUpdateobjects as the LLM generates tokens, enabling real-time display of partial results.
Multi-Turn Conversations
The AgentThread object serves as a conversation container. When passed to successive run() calls, it preserves the full message history, enabling the agent to reference prior exchanges. Each call appends the new user message and the agent's response to the thread.
Tool Invocation
During execution, if the LLM determines that a tool call is needed to satisfy the user's request, the agent automatically invokes the appropriate tool, incorporates the tool's output into the conversation, and continues generating the response. This tool invocation loop repeats until the LLM produces a final text response without further tool calls.
Per-Call Overrides
The run() method accepts optional tools and options parameters that override the agent's default configuration for a single invocation. This allows callers to dynamically adjust tool availability or model options without modifying the agent instance.
Theoretical Basis
The run() method implements a request-response cycle with an automatic tool invocation loop. The agent sends the user's message (along with conversation history and tool schemas) to the LLM, processes the response, and if the response includes tool calls, executes them and re-invokes the LLM with the updated context. This cycle continues until the LLM produces a final answer.
Setting stream=True enables real-time token streaming, where partial response tokens are yielded as they are generated by the LLM, rather than waiting for the complete response. This is implemented via an asynchronous iterator pattern that yields AgentResponseUpdate objects.
The dual return type (AgentResponse for non-streaming, ResponseStream for streaming) follows the Strategy Pattern, where the execution strategy is selected at call time via the stream parameter.
Related Pages
Sources
| Type | Name | URL |
|---|---|---|
| Repo | Microsoft Agent Framework | https://github.com/microsoft/agent-framework |