Principle: Microsoft Semantic Kernel Single Agent Invocation
Overview
Single Agent Invocation is the principle of sending a message to an agent and receiving streamed responses with automatic thread management. In Microsoft Semantic Kernel, the InvokeAsync method is the primary entry point for interacting with an agent, yielding an asynchronous stream of response items that include both the message content and the updated conversation thread.
This principle belongs to Workflow 3: Agent Conversation and Orchestration and represents the core interaction pattern for all agent-based applications.
Description
The Invocation Model
Single agent invocation follows a straightforward request-response pattern, enhanced with two important characteristics:
- Streaming -- Responses are delivered as an IAsyncEnumerable<AgentResponseItem<ChatMessageContent>>, allowing the caller to process each response message as it arrives rather than waiting for the whole invocation to finish. Token-level streaming is exposed separately through InvokeStreamingAsync.
- Thread Management -- Each response item carries a Thread property that represents the updated conversation state. The caller captures this thread and passes it into subsequent invocations to maintain conversational continuity.
Invocation Lifecycle
When InvokeAsync is called, the following sequence occurs:
- Message Construction -- The caller creates a ChatMessageContent with an AuthorRole (typically User) and the message text.
- Thread Resolution -- If a thread is provided, the agent uses its history as conversation context. If no thread is provided (or null is passed), a new thread is created automatically.
- History Assembly -- The agent constructs a complete chat history by:
  - Prepending the agent's Instructions as a system message.
  - Appending all messages from the thread (if any).
  - Appending the new user message.
- Service Invocation -- The assembled history is sent to the underlying IChatCompletionService.
- Tool Handling -- If the model requests tool calls and plugins are available, Semantic Kernel automatically invokes the corresponding KernelFunction instances and feeds results back to the model. This loop continues until the model produces a final text response.
- Response Streaming -- The agent yields AgentResponseItem<ChatMessageContent> objects containing the response content and the updated thread.
- Thread Update -- The user's message and the agent's response are both recorded in the thread for future invocations.
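The lifecycle above can be sketched without the Semantic Kernel packages. The block below is a minimal illustration only: SketchMessage, SketchThread, SketchResponseItem, and SketchAgent are hypothetical stand-ins invented for this example (they are not Semantic Kernel types), the chat completion service is replaced by a plain delegate, and tool handling is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for illustration; these are NOT the Semantic Kernel types.
public sealed record SketchMessage(string Role, string Content);

public sealed class SketchThread
{
    public List<SketchMessage> Messages { get; } = new();
}

public sealed record SketchResponseItem(SketchMessage Message, SketchThread Thread);

public sealed class SketchAgent
{
    private readonly string _instructions;

    // Stand-in for the chat completion service: full history in, reply text out.
    private readonly Func<IReadOnlyList<SketchMessage>, Task<string>> _complete;

    public SketchAgent(string instructions, Func<IReadOnlyList<SketchMessage>, Task<string>> complete)
    {
        _instructions = instructions;
        _complete = complete;
    }

    public async IAsyncEnumerable<SketchResponseItem> InvokeAsync(
        SketchMessage userMessage,
        SketchThread? thread = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Thread resolution: reuse the caller's thread or start a new one.
        thread ??= new SketchThread();

        // History assembly: instructions, then prior turns, then the new message.
        var history = new List<SketchMessage> { new("system", _instructions) };
        history.AddRange(thread.Messages);
        history.Add(userMessage);

        // Service invocation (tool handling is omitted from this sketch).
        cancellationToken.ThrowIfCancellationRequested();
        string replyText = await _complete(history);
        var reply = new SketchMessage("assistant", replyText);

        // Thread update: record both sides of the exchange before yielding,
        // so the thread handed to the caller is already up to date.
        thread.Messages.Add(userMessage);
        thread.Messages.Add(reply);

        // Response streaming: yield the content together with the thread.
        yield return new SketchResponseItem(reply, thread);
    }
}
```

With a delegate that reports how many messages it received, a first call without a thread sees two messages (instructions plus the user message), and a second call that passes the captured thread sees four.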
Stateless vs. Stateful Invocation
The invocation pattern supports two modes, controlled entirely by whether a thread is provided:
Stateless (one-shot):
- No thread is passed.
- The agent sees only its instructions and the current message.
- A new thread is created but may be discarded if not captured.
- Suitable for independent, context-free queries.
Stateful (multi-turn):
- A thread is passed from a previous invocation.
- The agent sees the full conversation history.
- The updated thread is captured from the response for the next call.
- Suitable for ongoing conversations requiring context.
The AgentResponseItem Envelope
Each item in the response stream is an AgentResponseItem<ChatMessageContent> that bundles:
- Message -- The ChatMessageContent containing the response text, author role, and metadata.
- Thread -- The AgentThread representing the conversation state after this response. This is the value the caller should capture for subsequent invocations.
This envelope design keeps thread state available alongside the content on every item, which helps avoid the common bug of losing conversation context between turns.
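One way to picture the benefit of the envelope is a helper that drains a response stream and hands back both the messages and the last thread it saw. The sketch below assumes a hypothetical Envelope<TMessage, TThread> record and a hypothetical DrainAsync extension method; neither is part of Semantic Kernel, and they exist only to show why the caller never has to fetch state separately.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the envelope; not the Semantic Kernel type.
public sealed record Envelope<TMessage, TThread>(TMessage Message, TThread Thread);

public static class EnvelopeExtensions
{
    // Drains a response stream, returning every message plus the thread
    // carried by the last item (or the fallback if the stream was empty).
    public static async Task<(List<TMessage> Messages, TThread? Thread)> DrainAsync<TMessage, TThread>(
        this IAsyncEnumerable<Envelope<TMessage, TThread>> stream,
        TThread? fallbackThread = default)
        where TThread : class
    {
        var messages = new List<TMessage>();
        TThread? thread = fallbackThread;

        await foreach (var item in stream)
        {
            messages.Add(item.Message);
            thread = item.Thread; // the state travels with the content
        }

        return (messages, thread);
    }
}
```

Because every item carries its thread, the helper needs no second call to learn the conversation state; the fallback only matters when the stream yields nothing.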
Usage
Single Agent Invocation is used whenever:
- A user message needs to be sent to an agent and a response received.
- A multi-turn conversation loop is implemented.
- An agent needs to process a batch of independent messages.
- Tool-equipped agents need to handle function calling transparently.
Stateless One-Shot Invocation
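using Microsoft.SemanticKernel;                // ChatMessageContent
using Microsoft.SemanticKernel.Agents;         // AgentResponseItem<T>
using Microsoft.SemanticKernel.ChatCompletion; // AuthorRole
// agent is assumed to be an already-configured agent instance.
ChatMessageContent message = new(AuthorRole.User, "Fortune favors the bold.");
// No thread is passed, so the agent sees only its instructions and this message.
await foreach (AgentResponseItem<ChatMessageContent> response in agent.InvokeAsync(message))
{
    Console.WriteLine(response.Message.Content);
}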
Stateful Multi-Turn Invocation
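using Microsoft.SemanticKernel;                // ChatMessageContent
using Microsoft.SemanticKernel.Agents;         // AgentThread
using Microsoft.SemanticKernel.ChatCompletion; // AuthorRole

// agent is assumed to be an already-configured agent instance.
AgentThread? thread = null;

// First turn: no thread exists yet, so the agent creates one.
ChatMessageContent msg1 = new(AuthorRole.User, "What is the speed of light?");
await foreach (var response in agent.InvokeAsync(msg1, thread))
{
    thread = response.Thread; // capture the updated thread for the next turn
    Console.WriteLine(response.Message.Content);
}

// Second turn: the agent remembers the first exchange
ChatMessageContent msg2 = new(AuthorRole.User, "Express that in miles per second.");
await foreach (var response in agent.InvokeAsync(msg2, thread))
{
    thread = response.Thread;
    Console.WriteLine(response.Message.Content);
}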
Theoretical Basis
Single Agent Invocation is grounded in the Request-Reply messaging pattern, one of the foundational patterns in distributed systems. The caller sends a request (the user message) and receives a reply (the agent response), with the conversation thread serving as a correlation token that links related request-reply pairs into a coherent session.
The streaming delivery model reflects the Iterator Pattern, specifically .NET's IAsyncEnumerable<T> interface, which provides lazy, pull-based enumeration of results. This is particularly important for AI responses, which may take several seconds to generate in full -- asynchronous enumeration lets the caller handle each item as soon as it is produced instead of blocking until the entire result is ready.
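The lazy, pull-based behaviour can be seen in plain C# without any agent involved. In the following sketch, LazyStreamSketch, ProduceAsync, and TakeFirstAsync are hypothetical names chosen for illustration, and the short delay merely stands in for model latency.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LazyStreamSketch
{
    // Async iterator: each item is produced only when the consumer asks for it.
    public static async IAsyncEnumerable<string> ProduceAsync(List<string> log)
    {
        foreach (var part in new[] { "first", "second", "third" })
        {
            await Task.Delay(10); // stands in for model latency
            log.Add($"produced {part}");
            yield return part;
        }
    }

    // Consumer that stops after the first item; later items are never generated.
    public static async Task<List<string>> TakeFirstAsync()
    {
        var log = new List<string>();
        await foreach (var part in ProduceAsync(log))
        {
            log.Add($"consumed {part}");
            break;
        }
        return log;
    }
}
```

Awaiting TakeFirstAsync() returns a log with two entries, "produced first" followed by "consumed first": the producer does no work for items the consumer never pulls.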
The automatic thread management embodies the Session Pattern, where conversational state is maintained externally and passed with each interaction, similar to HTTP session management in web applications.
Related Pages
- Agent InvokeAsync (Implementation) -- The concrete API reference for agent invocation.
- Conversation Thread Management (Principle) -- The thread model used by invocations.
- ChatHistoryAgentThread (Implementation) -- The thread implementation used in invocations.
- Chat Completion Agent Creation (Principle) -- Creating agents to be invoked.
- Agent Plugin Equipment (Principle) -- How plugins are used during invocation via tool calling.
- Multi-Agent Orchestration (Principle) -- Coordinating multiple invocations across agents.