Principle: Anthropic Python SDK Multi-turn Conversation Management
| Knowledge Sources | |
|---|---|
| Domains | API_Client, LLM |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
The Multi-turn Conversation Management principle describes how the Anthropic Messages API implements stateless multi-turn conversations through message list accumulation. The API itself maintains no server-side session state; instead, the entire conversation history is sent with every request, and the caller is responsible for building and maintaining the message list.
Theoretical Basis
Stateless Conversation Through Message Accumulation
Unlike session-based chat APIs that maintain conversation state on the server, the Anthropic Messages API follows a stateless request/response model. Each call to Messages.create() is independent -- the API has no memory of previous calls. Multi-turn conversation is achieved by the caller accumulating messages into a list and re-sending the full history with each request.
This design has several implications:
- Full control -- The caller can edit, insert, or remove any message in the history before sending the next request. This enables prompt engineering techniques like pruning irrelevant turns or injecting synthetic context.
- No session management -- There are no session IDs, no TTLs, and no server-side state to manage or invalidate.
- Reproducibility -- The same message list always produces the same conversation context (modulo model non-determinism), making conversations fully serializable and replayable.
- Token cost awareness -- The full conversation history is tokenized on every request, so token costs grow linearly with conversation length.
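The accumulation pattern described above can be sketched as a small helper. This is an illustrative sketch, not SDK code: `send` stands in for the network call (with the real SDK, `client.messages.create(...)`), and `run_turn` is a hypothetical name introduced here.

```python
# Caller-side message accumulation. "send" stands in for the network call
# (client.messages.create with the real SDK); here it is any callable that
# takes the full message list and returns the assistant's reply content.
def run_turn(messages, user_text, send):
    messages.append({"role": "user", "content": user_text})
    # The ENTIRE history is sent on every request -- the API is stateless.
    assistant_content = send(messages)
    messages.append({"role": "assistant", "content": assistant_content})
    return assistant_content

history = []
run_turn(history, "Hello", lambda msgs: "Hi there!")
run_turn(history, "How are you?", lambda msgs: "Fine, thanks.")
# history now holds four messages: user, assistant, user, assistant
```

Because `history` is a plain list of dicts, it can be serialized (e.g. to JSON), edited, or replayed at any point, which is exactly the "full control" and "reproducibility" properties noted above.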
Converting Responses Back to Params for Multi-turn
The SDK's type system is designed to make response-to-parameter conversion straightforward. The Message response model has a content field of type List[ContentBlock], and the MessageParam TypedDict accepts content as either a str or an Iterable[ContentBlockParam]. Critically, ContentBlock (the response type) is included in the union accepted by MessageParam.content, which means response content blocks can be passed directly back as input without manual conversion:
# The response's content blocks are directly compatible with MessageParam
messages.append({"role": "assistant", "content": response.content})
This design eliminates a common source of bugs in multi-turn implementations where developers would need to manually map response objects to request objects.
Role Alternation Constraint
The Messages API enforces a structural constraint on the message list: messages must alternate between "user" and "assistant" roles. The API provides some flexibility:
- Consecutive same-role messages are merged -- If the caller sends two consecutive "user" messages, the API automatically combines them into a single turn. This simplifies programmatic construction where strict alternation would require manual merging.
- First message must be user -- The conversation must begin with a "user"-role message.
- Assistant prefilling -- If the last message has role "assistant", the model continues from that content rather than generating a new turn. This is useful for constraining the output format (e.g., forcing the model to start with a specific prefix).
System Prompt Separation
There is no "system" role in the messages list. System prompts are provided through a separate top-level system parameter. This separation:
- Keeps the alternating user/assistant structure clean
- Allows the system prompt to be changed between turns without modifying the message list
- Enables structured system prompts (arrays of TextBlockParam) with cache control
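A structured system prompt looks like the sketch below. The block shape follows the Messages API's text-block format; the prompt text itself is invented for the example.

```python
# System prompt as a list of text blocks, passed separately from the
# message list. cache_control marks the block as a cacheable prefix.
system = [
    {
        "type": "text",
        "text": "You are a terse assistant.",
        "cache_control": {"type": "ephemeral"},
    }
]

# With the real SDK this is passed alongside, not inside, the messages:
#   client.messages.create(model=..., max_tokens=...,
#                          system=system, messages=messages)
```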
Conversation Truncation and Window Management
Since the full message list is sent with every request and token costs grow linearly, practical applications need strategies for managing conversation length:
- Sliding window -- Keep only the most recent N turns
- Summarization -- Replace older turns with a condensed summary
- Token counting -- Use Messages.count_tokens() to measure the conversation size before sending
The SDK does not impose any of these strategies -- they are left to the application developer -- but the count_tokens endpoint provides the token-counting primitive needed to implement them.
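A minimal sliding-window strategy can be sketched as follows. This is application-side code, not part of the SDK; it assumes the plain list-of-dicts message format and preserves the first-message-must-be-user constraint after slicing.

```python
def sliding_window(messages, max_turns):
    """Keep only the most recent max_turns user/assistant pairs.

    After slicing, drop any leading assistant message so the truncated
    list still begins with a "user" turn, as the API requires.
    """
    trimmed = messages[-2 * max_turns:]
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed

history = [
    {"role": "user", "content": "q1"}, {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"}, {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"}, {"role": "assistant", "content": "a3"},
]
recent = sliding_window(history, 2)
# recent keeps the last two pairs: q2/a2 and q3/a3
```

In practice this would be combined with count_tokens: measure the candidate list, and shrink the window until it fits the target budget.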
Design Constraints
- Maximum of 100,000 messages per request.
- The MessageParam TypedDict has exactly two required fields: role (Literal["user", "assistant"]) and content.
- Content can be a simple str for text-only messages or an Iterable of typed content blocks for multimodal messages (images, documents, tool results).
- Token usage reported in Usage includes all messages in the request, not just the new user message.
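The two content forms accepted by MessageParam can be illustrated side by side. Both dicts below are valid message parameters; the text is invented for the example.

```python
# Shorthand: content as a plain string (text-only message).
text_msg = {"role": "user", "content": "Hello"}

# Equivalent explicit form: content as a list of typed blocks. Multimodal
# messages add other block types (image, document, tool_result) to the
# same list.
block_msg = {
    "role": "user",
    "content": [{"type": "text", "text": "Hello"}],
}
```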