Principle: Anthropic Python SDK Multi-turn Conversation Management
| Knowledge Sources | |
|---|---|
| Domains | API_Client, LLM |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
The Multi-turn Conversation Management principle describes how the Anthropic Messages API implements stateless multi-turn conversations through message list accumulation. The API itself maintains no server-side session state; instead, the entire conversation history is sent with every request, and the caller is responsible for building and maintaining the message list.
Theoretical Basis
Stateless Conversation Through Message Accumulation
Unlike session-based chat APIs that maintain conversation state on the server, the Anthropic Messages API follows a stateless request/response model. Each call to Messages.create() is independent -- the API has no memory of previous calls. Multi-turn conversation is achieved by the caller accumulating messages into a list and re-sending the full history with each request.
This design has several implications:
- Full control -- The caller can edit, insert, or remove any message in the history before sending the next request. This enables prompt engineering techniques like pruning irrelevant turns or injecting synthetic context.
- No session management -- There are no session IDs, no TTLs, and no server-side state to manage or invalidate.
- Reproducibility -- The same message list always produces the same conversation context (modulo model non-determinism), making conversations fully serializable and replayable.
- Token cost awareness -- The full conversation history is tokenized on every request, so token costs grow linearly with conversation length.
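The accumulation pattern described above can be sketched as a small helper. This is an illustrative sketch, not SDK code: `send` stands in for the network call (with the real SDK, `client.messages.create(...)`), and `run_turn` is a hypothetical name introduced here.

```python
# Caller-side message accumulation. "send" stands in for the network call
# (client.messages.create with the real SDK); here it is any callable that
# takes the full message list and returns the assistant's reply content.
def run_turn(messages, user_text, send):
    messages.append({"role": "user", "content": user_text})
    # The ENTIRE history is sent on every request -- the API is stateless.
    assistant_content = send(messages)
    messages.append({"role": "assistant", "content": assistant_content})
    return assistant_content

history = []
run_turn(history, "Hello", lambda msgs: "Hi there!")
run_turn(history, "How are you?", lambda msgs: "Fine, thanks.")
# history now holds four messages: user, assistant, user, assistant
```

Because `history` is a plain list of dicts, it can be serialized (e.g. to JSON), edited, or replayed at any point, which is exactly the "full control" and "reproducibility" properties noted above.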
Converting Responses Back to Params for Multi-turn
The SDK's type system is designed to make response-to-parameter conversion straightforward. The Message response model has a content field of type List[ContentBlock], and the MessageParam TypedDict accepts content as either a str or an Iterable[ContentBlockParam]. Critically, ContentBlock (the response type) is included in the union accepted by MessageParam.content, which means response content blocks can be passed directly back as input without manual conversion:
# The response's content blocks are directly compatible with MessageParam
messages.append({"role": "assistant", "content": response.content})
This design eliminates a common source of bugs in multi-turn implementations where developers would need to manually map response objects to request objects.
Role Alternation Constraint
The Messages API enforces a structural constraint on the message list: messages must alternate between "user" and "assistant" roles. The API provides some flexibility:
- Consecutive same-role messages are merged -- If the caller sends two consecutive "user" messages, the API automatically combines them into a single turn. This simplifies programmatic construction where strict alternation would require manual merging.
- First message must be user -- The conversation must begin with a "user"-role message.
- Assistant prefilling -- If the last message has role "assistant", the model continues from that content rather than generating a new turn. This is useful for constraining the output format (e.g., forcing the model to start with a specific prefix).
System Prompt Separation
There is no "system" role in the messages list. System prompts are provided through a separate top-level system parameter. This separation:
- Keeps the alternating user/assistant structure clean
- Allows the system prompt to be changed between turns without modifying the message list
- Enables structured system prompts (arrays of TextBlockParam) with cache control
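A structured system prompt looks like the sketch below. The block shape follows the Messages API's text-block format; the prompt text itself is invented for the example.

```python
# System prompt as a list of text blocks, passed separately from the
# message list. cache_control marks the block as a cacheable prefix.
system = [
    {
        "type": "text",
        "text": "You are a terse assistant.",
        "cache_control": {"type": "ephemeral"},
    }
]

# With the real SDK this is passed alongside, not inside, the messages:
#   client.messages.create(model=..., max_tokens=...,
#                          system=system, messages=messages)
```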
Conversation Truncation and Window Management
Since the full message list is sent with every request and token costs grow linearly, practical applications need strategies for managing conversation length:
- Sliding window -- Keep only the most recent N turns
- Summarization -- Replace older turns with a condensed summary
- Token counting -- Use Messages.count_tokens() to measure the conversation size before sending
The SDK does not impose any of these strategies -- they are left to the application developer -- but the count_tokens endpoint provides the token-counting primitive needed to implement them.
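A minimal sliding-window strategy can be sketched as follows. This is application-side code, not part of the SDK; it assumes the plain list-of-dicts message format and preserves the first-message-must-be-user constraint after slicing.

```python
def sliding_window(messages, max_turns):
    """Keep only the most recent max_turns user/assistant pairs.

    After slicing, drop any leading assistant message so the truncated
    list still begins with a "user" turn, as the API requires.
    """
    trimmed = messages[-2 * max_turns:]
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed

history = [
    {"role": "user", "content": "q1"}, {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"}, {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"}, {"role": "assistant", "content": "a3"},
]
recent = sliding_window(history, 2)
# recent keeps the last two pairs: q2/a2 and q3/a3
```

In practice this would be combined with count_tokens: measure the candidate list, and shrink the window until it fits the target budget.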
Design Constraints
- Maximum of 100,000 messages per request.
- The MessageParam TypedDict has exactly two required fields: role (Literal["user", "assistant"]) and content.
- Content can be a simple str for text-only messages or an Iterable of typed content blocks for multimodal messages (images, documents, tool results).
- Token usage reported in Usage includes all messages in the request, not just the new user message.
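The two content forms accepted by MessageParam can be illustrated side by side. Both dicts below are valid message parameters; the text is invented for the example.

```python
# Shorthand: content as a plain string (text-only message).
text_msg = {"role": "user", "content": "Hello"}

# Equivalent explicit form: content as a list of typed blocks. Multimodal
# messages add other block types (image, document, tool_result) to the
# same list.
block_msg = {
    "role": "user",
    "content": [{"type": "text", "text": "Hello"}],
}
```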