Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:Anthropics Anthropic sdk python Final Message Retrieval

From Leeroopedia
Knowledge Sources
Domains Streaming, LLM
Last Updated 2026-02-15 00:00 GMT

Overview

The Final Message Retrieval principle describes the pattern of draining a stream iterator to completion and returning the accumulated result as a single, complete object. While streaming is designed for incremental processing, many use cases ultimately need the final, complete message -- for example, to extract the full response text, inspect usage statistics, or pass the result to downstream processing. The SDK provides convenience methods that consume the entire stream and return the finalized message snapshot, bridging the gap between streaming transport and batch-style consumption.

Core Concepts

Stream Draining

Stream draining is the act of consuming all remaining events from an iterator without processing them individually. In the Anthropic SDK, this is performed by the until_done() method, which iterates through every event in the stream, allowing the accumulator to build the complete message snapshot, and discards the individual events.

[Stream with remaining events]
    |
    v
until_done() --> consume all events (accumulation side-effects update snapshot)
    |
    v
[Stream exhausted, __final_message_snapshot is complete]
    |
    v
Return __final_message_snapshot

The key insight is that stream draining is not wasteful -- the accumulation logic runs as a side effect of iteration, so by the time the stream is drained, the __final_message_snapshot contains the complete ParsedMessage with all content blocks, usage data, stop reason, and (if applicable) parsed structured output.

Finalization Pattern

The finalization pattern has three components:

  1. Drain: Call until_done() to consume remaining events if the stream has not yet been fully iterated.
  2. Assert: Verify that the snapshot was populated (i.e., at least a message_start event was received).
  3. Return: Return the accumulated snapshot as a complete ParsedMessage object.

This is idempotent -- if the stream has already been fully iterated (e.g., the caller used a for loop over events first), until_done() is effectively a no-op and the snapshot is already complete.

Convenience Methods for Common Access Patterns

While get_final_message() returns the complete message object, the most common use case is simply extracting the text content. The get_final_text() method layers on top of get_final_message():

  1. Drains the stream (via get_final_message()).
  2. Iterates over all content blocks in the final message.
  3. Filters for blocks of type "text".
  4. Concatenates their .text fields.
  5. Returns the combined string (or raises an error if no text blocks exist).

This layered approach avoids duplicating the drain logic and provides a clear error when the response contains only non-text content (e.g., only tool use blocks).

Partial vs. Complete Consumption

The finalization methods support both partial and complete prior consumption:

  • No prior iteration: The caller calls get_final_message() immediately. until_done() drains the entire stream.
  • Partial iteration: The caller iterates some events (e.g., to display progress), then calls get_final_message(). until_done() drains only the remaining events.
  • Complete iteration: The caller fully iterates the stream with a for loop, then calls get_final_message(). until_done() finds the iterator exhausted and returns immediately.

In all three cases, get_final_message() returns the same complete ParsedMessage, because the accumulation has captured every event.

Why Not Just Use Non-Streaming?

One might ask: if you want the complete message, why use streaming at all? There are several reasons:

  • Timeout avoidance: Long responses may exceed HTTP timeout limits. Streaming keeps the connection alive with continuous data.
  • Progressive display: The caller may want to show partial output in a UI while also needing the final complete message for further processing.
  • Usage monitoring: Streaming provides early visibility into token usage via message_delta events, even before the response completes.
  • Unified API: A single streaming code path can serve both real-time display and batch retrieval, reducing code duplication.

Design Rationale

The get_final_message() / get_final_text() methods follow the principle of progressive enhancement: the base streaming infrastructure handles incremental events, and convenience methods layer on top for common aggregate access patterns. By building finalization on top of the existing iterator and accumulator (rather than as a separate code path), the SDK maintains a single source of truth for message construction.

The explicit RuntimeError in get_final_text() when no text blocks are found is a deliberate design choice -- it catches the common mistake of calling get_final_text() on a tool-use-only response, providing a clear error message that directs the caller to use get_final_message().content instead.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment