Principle: Anthropic SDK (Python) Token Counting and Batching
| Knowledge Sources | |
|---|---|
| Domains | SDK Architecture, Token Management, Batch Processing |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
The Token Counting And Batching principle describes two complementary SDK capabilities: pre-flight token estimation and asynchronous batch message processing. Token counting allows callers to estimate request costs before submitting them to the API. MessageCountTokensParams defines the request shape for count_tokens endpoints in both stable and beta variants. MessageBatch models track batch processing state with result retrieval. WebSearchTool parameters configure real-time search integration for tool-augmented workflows.
Theoretical Basis
Pre-Flight Token Estimation
The count_tokens endpoint accepts the same parameters as a message creation request but returns only a token count without generating a response. This serves as a pre-flight check that enables:
- Cost estimation -- Callers can calculate the expected cost of a request before committing to it, which is critical for budget-constrained applications.
- Context window validation -- By knowing the input token count in advance, callers can verify that the combined input and expected output will fit within the model's context window.
- Request optimization -- If the token count exceeds expectations, the caller can trim the input (e.g., summarize conversation history) before sending the actual request.
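The context-window check above can be sketched as a small helper. This is an illustrative sketch, not SDK code: the 200,000-token window default and the example token figures are assumptions chosen for the demonstration, and the real input count would come from a count_tokens call.

```python
def fits_context_window(input_tokens: int, max_output_tokens: int,
                        context_window: int = 200_000) -> bool:
    """Return True if the input plus the planned output budget fits the window."""
    return input_tokens + max_output_tokens <= context_window

# Example: suppose a count_tokens pre-flight reported 180,000 input tokens.
# With a 4,096-token output budget the request still fits; with a
# 30,000-token budget it would not, so the caller should trim the input.
print(fits_context_window(180_000, 4_096))   # True
print(fits_context_window(180_000, 30_000))  # False
```

In a real workflow the first argument would be the input_tokens value returned by the count_tokens endpoint, and a False result would trigger input trimming (e.g., summarizing older conversation turns) before the create call.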
MessageCountTokensParams
The MessageCountTokensParams TypedDict mirrors MessageCreateParams but omits output-related fields (e.g., stream, stop_sequences) that are irrelevant for counting:
```python
class MessageCountTokensParams(TypedDict, total=False):
    messages: Required[Iterable[MessageParam]]
    model: Required[str]
    system: Union[str, Iterable[TextBlockParam]]
    tools: Iterable[ToolParam]
    tool_choice: ToolChoiceParam
    # ... additional input-shaping parameters
```
A corresponding Beta_MessageCountTokensParams extends this with beta-specific fields such as additional tool types and experimental parameters. Both stable and beta variants follow the same structural pattern.
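Because the params type is a TypedDict, a plain dict with the right keys satisfies it at call sites. A minimal sketch of a request shaped like MessageCountTokensParams (the model id and message content are placeholders, not real values):

```python
from typing import Any

# A dict matching the MessageCountTokensParams shape.
params: dict[str, Any] = {
    "model": "claude-example-model",  # placeholder model id
    "messages": [
        {"role": "user", "content": "Summarize the attached report."}
    ],
    "system": "You are a concise analyst.",
}

# Output-shaping fields such as stream are deliberately absent:
# the endpoint only counts input tokens.
print(sorted(params))  # ['messages', 'model', 'system']
```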
Batch Message Processing
The MessageBatch model represents an asynchronous batch of message creation requests. Batch processing is designed for high-throughput, latency-tolerant workloads:
- Submission -- The caller submits a collection of message requests as a single batch. Each request in the batch is independent and may have different parameters.
- State tracking -- The MessageBatch model exposes a processing_status field that tracks the batch through its lifecycle: in_progress, ended, etc.
- Result retrieval -- Once processing completes, results are available through a dedicated retrieval endpoint. Each result in the batch includes either a successful Message response or an error.
MessageBatch lifecycle:
created → in_progress → ended
├── results available (success/error per request)
└── expired (if not retrieved within TTL)
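The state-tracking side of this lifecycle reduces to polling processing_status until the batch leaves in_progress. A minimal sketch with the status fetch injected as a callable (in real code it would re-fetch the MessageBatch from the API; the status strings follow the lifecycle above):

```python
import time
from typing import Callable

def wait_for_batch(fetch_status: Callable[[], str],
                   poll_interval: float = 0.0,
                   max_polls: int = 100) -> str:
    """Poll until the batch leaves in_progress; return the final status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status != "in_progress":
            return status
        time.sleep(poll_interval)
    raise TimeoutError("batch did not finish within max_polls")

# Simulated lifecycle: two in-progress polls, then the batch ends.
statuses = iter(["in_progress", "in_progress", "ended"])
print(wait_for_batch(lambda: next(statuses)))  # ended
```

A production poller would use a nonzero interval with backoff, since batches are designed for latency-tolerant workloads.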
The batch model also tracks aggregate statistics:
- request_counts -- Counts of succeeded, errored, expired, and canceled requests within the batch.
- created_at / ended_at -- Timestamps for lifecycle tracking.
- results_url -- The URL for downloading batch results in JSONL format.
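The aggregate counts make simple health metrics easy to derive. A hypothetical helper (the field names mirror the request_counts categories above; the helper itself is not part of the SDK):

```python
def batch_success_rate(request_counts: dict[str, int]) -> float:
    """Fraction of requests that succeeded, over all terminal outcomes."""
    total = sum(request_counts.values())
    return request_counts.get("succeeded", 0) / total if total else 0.0

counts = {"succeeded": 97, "errored": 2, "expired": 1, "canceled": 0}
print(batch_success_rate(counts))  # 0.97
```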
Cost Efficiency of Batching
Batch processing trades latency for throughput efficiency. The API can optimize resource allocation across batch requests, potentially offering lower per-request costs compared to individual API calls. This makes batching particularly suitable for:
- Evaluation pipelines -- Running a model against a test dataset where individual response latency is not critical.
- Data processing -- Classifying, summarizing, or extracting information from large document collections.
- Offline analysis -- Generating reports or analyses that do not require real-time responses.
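An evaluation pipeline like the first use case above typically builds one independent request per dataset item. A sketch under the assumption that each batch entry carries a caller-chosen custom_id alongside its message-creation params (the id scheme, max_tokens value, and model name here are illustrative):

```python
def build_batch_requests(prompts: list[str], model: str) -> list[dict]:
    """Build one independent request per prompt; the custom_id lets
    results be matched back to inputs after retrieval."""
    return [
        {
            "custom_id": f"eval-{i}",
            "params": {
                "model": model,
                "max_tokens": 256,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]

reqs = build_batch_requests(["Classify A", "Classify B"], "claude-example-model")
print(len(reqs), reqs[0]["custom_id"])  # 2 eval-0
```

Because each entry is independent, a malformed prompt fails only its own request; the rest of the batch proceeds, matching the failure-isolation constraint described below.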
WebSearchTool Integration
The WebSearchTool parameter type (versioned as WebSearchTool_20250305) configures real-time web search as a tool available to the model during message generation:
```python
class WebSearchTool_20250305(TypedDict, total=False):
    type: Required[Literal["web_search_20250305"]]
    name: str
    max_uses: int
    allowed_domains: List[str]
    blocked_domains: List[str]
```
This parameter type integrates with both the count_tokens endpoint (to account for search tool overhead in token estimates) and the create endpoint (to enable search-augmented generation). The domain allowlist/blocklist fields provide fine-grained control over which web sources the model can access.
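For illustration, a tool entry matching the WebSearchTool_20250305 shape can be written as a plain dict; the name value, max_uses limit, and domains below are placeholder assumptions:

```python
# A dict matching the WebSearchTool_20250305 TypedDict shape.
search_tool = {
    "type": "web_search_20250305",
    "name": "web_search",          # placeholder tool name
    "max_uses": 3,                 # cap search invocations per response
    "allowed_domains": ["example.com", "docs.example.com"],
}

# The same dict can appear in the tools list of both count_tokens
# (so search overhead is included in the estimate) and create.
print(search_tool["type"])  # web_search_20250305
```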
Design Constraints
- Token counting is an approximation; the actual token count at generation time may differ slightly due to internal API processing.
- MessageCountTokensParams requires the same model field as MessageCreateParams because token counts are model-specific (different models use different tokenizers).
- Batch results have a time-to-live (TTL). Results not retrieved within the TTL expire and become unavailable.
- Each request in a batch is processed independently; a failure in one request does not affect others in the same batch.
- The WebSearchTool type is date-versioned (e.g., 20250305) to allow breaking changes to search tool parameters without affecting existing integrations.