Principle: Googleapis Python genai Cache Creation
| Knowledge Sources | Details |
|---|---|
| Domains | Optimization, Cost_Reduction |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A mechanism for pre-computing and storing model context to reduce latency and cost for repeated queries over the same large content.
Description
Cache Creation stores processed content (documents, system instructions, few-shot examples) on the server side so subsequent generation requests can reference it without re-transmitting or re-processing the content. This is particularly valuable for applications that repeatedly query the same large context (e.g., a customer support bot querying a product manual, or an analyst querying a long report). Caches have a time-to-live (TTL) and are associated with a specific model. They reduce both input token costs and latency for repeated queries.
Usage
Use context caching when you have large, stable content (documents, system prompts, few-shot examples) that multiple generation requests will reference. Upload the content first (via files.upload), then create a cache with the content and a TTL. Reference the cache in subsequent generation calls. The cache saves costs when the cached content is large relative to the per-query content and the number of queries is sufficient to amortize the cache creation cost.
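The workflow above can be sketched with the `google-genai` Python SDK. This is a minimal sketch, not a definitive implementation: the file name, model name, system instruction, and TTL value are illustrative assumptions, and running it requires a valid API key in the environment (cache-supporting model versions and pricing should be checked against current documentation).

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# 1. Upload the large, stable content (hypothetical file name).
manual = client.files.upload(file="product_manual.pdf")

# 2. Create a cache tied to a specific model, with a TTL.
cache = client.caches.create(
    model="gemini-2.0-flash-001",  # caching requires an explicit model version
    config=types.CreateCachedContentConfig(
        contents=[manual],
        system_instruction="Answer questions using the product manual.",
        ttl="3600s",  # cache expires after one hour unless refreshed
    ),
)

# 3. Reference the cache in subsequent generation calls;
#    only the per-query content is sent as fresh input.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="How do I reset the device?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```

Note that the model named when creating the cache must match the model used at generation time, since the cache stores content processed for that specific model.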
Theoretical Basis
Context caching applies the memoization pattern at the level of model context:
```python
# Without caching: each query re-processes the full context
for query in queries:
    response = model.generate([large_document, query])  # O(D + Q) tokens each time

# With caching: the context is processed once
cache = create_cache(large_document)  # One-time cost: O(D) tokens
for query in queries:
    response = model.generate(query, cache=cache)  # O(Q) tokens each time
```
Cost savings increase linearly with the number of queries over the same cached context.
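The linear-savings claim can be made concrete with a small cost model. This sketch counts raw input tokens only; real billing typically charges cached tokens at a discounted (not zero) rate plus a per-hour storage fee, so actual break-even points differ.

```python
def total_tokens_without_cache(doc_tokens: int, query_tokens: int, n_queries: int) -> int:
    """Every query re-sends the full document plus the query: n * (D + Q)."""
    return n_queries * (doc_tokens + query_tokens)

def total_tokens_with_cache(doc_tokens: int, query_tokens: int, n_queries: int) -> int:
    """The document is processed once at cache creation; each query sends only Q."""
    return doc_tokens + n_queries * query_tokens

# Example: a 100k-token manual queried with 200-token questions.
doc, q = 100_000, 200
for n in (1, 10, 100):
    saved = total_tokens_without_cache(doc, q, n) - total_tokens_with_cache(doc, q, n)
    print(n, saved)  # savings grow as (n - 1) * doc_tokens
```

With one query there is no saving (the document is processed once either way); each additional query over the same cached context avoids another full pass over the document, which is the linear growth the section describes.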