Implementation:Googleapis Python genai Models Generate Content Cached

Knowledge Sources	googleapis/python-genai Google Gen AI Python SDK
Domains	Optimization, Generative_AI
Last Updated	2026-02-15 00:00 GMT

Overview

Concrete tool, provided by the google-genai models module, for generating content against pre-cached context to reduce cost and latency.

Description

Calling Models.generate_content with config.cached_content set to a cache resource name generates a response using the pre-cached context. The model must match the one used to create the cache. Only the new query content is transmitted; the cached context is referenced server-side. The response's usage_metadata reports how many of the prompt tokens were served from the cache. This is the same generate_content method used for standard generation; the cached_content field in GenerateContentConfig supplies the cache reference.

Usage

Set config.cached_content to the CachedContent.name from a previous caches.create call. Pass only the new query in contents. The model parameter must match the cache's model.
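The model-match requirement can be checked on the client before a request is sent. The following is a minimal sketch, not part of google-genai: models_match is a hypothetical helper, and it assumes the cache may report its model with a resource prefix (for example "models/...") while the caller passes a bare model ID.

```python
def models_match(cache_model: str, requested_model: str) -> bool:
    """Compare two model identifiers by their final path segment.

    Assumption (not confirmed by the source): identifiers such as
    "models/gemini-1.5-flash-002" and "gemini-1.5-flash-002" refer to
    the same model, so only the last "/"-separated segment is compared.
    """
    def last_segment(name: str) -> str:
        # Drop surrounding whitespace and any trailing slash, then keep
        # only the text after the final "/".
        return name.strip().rstrip("/").rsplit("/", 1)[-1]

    return last_segment(cache_model) == last_segment(requested_model)
```

A caller might run models_match(cache.model, "gemini-1.5-flash-002") on the object returned by caches.create and raise an error early if it returns False.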

Code Reference

Source Location

Repository: googleapis/python-genai
File: google/genai/models.py
Lines: L5507-5666
File: google/genai/types.py
Lines: L5077 (GenerateContentConfig.cached_content field)

Signature

class Models:
    def generate_content(
        self,
        *,
        model: str,
        contents: types.ContentListUnionDict,
        config: Optional[types.GenerateContentConfigOrDict] = None,
    ) -> types.GenerateContentResponse:
        """Generates content, optionally using cached context.

        When config.cached_content is set, the model uses the
        pre-cached context along with the new contents.
        """

Import

from google import genai
from google.genai import types

I/O Contract

Inputs

Name	Type	Required	Description
model	str	Yes	Must match the model used for cache creation
contents	ContentListUnionDict	Yes	New query content (not the cached context)
config.cached_content	str	Yes (for cached generation)	Cache resource name from CachedContent.name

Outputs

Name	Type	Description
(return value)	GenerateContentResponse	Response with .text and .usage_metadata, which reports the cached and total prompt token counts

Usage Examples

Query Against Cached Document

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Assume cache was created earlier
cache_name = "cachedContents/abc123"

# Query 1
response1 = client.models.generate_content(
    model="gemini-1.5-flash-002",
    contents="What are the main features described in the document?",
    config=types.GenerateContentConfig(
        cached_content=cache_name,
    ),
)
print(response1.text)

# Query 2 (same cache, different question)
response2 = client.models.generate_content(
    model="gemini-1.5-flash-002",
    contents="Summarize the troubleshooting section.",
    config=types.GenerateContentConfig(
        cached_content=cache_name,
    ),
)
print(response2.text)
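Because every query reuses the same cache reference, repeated calls like the two above can be folded into a loop. The sketch below assumes a client object exposing models.generate_content as shown in the signature, and passes the config in its dict form; ask_all is a hypothetical helper name, not an SDK function.

```python
from typing import Any, List, Optional


def ask_all(
    client: Any,
    model: str,
    cache_name: str,
    questions: List[str],
) -> List[Optional[str]]:
    """Send each question against the same cache and collect response text."""
    answers: List[Optional[str]] = []
    for question in questions:
        response = client.models.generate_content(
            model=model,  # must match the model the cache was created with
            contents=question,  # only the new query is sent
            # Dict form of GenerateContentConfig; only the cache reference is set.
            config={"cached_content": cache_name},
        )
        answers.append(response.text)
    return answers
```

With the client and cache_name from the example above, ask_all(client, "gemini-1.5-flash-002", cache_name, ["Question 1", "Question 2"]) would issue one request per question.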

Check Token Usage

# Reuses client and cache_name from the previous example.
response = client.models.generate_content(
    model="gemini-1.5-flash-002",
    contents="Find all mentions of error handling.",
    config=types.GenerateContentConfig(cached_content=cache_name),
)
usage = response.usage_metadata
# Tokens served from the cache.
print(f"Cached tokens: {usage.cached_content_token_count}")
# Total prompt size; this count includes the cached tokens.
print(f"Prompt tokens (incl. cached): {usage.prompt_token_count}")
# Tokens in the generated response.
print(f"Response tokens: {usage.candidates_token_count}")
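The split between cached and newly sent prompt tokens can be derived from these counters. This is a minimal sketch, assuming prompt_token_count reports the total prompt size including cached tokens, as the SDK's field documentation describes; token_split is a hypothetical helper, and it treats missing (None) counts as zero.

```python
from typing import Any, Dict


def token_split(usage_metadata: Any) -> Dict[str, int]:
    """Break usage metadata into cached, new-prompt, and response tokens.

    Assumption: prompt_token_count is the total prompt size and already
    includes cached_content_token_count.
    """
    cached = usage_metadata.cached_content_token_count or 0
    prompt = usage_metadata.prompt_token_count or 0
    response = usage_metadata.candidates_token_count or 0
    return {
        "cached": cached,
        # Tokens sent with this request, i.e. not served from the cache.
        "new_prompt": max(prompt - cached, 0),
        "response": response,
    }
```

Calling token_split(response.usage_metadata) after a cached request gives a quick view of how much of the prompt was served from the cache.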

Related Pages

Implements Principle

Principle:Googleapis_Python_genai_Content_Generation_With_Cache
