Implementation:Googleapis Python genai Models Generate Content Cached
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Generative_AI |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool, provided by the google-genai models module, for generating content against pre-cached context to reduce cost and latency.
Description
Calling Models.generate_content with config.cached_content set to a cache resource name generates a response using the pre-cached context. The model must match the one used to create the cache. Only the new query content is sent in the request; the cached context is referenced server-side. The response's usage_metadata reports the cached token count alongside the total prompt token count. This is the same generate_content method used for standard generation; the cached_content field in GenerateContentConfig supplies the cache reference.
Usage
Set config.cached_content to the CachedContent.name from a previous caches.create call. Pass only the new query in contents. The model parameter must match the cache's model.
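The model-match requirement can be checked client-side before sending a request. The following is a minimal sketch, not part of the library: the helper names are hypothetical, and it assumes the cache's model may be reported with a resource-path prefix (for example "models/...") while the request uses the bare model ID. The service remains the authority on whether a request is accepted.

```python
def model_id(resource_name: str) -> str:
    """Return the last path segment of a model resource name."""
    return resource_name.rstrip("/").rsplit("/", 1)[-1]


def models_match(request_model: str, cache_model: str) -> bool:
    """Compare two model names, ignoring any resource-path prefix."""
    return model_id(request_model) == model_id(cache_model)


# Example values; the "models/" prefix on the cache side is an assumption.
print(models_match("gemini-1.5-flash-002", "models/gemini-1.5-flash-002"))
print(models_match("gemini-1.5-flash-002", "models/gemini-1.5-pro-002"))
```

In practice the second argument would come from the model recorded on the cache object returned by caches.create.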
Code Reference
Source Location
- Repository: googleapis/python-genai
- File: google/genai/models.py
- Lines: L5507-5666
- File: google/genai/types.py
- Lines: L5077 (GenerateContentConfig.cached_content field)
Signature
class Models:
    def generate_content(
        self,
        *,
        model: str,
        contents: types.ContentListUnionDict,
        config: Optional[types.GenerateContentConfigOrDict] = None,
    ) -> types.GenerateContentResponse:
        """Generates content, optionally using cached context.
        When config.cached_content is set, the model uses the
        pre-cached context along with the new contents.
        """
Import
from google import genai
from google.genai import types
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | Must match the model used for cache creation |
| contents | ContentListUnionDict | Yes | New query content (not the cached context) |
| config.cached_content | str | Yes (for cached generation) | Cache resource name from CachedContent.name |
Outputs
| Name | Type | Description |
|---|---|---|
| return value | GenerateContentResponse | Response with .text and .usage_metadata reporting cached and total prompt token counts |
Usage Examples
Query Against Cached Document
from google import genai
from google.genai import types
client = genai.Client(api_key="YOUR_API_KEY")
# Assume cache was created earlier
cache_name = "cachedContents/abc123"
# Query 1
response1 = client.models.generate_content(
    model="gemini-1.5-flash-002",
    contents="What are the main features described in the document?",
    config=types.GenerateContentConfig(
        cached_content=cache_name,
    ),
)
print(response1.text)
# Query 2 (same cache, different question)
response2 = client.models.generate_content(
    model="gemini-1.5-flash-002",
    contents="Summarize the troubleshooting section.",
    config=types.GenerateContentConfig(
        cached_content=cache_name,
    ),
)
print(response2.text)
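The two queries above repeat the same call with a different question. A minimal sketch of that pattern as a loop follows. The helper ask_many and the stand-in fake_generate are hypothetical names introduced so the sketch runs offline; the config is passed in its dict form, which the signature allows through GenerateContentConfigOrDict.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional


def ask_many(
    generate: Callable[..., Any],
    model: str,
    cache_name: str,
    questions: List[str],
) -> Dict[str, Optional[str]]:
    """Send each question against the same cache and collect the answers."""
    answers: Dict[str, Optional[str]] = {}
    for question in questions:
        response = generate(
            model=model,
            contents=question,
            # Dict form of the config; only the cache reference is set here.
            config={"cached_content": cache_name},
        )
        answers[question] = response.text
    return answers


def fake_generate(**kwargs: Any) -> SimpleNamespace:
    """Offline stand-in for client.models.generate_content (hypothetical)."""
    return SimpleNamespace(text=f"echo: {kwargs['contents']}")


answers = ask_many(
    fake_generate,
    model="gemini-1.5-flash-002",
    cache_name="cachedContents/abc123",
    questions=["What are the main features?", "Summarize troubleshooting."],
)
for question, answer in answers.items():
    print(question, "->", answer)
```

With a real client, client.models.generate_content would be passed in place of fake_generate.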
Check Token Usage
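# Reuses client and cache_name from the previous example
response = client.models.generate_content(
    model="gemini-1.5-flash-002",
    contents="Find all mentions of error handling.",
    config=types.GenerateContentConfig(cached_content=cache_name),
)
# cached_content_token_count is the portion of the prompt served from the cache
print(f"Cached tokens: {response.usage_metadata.cached_content_token_count}")
# prompt_token_count is the total prompt size, including the cached tokens
print(f"Prompt tokens (total): {response.usage_metadata.prompt_token_count}")
print(f"Response tokens: {response.usage_metadata.candidates_token_count}")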
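The usage_metadata fields printed above can be turned into an explicit cached/new breakdown. The sketch below is illustrative only: token_split is a hypothetical helper, the numbers are made up, and it assumes prompt_token_count already includes the cached tokens, so the non-cached part is the difference between the two counts.

```python
from types import SimpleNamespace
from typing import Any, Dict


def token_split(usage: Any) -> Dict[str, int]:
    """Split prompt tokens into cached and non-cached parts."""
    # Counts may be None when the service omits them; treat that as zero.
    prompt = getattr(usage, "prompt_token_count", None) or 0
    cached = getattr(usage, "cached_content_token_count", None) or 0
    return {
        "cached": cached,
        # Assumes prompt_token_count already includes the cached tokens.
        "new_prompt": max(prompt - cached, 0),
        "response": getattr(usage, "candidates_token_count", None) or 0,
    }


# Stand-in for response.usage_metadata with made-up numbers.
usage = SimpleNamespace(
    prompt_token_count=10_050,
    cached_content_token_count=10_000,
    candidates_token_count=120,
)
print(token_split(usage))
```

With a real response, token_split(response.usage_metadata) would be called instead of using the stand-in object.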