| Property | Value |
|---|---|
| sources | litellm/google_genai/main.py |
| domains | Google GenAI, Gemini, Content Generation, Streaming |
| last_updated | 2026-02-15 16:00 GMT |
Overview
The Google GenAI module provides a native SDK-style interface for Google's Generative AI (Gemini) content generation, supporting both streaming and non-streaming responses with an automatic fallback to LiteLLM's completion format when provider-native support is unavailable.
Description
This module implements the Google GenAI generateContent API through four functions arranged as two sync/async pairs: generate_content/agenerate_content for non-streaming responses and generate_content_stream/agenerate_content_stream for streaming responses. All four rely on the helper class GenerateContentHelper, whose shared setup_generate_content_call() method resolves the provider, loads the BaseGoogleGenAIGenerateContentConfig, maps optional parameters, and constructs the request body. A GenerateContentSetupResult Pydantic model carries the resulting setup state from the helper back to the calling function. When no provider config is found (e.g., for non-Gemini providers), the module automatically falls back to GenerateContentToCompletionHandler, which adapts the Google GenAI format to LiteLLM's standard completion format. Mock responses are supported, and the generationConfig parameter is still accepted for backward compatibility.
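The setup-then-dispatch flow described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the module's actual code: `SetupResult`, `NATIVE_PROVIDERS`, `setup_call`, and `dispatch` are hypothetical stand-ins for GenerateContentSetupResult, the provider config lookup, setup_generate_content_call(), and the calling functions respectively.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SetupResult:
    """Hypothetical stand-in for GenerateContentSetupResult."""
    provider: str
    request_body: Dict[str, Any]
    has_native_config: bool


# Assumption: an illustrative set of providers that have a native
# generateContent config. The real lookup is done by LiteLLM internally.
NATIVE_PROVIDERS = {"google_genai"}


def setup_call(model: str, contents: Any,
               config: Optional[Dict[str, Any]] = None) -> SetupResult:
    """Resolve the provider from a 'provider/model' string and build a body."""
    if "/" in model:
        provider, model_name = model.split("/", 1)
    else:
        provider, model_name = "", model
    body: Dict[str, Any] = {"model": model_name, "contents": contents}
    if config:
        body["generationConfig"] = dict(config)
    return SetupResult(
        provider=provider,
        request_body=body,
        has_native_config=provider in NATIVE_PROVIDERS,
    )


def dispatch(model: str, contents: Any,
             config: Optional[Dict[str, Any]] = None) -> str:
    """Return which path would handle the request in this sketch."""
    setup = setup_call(model, contents, config)
    # No native config found: adapt the request to the completion format.
    return "native" if setup.has_native_config else "completion_fallback"
```

In this sketch the caller never chooses the path explicitly; the choice follows from whether a provider config was found during setup, which mirrors the fallback behavior the description attributes to GenerateContentToCompletionHandler.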
Usage
Import this module when you need to interact with Google Gemini models using Google's native API format (contents, config, tools) rather than the OpenAI-compatible format. It is also used internally by LiteLLM when routing to Google GenAI providers.
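To make the difference between the two formats concrete, the sketch below converts OpenAI-style chat messages into the Google GenAI `contents` shape (a list of role/parts dicts, where the assistant role is named "model"). The helper `messages_to_contents` is hypothetical and is not part of LiteLLM; system messages are skipped here on the assumption that they would be passed separately via systemInstruction.

```python
from typing import Any, Dict, List

# Google GenAI uses "model" where OpenAI-style messages use "assistant".
_ROLE_MAP = {"user": "user", "assistant": "model"}


def messages_to_contents(messages: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Convert OpenAI-style text messages to Google GenAI-style contents."""
    contents: List[Dict[str, Any]] = []
    for message in messages:
        role = _ROLE_MAP.get(message["role"])
        if role is None:
            # Assumption: system (and other) roles are handled elsewhere.
            continue
        contents.append({"role": role, "parts": [{"text": message["content"]}]})
    return contents
```

A list produced this way has the same shape as the `contents` argument used in the usage examples on this page.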
Code Reference
Source Location
Signature
@client
def generate_content(
    model: str,
    contents: GenerateContentContentListUnionDict,
    config: Optional[GenerateContentConfigDict] = None,
    tools: Optional[ToolConfigDict] = None,
    extra_headers: Optional[Dict[str, Any]] = None,
    extra_query: Optional[Dict[str, Any]] = None,
    extra_body: Optional[Dict[str, Any]] = None,
    timeout: Optional[Union[float, httpx.Timeout]] = None,
    custom_llm_provider: Optional[str] = None,
    **kwargs,
) -> Any

@client
async def agenerate_content(...) -> Any

@client
def generate_content_stream(...) -> Iterator[Any]

@client
async def agenerate_content_stream(...) -> Any
Import
from litellm.google_genai.main import (
    generate_content, agenerate_content,
    generate_content_stream, agenerate_content_stream,
    GenerateContentHelper, GenerateContentSetupResult,
)
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | The model identifier (e.g., "google_genai/gemini-2.0-flash") |
| contents | GenerateContentContentListUnionDict | Yes | The content list to generate from (Google GenAI format) |
| config | Optional[GenerateContentConfigDict] | No | Generation configuration (temperature, top_p, etc.) |
| tools | Optional[ToolConfigDict] | No | Tool definitions for function calling |
| custom_llm_provider | Optional[str] | No | Provider override; auto-detected from model |
| systemInstruction | via kwargs | No | System instruction for the model |
| generationConfig | via kwargs | No | Backward-compatible alias for config |
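One way the backward-compatible generationConfig alias could be reconciled with config is sketched below. The function `resolve_config` is hypothetical, and the precedence it applies (an explicit config wins over the legacy alias) is an assumption made for illustration; the source does not state which value takes priority when both are supplied.

```python
from typing import Any, Dict, Optional


def resolve_config(config: Optional[Dict[str, Any]] = None,
                   **kwargs: Any) -> Dict[str, Any]:
    """Pick the effective generation config from config or generationConfig."""
    legacy = kwargs.get("generationConfig")
    if config is not None:
        # Assumption: the explicit `config` argument takes precedence.
        return dict(config)
    if legacy is not None:
        return dict(legacy)
    return {}
```

Copying the chosen dict keeps later parameter mapping from mutating the caller's object.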
Outputs
| Function | Return Type | Description |
|---|---|---|
| generate_content | Dict[str, Any] (GenerateContentResponse) | Contains candidates, usage metadata, text |
| generate_content_stream | Iterator[Any] | Stream of partial generation results |
| Mock response | Dict[str, Any] | Mock response with text, candidates, usageMetadata fields |
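As a rough illustration of consuming the response dict, the helper below joins the text parts of the first candidate. It assumes the response follows the Gemini generateContent layout (candidates, then content, then parts with a text field); `extract_text` is a hypothetical helper, not something the module provides, and the sample dict in the usage note is made up for demonstration.

```python
from typing import Any, Dict, List


def extract_text(response: Dict[str, Any]) -> str:
    """Concatenate the text parts of the first candidate, if any."""
    candidates: List[Dict[str, Any]] = response.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    # Non-text parts (e.g. function calls) carry no "text" key and are skipped.
    return "".join(part["text"] for part in parts if "text" in part)
```

For example, `extract_text({"candidates": [{"content": {"parts": [{"text": "Hi"}, {"text": " there"}]}}]})` returns `"Hi there"`, and a response with no candidates yields an empty string rather than raising.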
Usage Examples
# Non-streaming request
import litellm

response = litellm.generate_content(
    model="google_genai/gemini-2.0-flash",
    contents=[{"role": "user", "parts": [{"text": "Explain quantum computing."}]}],
    config={"temperature": 0.7, "max_output_tokens": 500},
)
print(response["text"])

# Async streaming request
import asyncio
import litellm

async def main():
    # Await the call to obtain the stream, then iterate over its chunks.
    stream = await litellm.agenerate_content_stream(
        model="google_genai/gemini-2.0-flash",
        contents=[{"role": "user", "parts": [{"text": "Write a short story."}]}],
    )
    async for chunk in stream:
        print(chunk)

asyncio.run(main())
Related Pages