| Metadata |
Value
|
| Source |
src/ragas/llms/litellm_llm.py (Lines 14-271)
|
| Domains |
LLM, LiteLLM
|
| Last Updated |
2026-02-10
|
Overview
An LLM wrapper that uses LiteLLM with instructor for structured Pydantic model outputs, supporting both synchronous and asynchronous generation across 100+ LLM providers including Gemini, Ollama, vLLM, and Groq.
Description
LiteLLMStructuredLLM extends InstructorBaseRagasLLM and provides structured output generation through LiteLLM's unified API. Key features:
- Async detection: The
_check_client_async method uses multiple heuristics to determine if the provided client supports async operations, checking for AsyncInstructor class names, acompletion methods, async chat.completions.create, underlying wrapped clients, and instructor closure functions. The is_async flag is set at initialization.
generate: Synchronous generation that builds a message list (with optional system prompt), calls client.chat.completions.create with the response_model for structured output, and tracks usage via ragas._analytics. If the client is async-only, it delegates to agenerate run in the appropriate event loop.
agenerate: Asynchronous generation using await on the client's async completion method. Raises TypeError if called on a synchronous client.
- Event loop handling:
_run_async_in_current_loop handles Jupyter notebook environments by detecting running event loops and spawning a separate thread with its own event loop when needed.
- Caching: If a
CacheInterface is provided, both generate and agenerate are wrapped with the cacher decorator for response caching.
- Usage tracking: Each generation call tracks an
LLMUsageEvent with provider, model, type, and async status.
Usage
Use this class when you want to use LiteLLM-supported providers (Google Gemini, Ollama, vLLM, Groq, Anthropic, etc.) with Ragas metrics that require structured Pydantic model outputs. It is the primary LLM wrapper for non-LangChain, non-LlamaIndex setups.
Code Reference
Source Location
| Item |
Detail
|
| File |
src/ragas/llms/litellm_llm.py
|
| Lines |
14-271
|
| Module |
ragas.llms.litellm_llm
|
Class Signature
class LiteLLMStructuredLLM(InstructorBaseRagasLLM):
def __init__(
self,
client: Any,
model: str,
provider: str,
cache: Optional[CacheInterface] = None,
system_prompt: Optional[str] = None,
**kwargs,
) -> None: ...
def generate(
self,
prompt: str,
response_model: Type[InstructorTypeVar],
) -> InstructorTypeVar: ...
async def agenerate(
self,
prompt: str,
response_model: Type[InstructorTypeVar],
) -> InstructorTypeVar: ...
Import
from ragas.llms.litellm_llm import LiteLLMStructuredLLM
I/O Contract
Constructor
| Name |
Type |
Required |
Description
|
client |
Any |
Yes |
LiteLLM or instructor-wrapped client instance
|
model |
str |
Yes |
Model identifier (e.g., "gemini-2.0-flash", "ollama/llama3")
|
provider |
str |
Yes |
Provider name for analytics tracking
|
cache |
Optional[CacheInterface] |
No |
Cache backend for response caching
|
system_prompt |
Optional[str] |
No |
System prompt prepended to all messages
|
**kwargs |
Any |
No |
Additional model arguments (e.g., temperature, max_tokens)
|
generate
| Direction |
Name |
Type |
Description
|
| Input |
prompt |
str |
The user prompt text
|
| Input |
response_model |
Type[BaseModel] |
Pydantic model class for structured output
|
| Output |
(return) |
BaseModel |
Instance of response_model populated with generated data
|
agenerate
| Direction |
Name |
Type |
Description
|
| Input |
prompt |
str |
The user prompt text
|
| Input |
response_model |
Type[BaseModel] |
Pydantic model class for structured output
|
| Output |
(return) |
BaseModel |
Instance of response_model populated with generated data
|
Exceptions
| Exception |
Condition
|
TypeError |
agenerate called on a synchronous-only client
|
Usage Examples
Basic Usage with LiteLLM
import instructor
import litellm
from ragas.llms.litellm_llm import LiteLLMStructuredLLM
# Create an instructor-wrapped LiteLLM client
client = instructor.from_litellm(litellm.completion)
llm = LiteLLMStructuredLLM(
client=client,
model="gpt-4o-mini",
provider="openai",
)
print(llm)
# LiteLLMStructuredLLM(model='gpt-4o-mini', provider='openai', is_async=False)
Using with Gemini
import instructor
import litellm
from ragas.llms.litellm_llm import LiteLLMStructuredLLM
client = instructor.from_litellm(litellm.completion)
llm = LiteLLMStructuredLLM(
client=client,
model="gemini/gemini-2.0-flash",
provider="google",
temperature=0.1,
)
Async Usage with Router
import instructor
from litellm import Router
from ragas.llms.litellm_llm import LiteLLMStructuredLLM
# Router supports async operations
router = Router(model_list=[...])
client = instructor.from_litellm(router.acompletion)
llm = LiteLLMStructuredLLM(
client=client,
model="gpt-4o",
provider="openai",
)
print(llm.is_async) # True
With Caching
from ragas.llms.litellm_llm import LiteLLMStructuredLLM
from ragas.cache import InMemoryCache
llm = LiteLLMStructuredLLM(
client=client,
model="gpt-4o-mini",
provider="openai",
cache=InMemoryCache(),
)
# Repeated identical prompts will return cached results
Related Pages