Overview
LiteLLMStructuredLLM is an LLM wrapper that uses LiteLLM with instructor for structured output generation, supporting 100+ LLM providers including Gemini, Ollama, vLLM, and Groq.
Description
LiteLLMStructuredLLM extends InstructorBaseRagasLLM to provide a unified interface for structured LLM output across a wide range of providers via the LiteLLM library. The class accepts an instructor-wrapped LiteLLM client and returns each LLM response as an instance of the requested Pydantic model.
At initialization the class detects whether the client is async-capable by trying several strategies in turn: checking for an AsyncInstructor wrapper, a direct acompletion method, an async chat completion interface, an underlying wrapped async client, and closure-captured async objects.
Caching is optional and uses the Ragas CacheInterface: when a cache is provided, both generate and agenerate are wrapped with the cacher decorator. When the synchronous generate method is called on an async client, _run_async_in_current_loop runs the coroutine to completion; if an event loop is already running, as in a Jupyter notebook, it spawns a separate thread with its own event loop. All LLM calls are tracked via the Ragas analytics system using LLMUsageEvent.
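The detection strategies described above can be illustrated with a small heuristic. This is a minimal sketch, not the library's implementation: the helper names `looks_async` and `_has_async_calls` are hypothetical, the wrapped client is assumed to live on a `client` attribute, and the closure-inspection strategy is left out.

```python
import inspect


def _has_async_calls(obj: object) -> bool:
    """Return True if obj exposes a coroutine-based completion entry point."""
    # A direct `acompletion` coroutine function.
    if inspect.iscoroutinefunction(getattr(obj, "acompletion", None)):
        return True
    # An async `chat.completions.create` interface.
    completions = getattr(getattr(obj, "chat", None), "completions", None)
    return inspect.iscoroutinefunction(getattr(completions, "create", None))


def looks_async(client: object) -> bool:
    """Heuristically guess whether `client` supports async calls."""
    # A wrapper class named AsyncInstructor (matched by name, so this
    # sketch does not need to import instructor).
    if any(cls.__name__ == "AsyncInstructor" for cls in type(client).__mro__):
        return True
    # Async entry points on the client itself.
    if _has_async_calls(client):
        return True
    # Assumption: a wrapped client is stored on a `client` attribute.
    inner = getattr(client, "client", None)
    return inner is not None and _has_async_calls(inner)
```

Trying the cheapest checks first and falling back to the wrapped client keeps the heuristic tolerant of clients that expose only part of the interface.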
Usage
Use this class when you want to leverage LiteLLM's multi-provider support within Ragas evaluations, particularly when you need structured (Pydantic model) outputs from diverse LLM providers.
Code Reference
Signature
class LiteLLMStructuredLLM(InstructorBaseRagasLLM):
    def __init__(
        self,
        client: t.Any,
        model: str,
        provider: str,
        cache: t.Optional[CacheInterface] = None,
        system_prompt: t.Optional[str] = None,
        **kwargs,
    ):
Import
from ragas.llms.litellm_llm import LiteLLMStructuredLLM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| client | Any | Yes | An instructor-wrapped LiteLLM client instance (sync or async) |
| model | str | Yes | Model name identifier (e.g., "gemini-2.0-flash", "gpt-4o") |
| provider | str | Yes | Provider name for analytics tracking |
| cache | CacheInterface | No | Optional cache backend for caching LLM responses |
| system_prompt | str | No | Optional system prompt to prepend to all messages |
| **kwargs | Any | No | Additional model arguments passed to completions (temperature, max_tokens, etc.) |
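To make the effect of the optional cache argument concrete, the following sketch shows one way a cache backend can wrap a generation function. It is an illustration under assumptions, not the Ragas cacher: `DictCache`, `cached`, and `fake_generate` are hypothetical names, and the key scheme (a hash of the function name and arguments) is chosen only for the example.

```python
import functools
import hashlib
import json


class DictCache:
    """Hypothetical in-memory backend with get/set/has_key methods."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

    def has_key(self, key):
        return key in self._store


def cached(cache):
    """Return a decorator that memoizes calls in the given cache backend."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Derive a stable key from the function name and its arguments.
            raw = json.dumps([func.__name__, args, kwargs],
                             sort_keys=True, default=str)
            key = hashlib.sha256(raw.encode()).hexdigest()
            if cache.has_key(key):
                return cache.get(key)
            result = func(*args, **kwargs)
            cache.set(key, result)
            return result

        return wrapper

    return decorator


calls = []  # records every call that actually reaches the "model"


def fake_generate(prompt: str) -> str:
    """Stand-in for an LLM call; upper-cases the prompt."""
    calls.append(prompt)
    return prompt.upper()


cached_generate = cached(DictCache())(fake_generate)
```

Calling `cached_generate` twice with the same prompt reaches `fake_generate` only once; the second call is answered from the cache.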
Outputs
| Name | Type | Description |
|---|---|---|
| generate return | InstructorTypeVar | An instance of the specified Pydantic response_model populated with generated data |
| agenerate return | InstructorTypeVar | An instance of the specified Pydantic response_model populated with generated data (async) |
Key Methods
| Method | Description |
|---|---|
| generate(prompt, response_model) | Synchronous structured generation; delegates to agenerate for async clients |
| agenerate(prompt, response_model) | Asynchronous structured generation; raises TypeError if client is not async-capable |
| _check_client_async() | Inspects the client to determine if it supports async operations via multiple detection strategies |
| _run_async_in_current_loop(coro) | Runs an async coroutine in the current event loop, with Jupyter notebook compatibility |
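The event loop handling attributed to _run_async_in_current_loop can be sketched as follows. This is a simplified illustration rather than the library's code: `run_coro_blocking` and `add` are hypothetical names, and the sketch assumes the coroutine does not depend on the caller's running loop.

```python
import asyncio
import threading


def run_coro_blocking(coro):
    """Run `coro` to completion from synchronous code and return its result."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)

    # A loop is already running (for example in a notebook): hand the
    # coroutine to a worker thread that owns a fresh event loop.
    outcome = {}

    def worker():
        try:
            outcome["value"] = asyncio.run(coro)
        except Exception as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()  # blocks the caller until the coroutine finishes
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def add(a, b):
    """Tiny coroutine used to exercise the helper."""
    await asyncio.sleep(0)
    return a + b
```

The worker thread avoids the "event loop is already running" error that `asyncio.run` raises inside a live loop, at the cost of blocking the calling thread until the coroutine completes.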
Usage Examples
Basic Usage
import instructor
import litellm
from pydantic import BaseModel
from ragas.llms.litellm_llm import LiteLLMStructuredLLM

# Create an instructor-wrapped LiteLLM client
client = instructor.from_litellm(litellm.completion)

# Initialize the LLM wrapper
llm = LiteLLMStructuredLLM(
    client=client,
    model="gpt-4o",
    provider="openai",
)

# Define a response model
class Answer(BaseModel):
    text: str
    confidence: float

# Generate structured output
result = llm.generate(
    prompt="What is the capital of France?",
    response_model=Answer,
)
print(result.text, result.confidence)
Async Usage with Caching
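import asyncio

import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel
from ragas.cache import InMemoryCache
from ragas.llms.litellm_llm import LiteLLMStructuredLLM

# Response model (same shape as in the basic example)
class Answer(BaseModel):
    text: str
    confidence: float

# Create an async instructor client and a cache backend
async_client = instructor.from_openai(AsyncOpenAI())
cache = InMemoryCache()

llm = LiteLLMStructuredLLM(
    client=async_client,
    model="gpt-4o",
    provider="openai",
    cache=cache,
    temperature=0.0,
)

async def main() -> None:
    # Async generation; agenerate is wrapped by the cache when one is provided
    result = await llm.agenerate(
        prompt="Summarize quantum computing",
        response_model=Answer,
    )
    print(result.text, result.confidence)

# In a notebook, use `await main()` instead of asyncio.run
asyncio.run(main())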