Implementation:Explodinggradients Ragas LiteLLMStructuredLLM Class

Metadata	Value
Source	`src/ragas/llms/litellm_llm.py` (Lines 14-271)
Domains	LLM, LiteLLM
Last Updated	2026-02-10

Overview

An LLM wrapper that uses LiteLLM with instructor for structured Pydantic model outputs, supporting both synchronous and asynchronous generation across 100+ LLM providers including Gemini, Ollama, vLLM, and Groq.

Description

LiteLLMStructuredLLM extends InstructorBaseRagasLLM and provides structured output generation through LiteLLM's unified API. Key features:

Async detection: The _check_client_async method uses multiple heuristics to determine if the provided client supports async operations, checking for AsyncInstructor class names, acompletion methods, async chat.completions.create, underlying wrapped clients, and instructor closure functions. The is_async flag is set at initialization.

generate: Synchronous generation that builds a message list (with optional system prompt), calls client.chat.completions.create with the response_model for structured output, and tracks usage via ragas._analytics. If the client is async-only, it delegates to agenerate run in the appropriate event loop.

agenerate: Asynchronous generation using await on the client's async completion method. Raises TypeError if called on a synchronous client.

Event loop handling: _run_async_in_current_loop handles Jupyter notebook environments by detecting running event loops and spawning a separate thread with its own event loop when needed.

Caching: If a CacheInterface is provided, both generate and agenerate are wrapped with the cacher decorator for response caching.

Usage tracking: Each generation call tracks an LLMUsageEvent with provider, model, type, and async status.

Usage

Use this class when you want to use LiteLLM-supported providers (Google Gemini, Ollama, vLLM, Groq, Anthropic, etc.) with Ragas metrics that require structured Pydantic model outputs. It is the primary LLM wrapper for non-LangChain, non-LlamaIndex setups.

Code Reference

Source Location

Item	Detail
File	`src/ragas/llms/litellm_llm.py`
Lines	14-271
Module	`ragas.llms.litellm_llm`

Class Signature

class LiteLLMStructuredLLM(InstructorBaseRagasLLM):
    def __init__(
        self,
        client: Any,
        model: str,
        provider: str,
        cache: Optional[CacheInterface] = None,
        system_prompt: Optional[str] = None,
        **kwargs,
    ) -> None: ...

    def generate(
        self,
        prompt: str,
        response_model: Type[InstructorTypeVar],
    ) -> InstructorTypeVar: ...

    async def agenerate(
        self,
        prompt: str,
        response_model: Type[InstructorTypeVar],
    ) -> InstructorTypeVar: ...

Import

from ragas.llms.litellm_llm import LiteLLMStructuredLLM

I/O Contract

Constructor

Name	Type	Required	Description
`client`	`Any`	Yes	LiteLLM or instructor-wrapped client instance
`model`	`str`	Yes	Model identifier (e.g., `"gemini-2.0-flash"`, `"ollama/llama3"`)
`provider`	`str`	Yes	Provider name for analytics tracking
`cache`	`Optional[CacheInterface]`	No	Cache backend for response caching
`system_prompt`	`Optional[str]`	No	System prompt prepended to all messages
`**kwargs`	`Any`	No	Additional model arguments (e.g., `temperature`, `max_tokens`)

`generate`

Direction	Name	Type	Description
Input	`prompt`	`str`	The user prompt text
Input	`response_model`	`Type[BaseModel]`	Pydantic model class for structured output
Output	(return)	`BaseModel`	Instance of `response_model` populated with generated data

`agenerate`

Direction	Name	Type	Description
Input	`prompt`	`str`	The user prompt text
Input	`response_model`	`Type[BaseModel]`	Pydantic model class for structured output
Output	(return)	`BaseModel`	Instance of `response_model` populated with generated data

Exceptions

Exception	Condition
`TypeError`	`agenerate` called on a synchronous-only client

Usage Examples

Basic Usage with LiteLLM

import instructor
import litellm
from ragas.llms.litellm_llm import LiteLLMStructuredLLM

# Create an instructor-wrapped LiteLLM client
client = instructor.from_litellm(litellm.completion)

llm = LiteLLMStructuredLLM(
    client=client,
    model="gpt-4o-mini",
    provider="openai",
)

print(llm)
# LiteLLMStructuredLLM(model='gpt-4o-mini', provider='openai', is_async=False)

Using with Gemini

import instructor
import litellm
from ragas.llms.litellm_llm import LiteLLMStructuredLLM

client = instructor.from_litellm(litellm.completion)

llm = LiteLLMStructuredLLM(
    client=client,
    model="gemini/gemini-2.0-flash",
    provider="google",
    temperature=0.1,
)

Async Usage with Router

import instructor
from litellm import Router
from ragas.llms.litellm_llm import LiteLLMStructuredLLM

# Router supports async operations
router = Router(model_list=[...])
client = instructor.from_litellm(router.acompletion)

llm = LiteLLMStructuredLLM(
    client=client,
    model="gpt-4o",
    provider="openai",
)

print(llm.is_async)  # True

With Caching

from ragas.llms.litellm_llm import LiteLLMStructuredLLM
from ragas.cache import InMemoryCache

llm = LiteLLMStructuredLLM(
    client=client,
    model="gpt-4o-mini",
    provider="openai",
    cache=InMemoryCache(),
)
# Repeated identical prompts will return cached results

Related Pages

HaystackLLMWrapper Class - Alternative LLM wrapper for Haystack generators
EvaluatorChain Class - Uses LangChain LLM wrappers; LiteLLMStructuredLLM serves as a non-LangChain alternative
HeliconeSingleton Class - Can be used alongside LiteLLM for observability via proxy headers

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment