Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Explodinggradients Ragas LiteLLMStructuredLLM Class

From Leeroopedia


Metadata Value
Source src/ragas/llms/litellm_llm.py (Lines 14-271)
Domains LLM, LiteLLM
Last Updated 2026-02-10

Overview

An LLM wrapper that uses LiteLLM with instructor for structured Pydantic model outputs, supporting both synchronous and asynchronous generation across 100+ LLM providers including Gemini, Ollama, vLLM, and Groq.

Description

LiteLLMStructuredLLM extends InstructorBaseRagasLLM and provides structured output generation through LiteLLM's unified API. Key features:

  • Async detection: The _check_client_async method uses multiple heuristics to determine if the provided client supports async operations, checking for AsyncInstructor class names, acompletion methods, async chat.completions.create, underlying wrapped clients, and instructor closure functions. The is_async flag is set at initialization.
  • generate: Synchronous generation that builds a message list (with optional system prompt), calls client.chat.completions.create with the response_model for structured output, and tracks usage via ragas._analytics. If the client is async-only, it delegates to agenerate run in the appropriate event loop.
  • agenerate: Asynchronous generation using await on the client's async completion method. Raises TypeError if called on a synchronous client.
  • Event loop handling: _run_async_in_current_loop handles Jupyter notebook environments by detecting running event loops and spawning a separate thread with its own event loop when needed.
  • Caching: If a CacheInterface is provided, both generate and agenerate are wrapped with the cacher decorator for response caching.
  • Usage tracking: Each generation call tracks an LLMUsageEvent with provider, model, type, and async status.

Usage

Use this class when you want to use LiteLLM-supported providers (Google Gemini, Ollama, vLLM, Groq, Anthropic, etc.) with Ragas metrics that require structured Pydantic model outputs. It is the primary LLM wrapper for non-LangChain, non-LlamaIndex setups.

Code Reference

Source Location

Item Detail
File src/ragas/llms/litellm_llm.py
Lines 14-271
Module ragas.llms.litellm_llm

Class Signature

class LiteLLMStructuredLLM(InstructorBaseRagasLLM):
    def __init__(
        self,
        client: Any,
        model: str,
        provider: str,
        cache: Optional[CacheInterface] = None,
        system_prompt: Optional[str] = None,
        **kwargs,
    ) -> None: ...

    def generate(
        self,
        prompt: str,
        response_model: Type[InstructorTypeVar],
    ) -> InstructorTypeVar: ...

    async def agenerate(
        self,
        prompt: str,
        response_model: Type[InstructorTypeVar],
    ) -> InstructorTypeVar: ...

Import

from ragas.llms.litellm_llm import LiteLLMStructuredLLM

I/O Contract

Constructor

Name Type Required Description
client Any Yes LiteLLM or instructor-wrapped client instance
model str Yes Model identifier (e.g., "gemini-2.0-flash", "ollama/llama3")
provider str Yes Provider name for analytics tracking
cache Optional[CacheInterface] No Cache backend for response caching
system_prompt Optional[str] No System prompt prepended to all messages
**kwargs Any No Additional model arguments (e.g., temperature, max_tokens)

generate

Direction Name Type Description
Input prompt str The user prompt text
Input response_model Type[BaseModel] Pydantic model class for structured output
Output (return) BaseModel Instance of response_model populated with generated data

agenerate

Direction Name Type Description
Input prompt str The user prompt text
Input response_model Type[BaseModel] Pydantic model class for structured output
Output (return) BaseModel Instance of response_model populated with generated data

Exceptions

Exception Condition
TypeError agenerate called on a synchronous-only client

Usage Examples

Basic Usage with LiteLLM

import instructor
import litellm
from ragas.llms.litellm_llm import LiteLLMStructuredLLM

# Create an instructor-wrapped LiteLLM client
client = instructor.from_litellm(litellm.completion)

llm = LiteLLMStructuredLLM(
    client=client,
    model="gpt-4o-mini",
    provider="openai",
)

print(llm)
# LiteLLMStructuredLLM(model='gpt-4o-mini', provider='openai', is_async=False)

Using with Gemini

import instructor
import litellm
from ragas.llms.litellm_llm import LiteLLMStructuredLLM

client = instructor.from_litellm(litellm.completion)

llm = LiteLLMStructuredLLM(
    client=client,
    model="gemini/gemini-2.0-flash",
    provider="google",
    temperature=0.1,
)

Async Usage with Router

import instructor
from litellm import Router
from ragas.llms.litellm_llm import LiteLLMStructuredLLM

# Router supports async operations
router = Router(model_list=[...])
client = instructor.from_litellm(router.acompletion)

llm = LiteLLMStructuredLLM(
    client=client,
    model="gpt-4o",
    provider="openai",
)

print(llm.is_async)  # True

With Caching

from ragas.llms.litellm_llm import LiteLLMStructuredLLM
from ragas.cache import InMemoryCache

llm = LiteLLMStructuredLLM(
    client=client,
    model="gpt-4o-mini",
    provider="openai",
    cache=InMemoryCache(),
)
# Repeated identical prompts will return cached results

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment