Implementation:Vibrantlabsai Ragas LiteLLM

Knowledge Sources	Vibrantlabsai_Ragas
Domains	LLM Integration, Structured Output, Evaluation
Last Updated	2026-02-12 00:00 GMT

Overview

LiteLLMStructuredLLM is an LLM wrapper that uses LiteLLM with instructor for structured output generation, supporting 100+ LLM providers including Gemini, Ollama, vLLM, and Groq.

Description

LiteLLMStructuredLLM extends InstructorBaseRagasLLM to provide a unified interface for structured LLM output across a wide range of providers via the LiteLLM library. The class accepts an instructor-wrapped LiteLLM client and returns Pydantic model instances as structured outputs from LLM calls. It performs automatic async detection at initialization by inspecting the client for async capabilities through multiple strategies: checking for AsyncInstructor wrappers, direct acompletion methods, async chat completion interfaces, underlying wrapped async clients, and closure-captured async objects. The class supports optional caching through the Ragas CacheInterface, wrapping both generate and agenerate methods with the cacher decorator when a cache is provided. For sync clients that need to run async operations, _run_async_in_current_loop handles event loop management including Jupyter notebook environments by spawning a separate thread with its own event loop. All LLM calls are tracked via the Ragas analytics system using LLMUsageEvent.

Usage

Use this class when you want to leverage LiteLLM's multi-provider support within Ragas evaluations, particularly when you need structured (Pydantic model) outputs from diverse LLM providers.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/llms/litellm_llm.py

Signature

class LiteLLMStructuredLLM(InstructorBaseRagasLLM):
    def __init__(
        self,
        client: t.Any,
        model: str,
        provider: str,
        cache: t.Optional[CacheInterface] = None,
        system_prompt: t.Optional[str] = None,
        **kwargs,
    ):

Import

from ragas.llms.litellm_llm import LiteLLMStructuredLLM

I/O Contract

Inputs

Name	Type	Required	Description
client	Any	Yes	An instructor-wrapped LiteLLM client instance (sync or async)
model	str	Yes	Model name identifier (e.g., "gemini-2.0-flash", "gpt-4o")
provider	str	Yes	Provider name for analytics tracking
cache	CacheInterface	No	Optional cache backend for caching LLM responses
system_prompt	str	No	Optional system prompt to prepend to all messages
**kwargs	Any	No	Additional model arguments passed to completions (temperature, max_tokens, etc.)

Outputs

Name	Type	Description
generate return	InstructorTypeVar	An instance of the specified Pydantic response_model populated with generated data
agenerate return	InstructorTypeVar	An instance of the specified Pydantic response_model populated with generated data (async)

Key Methods

Method	Description
`generate(prompt, response_model)`	Synchronous structured generation; delegates to agenerate for async clients
`agenerate(prompt, response_model)`	Asynchronous structured generation; raises TypeError if client is not async-capable
`_check_client_async()`	Inspects the client to determine if it supports async operations via multiple detection strategies
`_run_async_in_current_loop(coro)`	Runs an async coroutine in the current event loop, with Jupyter notebook compatibility

Usage Examples

Basic Usage

import instructor
import litellm
from ragas.llms.litellm_llm import LiteLLMStructuredLLM
from pydantic import BaseModel

# Create an instructor-wrapped LiteLLM client
client = instructor.from_litellm(litellm.completion)

# Initialize the LLM wrapper
llm = LiteLLMStructuredLLM(
    client=client,
    model="gpt-4o",
    provider="openai",
)

# Define a response model
class Answer(BaseModel):
    text: str
    confidence: float

# Generate structured output
result = llm.generate(
    prompt="What is the capital of France?",
    response_model=Answer,
)
print(result.text, result.confidence)

Async Usage with Caching

import instructor
from openai import AsyncOpenAI
from ragas.llms.litellm_llm import LiteLLMStructuredLLM
from ragas.cache import InMemoryCache

# Create async client
async_client = instructor.from_openai(AsyncOpenAI())
cache = InMemoryCache()

llm = LiteLLMStructuredLLM(
    client=async_client,
    model="gpt-4o",
    provider="openai",
    cache=cache,
    temperature=0.0,
)

# Async generation
result = await llm.agenerate(
    prompt="Summarize quantum computing",
    response_model=Answer,
)

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment