Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Vibrantlabsai Ragas LiteLLM

From Leeroopedia
Knowledge Sources
Domains LLM Integration, Structured Output, Evaluation
Last Updated 2026-02-12 00:00 GMT

Overview

LiteLLMStructuredLLM is an LLM wrapper that uses LiteLLM with instructor for structured output generation, supporting 100+ LLM providers including Gemini, Ollama, vLLM, and Groq.

Description

LiteLLMStructuredLLM extends InstructorBaseRagasLLM to provide a unified interface for structured LLM output across a wide range of providers via the LiteLLM library. The class accepts an instructor-wrapped LiteLLM client and returns Pydantic model instances as structured outputs from LLM calls. It performs automatic async detection at initialization by inspecting the client for async capabilities through multiple strategies: checking for AsyncInstructor wrappers, direct acompletion methods, async chat completion interfaces, underlying wrapped async clients, and closure-captured async objects. The class supports optional caching through the Ragas CacheInterface, wrapping both generate and agenerate methods with the cacher decorator when a cache is provided. For sync clients that need to run async operations, _run_async_in_current_loop handles event loop management including Jupyter notebook environments by spawning a separate thread with its own event loop. All LLM calls are tracked via the Ragas analytics system using LLMUsageEvent.

Usage

Use this class when you want to leverage LiteLLM's multi-provider support within Ragas evaluations, particularly when you need structured (Pydantic model) outputs from diverse LLM providers.

Code Reference

Source Location

Signature

class LiteLLMStructuredLLM(InstructorBaseRagasLLM):
    def __init__(
        self,
        client: t.Any,
        model: str,
        provider: str,
        cache: t.Optional[CacheInterface] = None,
        system_prompt: t.Optional[str] = None,
        **kwargs,
    ):

Import

from ragas.llms.litellm_llm import LiteLLMStructuredLLM

I/O Contract

Inputs

Name Type Required Description
client Any Yes An instructor-wrapped LiteLLM client instance (sync or async)
model str Yes Model name identifier (e.g., "gemini-2.0-flash", "gpt-4o")
provider str Yes Provider name for analytics tracking
cache CacheInterface No Optional cache backend for caching LLM responses
system_prompt str No Optional system prompt to prepend to all messages
**kwargs Any No Additional model arguments passed to completions (temperature, max_tokens, etc.)

Outputs

Name Type Description
generate return InstructorTypeVar An instance of the specified Pydantic response_model populated with generated data
agenerate return InstructorTypeVar An instance of the specified Pydantic response_model populated with generated data (async)

Key Methods

Method Description
generate(prompt, response_model) Synchronous structured generation; delegates to agenerate for async clients
agenerate(prompt, response_model) Asynchronous structured generation; raises TypeError if client is not async-capable
_check_client_async() Inspects the client to determine if it supports async operations via multiple detection strategies
_run_async_in_current_loop(coro) Runs an async coroutine in the current event loop, with Jupyter notebook compatibility

Usage Examples

Basic Usage

import instructor
import litellm
from ragas.llms.litellm_llm import LiteLLMStructuredLLM
from pydantic import BaseModel

# Create an instructor-wrapped LiteLLM client
client = instructor.from_litellm(litellm.completion)

# Initialize the LLM wrapper
llm = LiteLLMStructuredLLM(
    client=client,
    model="gpt-4o",
    provider="openai",
)

# Define a response model
class Answer(BaseModel):
    text: str
    confidence: float

# Generate structured output
result = llm.generate(
    prompt="What is the capital of France?",
    response_model=Answer,
)
print(result.text, result.confidence)

Async Usage with Caching

import instructor
from openai import AsyncOpenAI
from ragas.llms.litellm_llm import LiteLLMStructuredLLM
from ragas.cache import InMemoryCache

# Create async client
async_client = instructor.from_openai(AsyncOpenAI())
cache = InMemoryCache()

llm = LiteLLMStructuredLLM(
    client=async_client,
    model="gpt-4o",
    provider="openai",
    cache=cache,
    temperature=0.0,
)

# Async generation
result = await llm.agenerate(
    prompt="Summarize quantum computing",
    response_model=Answer,
)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment