Implementation:Explodinggradients Ragas LangSmith Integration

From Leeroopedia


Metadata | Value
Source | src/ragas/integrations/langsmith.py (Lines 23-180)
Domains | Integration, LangSmith
Last Updated | 2026-02-10

Overview

Provides functions to upload Ragas test datasets to LangSmith and to evaluate LLM chains or language models against LangSmith-hosted datasets using Ragas metrics.

Description

This module contains two primary functions:

  • upload_dataset converts a Ragas Testset to a pandas DataFrame and uploads it to LangSmith using the LangSmith client. It checks whether a dataset with the given name already exists and raises a ValueError if so. The upload specifies question as the input key and ground_truth as the output key.
  • evaluate runs a full evaluation pipeline on a LangSmith dataset. It:
    • Validates that the specified dataset exists in LangSmith.
    • Wraps each Ragas metric in an EvaluatorChain for LangChain/LangSmith compatibility.
    • Configures the evaluation using LangChain's RunEvalConfig with the wrapped metrics as custom evaluators.
    • Executes the evaluation using the LangSmith client's run_on_dataset method.
    • Returns the evaluation results dictionary.

When no metrics are provided, evaluate defaults to four standard metrics: answer_relevancy, context_precision, faithfulness, and context_recall.
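The control flow described above can be sketched with a stand-in client. This is a simplified illustration under stated assumptions, not the module's source: InMemoryClient, sketch_upload, and sketch_evaluate are hypothetical names, metrics are represented as plain strings, and the real functions call the LangSmith client and LangChain's evaluation utilities instead.

```python
from typing import Dict, List, Optional

# The four defaults named in the description, represented here as plain strings.
DEFAULT_METRICS = [
    "answer_relevancy",
    "context_precision",
    "faithfulness",
    "context_recall",
]


class InMemoryClient:
    """Hypothetical stand-in for a dataset store; this is not the LangSmith client."""

    def __init__(self) -> None:
        self.datasets: Dict[str, List[dict]] = {}

    def has_dataset(self, name: str) -> bool:
        return name in self.datasets


def sketch_upload(client: InMemoryClient, rows: List[dict], dataset_name: str) -> List[dict]:
    # Documented guard: refuse to reuse the name of an existing dataset.
    if client.has_dataset(dataset_name):
        raise ValueError(f"Dataset {dataset_name!r} already exists")
    client.datasets[dataset_name] = list(rows)
    return client.datasets[dataset_name]


def sketch_evaluate(
    client: InMemoryClient,
    dataset_name: str,
    metrics: Optional[List[str]] = None,
) -> Dict[str, object]:
    # Documented guard: the dataset must already exist.
    if not client.has_dataset(dataset_name):
        raise ValueError(f"Dataset {dataset_name!r} does not exist")
    # Assumption: only None triggers the fallback to the default metrics.
    selected = DEFAULT_METRICS if metrics is None else metrics
    return {
        "dataset": dataset_name,
        "metrics": list(selected),
        "num_examples": len(client.datasets[dataset_name]),
    }
```

The two guards are mirror images: uploading fails when the name is taken, and evaluating fails when the name is unknown.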

Usage

Use this integration when you want to:

  • Store Ragas-generated test datasets in LangSmith for collaborative access and versioning.
  • Run Ragas evaluation metrics on datasets hosted in LangSmith.
  • Compare different LLM chains or models using standardized Ragas metrics within the LangSmith platform.
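A comparison run could be organized as a loop that evaluates each candidate under its own experiment name. The sketch below takes the evaluation function as a parameter so it can be exercised without LangSmith credentials; compare_candidates and the candidate labels are hypothetical, and in practice evaluate_fn would be ragas.integrations.langsmith.evaluate.

```python
from typing import Any, Callable, Dict


def compare_candidates(
    evaluate_fn: Callable[..., Dict[str, Any]],
    dataset_name: str,
    candidates: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]:
    """Run evaluate_fn once per candidate and collect results by experiment name."""
    results: Dict[str, Dict[str, Any]] = {}
    for label, chain_or_factory in candidates.items():
        # Give every candidate a distinct experiment name so runs stay separable.
        experiment_name = f"{dataset_name}-{label}"
        results[experiment_name] = evaluate_fn(
            dataset_name=dataset_name,
            llm_or_chain_factory=chain_or_factory,
            experiment_name=experiment_name,
        )
    return results
```

Keeping the dataset fixed and varying only the chain means any difference in scores can be attributed to the candidate rather than to the test data.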

Code Reference

Source Location

Item | Detail
File | src/ragas/integrations/langsmith.py
Lines | 23-180
Module | ragas.integrations.langsmith

Signatures

def upload_dataset(
    dataset: Testset,
    dataset_name: str,
    dataset_desc: str = "",
) -> LangsmithDataset

def evaluate(
    dataset_name: str,
    llm_or_chain_factory: Any,
    experiment_name: Optional[str] = None,
    metrics: Optional[list] = None,
    verbose: bool = False,
) -> Dict[str, Any]

Import

from ragas.integrations.langsmith import upload_dataset, evaluate

I/O Contract

upload_dataset

Direction | Name | Type | Description
Input | dataset | Testset | Ragas test dataset to upload
Input | dataset_name | str | Name for the dataset in LangSmith
Input | dataset_desc | str | Optional description (default: "")
Output | (return) | LangsmithDataset | The created LangSmith dataset object
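The Description states that the upload uses question as the input key and ground_truth as the output key. The sketch below shows what that split means for a single row, using plain dictionaries. split_row is a hypothetical helper, the sample row is made up, and the extra contexts column is illustrative only; how the real upload treats columns outside these two keys is not stated in this page.

```python
from typing import Any, Dict, Tuple

# Keys named in the description of upload_dataset.
INPUT_KEYS = ("question",)
OUTPUT_KEYS = ("ground_truth",)


def split_row(row: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one test-set row into example inputs and reference outputs."""
    inputs = {key: row[key] for key in INPUT_KEYS}
    outputs = {key: row[key] for key in OUTPUT_KEYS}
    return inputs, outputs


# Illustrative row; in this sketch, columns outside the two key sets are ignored.
row = {
    "question": "Which key is used as the input?",
    "ground_truth": "question",
    "contexts": ["placeholder context"],
}
inputs, outputs = split_row(row)
```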

evaluate

Direction | Name | Type | Description
Input | dataset_name | str | Name of an existing LangSmith dataset
Input | llm_or_chain_factory | Any | LLM or chain factory to evaluate
Input | experiment_name | Optional[str] | Name for the experiment run (default: None)
Input | metrics | Optional[list] | List of Ragas metrics (default: answer_relevancy, context_precision, faithfulness, context_recall)
Input | verbose | bool | Enable verbose output (default: False)
Output | (return) | Dict[str, Any] | Evaluation results from LangSmith
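A chain factory is commonly a zero-argument callable that builds a fresh chain on each call, so state such as conversation memory does not carry over between dataset examples. The sketch below illustrates the idea with a hypothetical EchoChain class rather than a real LangChain chain; the invoke method and its return shape are assumptions made for the illustration.

```python
from typing import Any, Dict, List


class EchoChain:
    """Hypothetical stateful chain used only to show why a factory helps."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Record the question so that shared state would be visible across calls.
        self.history.append(inputs["question"])
        return {"answer": f"echo: {inputs['question']}", "turns": len(self.history)}


def chain_factory() -> EchoChain:
    # Build a new instance per call so examples cannot see each other's history.
    return EchoChain()
```

Passing chain_factory instead of a single EchoChain instance would give each example its own history, whereas a shared instance would accumulate turns across the whole dataset.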

Exceptions

Exception | Function | Condition
ValueError | upload_dataset | Dataset with the given name already exists in LangSmith
ValueError | evaluate | Dataset with the given name does not exist in LangSmith
ImportError | (module) | The langsmith package is not installed
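Callers may want to treat the duplicate-name ValueError as a recoverable condition and to check for the langsmith package before importing the module. A minimal sketch, assuming two hypothetical helpers: langsmith_available, which probes for the package without importing it, and upload_or_skip, which receives the upload function as an argument.

```python
import importlib.util
from typing import Any, Callable, Optional


def langsmith_available() -> bool:
    """Return True if the langsmith package can be found, without importing it."""
    return importlib.util.find_spec("langsmith") is not None


def upload_or_skip(
    upload_fn: Callable[..., Any],
    dataset: Any,
    dataset_name: str,
    dataset_desc: str = "",
) -> Optional[Any]:
    """Call upload_fn and return None instead of raising when it signals ValueError."""
    try:
        return upload_fn(
            dataset=dataset,
            dataset_name=dataset_name,
            dataset_desc=dataset_desc,
        )
    except ValueError as exc:
        # Per the table above, upload_dataset raises ValueError for a duplicate name.
        print(f"Skipping upload of {dataset_name!r}: {exc}")
        return None
```

Catching ValueError this broadly would also swallow unrelated ValueErrors raised during the upload, so a stricter caller might inspect the message or check for the dataset beforehand.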

Usage Examples

Uploading a Ragas Test Dataset to LangSmith

from ragas.integrations.langsmith import upload_dataset

# Assume `testset` is a Ragas Testset produced by Ragas test set generation
langsmith_ds = upload_dataset(
    dataset=testset,
    dataset_name="my-rag-eval-dataset",
    dataset_desc="Generated test dataset for RAG evaluation",
)

print(f"Dataset available at: {langsmith_ds.url}")

Running Evaluation with Default Metrics

from ragas.integrations.langsmith import evaluate

# Evaluate an LLM chain against a LangSmith dataset.
# `my_rag_chain` is assumed to be a chain (or chain factory) defined elsewhere.
results = evaluate(
    dataset_name="my-rag-eval-dataset",
    llm_or_chain_factory=my_rag_chain,
    experiment_name="baseline-rag-v1",
    verbose=True,
)

Running Evaluation with Custom Metrics

from ragas.integrations.langsmith import evaluate
from ragas.metrics import faithfulness, context_precision

# Restrict the run to two metrics; `my_rag_chain` is assumed to be defined elsewhere.
results = evaluate(
    dataset_name="my-rag-eval-dataset",
    llm_or_chain_factory=my_rag_chain,
    experiment_name="custom-metrics-run",
    metrics=[faithfulness, context_precision],
    verbose=True,
)
