Implementation:Explodinggradients Ragas LangSmith Integration

Metadata	Value
Source	`src/ragas/integrations/langsmith.py` (Lines 23-180)
Domains	Integration, LangSmith
Last Updated	2026-02-10

Overview

This module provides functions to upload Ragas test datasets to LangSmith and to evaluate LLM chains or language models against LangSmith-hosted datasets using Ragas metrics.

Description

This module contains two primary functions:

`upload_dataset` converts a Ragas `Testset` to a pandas DataFrame and uploads it to LangSmith through the LangSmith client. It first checks whether a dataset with the given name already exists and raises a `ValueError` if it does. The upload uses `question` as the input key and `ground_truth` as the output key.
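The check-then-create behavior described above can be illustrated with a minimal sketch. It does not use the real LangSmith client: `InMemoryClient`, its `exists` and `create` methods, and `upload_rows` are hypothetical stand-ins invented for this example, and rows are plain dictionaries rather than a DataFrame. Only the `question` and `ground_truth` key names and the `ValueError` on a name collision come from the description.

```python
from typing import Any, Dict, List


class InMemoryClient:
    """Hypothetical stand-in for a LangSmith client; keeps datasets in a dict."""

    def __init__(self) -> None:
        self.datasets: Dict[str, Dict[str, Any]] = {}

    def exists(self, name: str) -> bool:
        return name in self.datasets

    def create(
        self,
        name: str,
        rows: List[Dict[str, Any]],
        input_keys: List[str],
        output_keys: List[str],
        description: str = "",
    ) -> Dict[str, Any]:
        # Split every row into its input columns and its output columns.
        examples = [
            {
                "inputs": {key: row[key] for key in input_keys},
                "outputs": {key: row[key] for key in output_keys},
            }
            for row in rows
        ]
        self.datasets[name] = {
            "name": name,
            "description": description,
            "examples": examples,
        }
        return self.datasets[name]


def upload_rows(
    client: InMemoryClient,
    rows: List[Dict[str, Any]],
    dataset_name: str,
    dataset_desc: str = "",
) -> Dict[str, Any]:
    """Refuse to overwrite an existing dataset; otherwise create it."""
    if client.exists(dataset_name):
        raise ValueError(f"Dataset {dataset_name!r} already exists")
    return client.create(
        dataset_name,
        rows,
        input_keys=["question"],       # input key named in the description
        output_keys=["ground_truth"],  # output key named in the description
        description=dataset_desc,
    )
```

In this sketch, columns other than `question` and `ground_truth` are simply dropped from each example; how the real upload treats extra columns is not covered here.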

`evaluate` runs a full evaluation pipeline on a LangSmith dataset. It:
- Validates that the specified dataset exists in LangSmith.
- Wraps each Ragas metric in an `EvaluatorChain` for LangChain/LangSmith compatibility.
- Configures the evaluation using LangChain's `RunEvalConfig` with the wrapped metrics as custom evaluators.
- Executes the evaluation using the LangSmith client's `run_on_dataset` method.
- Returns the evaluation results dictionary.

When no metrics are provided, the function defaults to four standard metrics: `answer_relevancy`, `context_precision`, `faithfulness`, and `context_recall`.
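The first three steps and the default-metric fallback can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `build_eval_plan`, `known_datasets`, and the `wrap` callable are hypothetical, metrics are represented by their names as strings, and the sketch treats only `None` as "no metrics provided".

```python
from typing import Any, Callable, Dict, List, Optional, Set

# Names of the four default metrics listed in the description.
DEFAULT_METRICS = [
    "answer_relevancy",
    "context_precision",
    "faithfulness",
    "context_recall",
]


def build_eval_plan(
    dataset_name: str,
    known_datasets: Set[str],
    metrics: Optional[List[Any]] = None,
    wrap: Callable[[Any], Any] = lambda metric: ("wrapped", metric),
) -> Dict[str, Any]:
    """Validate the dataset, resolve the metrics, and wrap each one."""
    # Step 1: the dataset must already exist.
    if dataset_name not in known_datasets:
        raise ValueError(f"Dataset {dataset_name!r} does not exist")

    # Fallback: use the four defaults when the caller passed nothing.
    if metrics is None:
        metrics = list(DEFAULT_METRICS)

    # Step 2: wrap each metric (`wrap` stands in for an evaluator wrapper).
    evaluators = [wrap(metric) for metric in metrics]

    # Step 3: collect the wrapped metrics as custom evaluators.
    return {"dataset_name": dataset_name, "custom_evaluators": evaluators}
```

Calling `build_eval_plan("demo", {"demo"})` yields four wrapped evaluators, while passing `metrics=["faithfulness"]` yields one. Executing the run and returning results are left out because they depend on the LangSmith client.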

Usage

Use this integration when you want to:

- Store Ragas-generated test datasets in LangSmith for collaborative access and versioning.
- Run Ragas evaluation metrics on datasets hosted in LangSmith.
- Compare different LLM chains or models using standardized Ragas metrics within the LangSmith platform.

Code Reference

Source Location

Item	Detail
File	`src/ragas/integrations/langsmith.py`
Lines	23-180
Module	`ragas.integrations.langsmith`

Signatures

def upload_dataset(
    dataset: Testset,
    dataset_name: str,
    dataset_desc: str = "",
) -> LangsmithDataset

def evaluate(
    dataset_name: str,
    llm_or_chain_factory: Any,
    experiment_name: Optional[str] = None,
    metrics: Optional[list] = None,
    verbose: bool = False,
) -> Dict[str, Any]

Import

from ragas.integrations.langsmith import upload_dataset, evaluate

I/O Contract

`upload_dataset`

Direction	Name	Type	Description
Input	`dataset`	`Testset`	Ragas test dataset to upload
Input	`dataset_name`	`str`	Name for the dataset in LangSmith
Input	`dataset_desc`	`str`	Optional description (default: `""`)
Output	(return)	`LangsmithDataset`	The created LangSmith dataset object

`evaluate`

Direction	Name	Type	Description
Input	`dataset_name`	`str`	Name of an existing LangSmith dataset
Input	`llm_or_chain_factory`	`Any`	LLM or chain factory to evaluate
Input	`experiment_name`	`Optional[str]`	Name for the experiment run (default: `None`)
Input	`metrics`	`Optional[list]`	List of Ragas metrics (default: answer_relevancy, context_precision, faithfulness, context_recall)
Input	`verbose`	`bool`	Enable verbose output (default: `False`)
Output	(return)	`Dict[str, Any]`	Evaluation results from LangSmith

Exceptions

Exception	Function	Condition
`ValueError`	`upload_dataset`	Dataset with the given name already exists in LangSmith
`ValueError`	`evaluate`	Dataset with the given name does not exist in LangSmith
`ImportError`	(module)	The `langsmith` package is not installed
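One way a caller might handle the name-collision `ValueError` is to treat it as "already uploaded" and move on. The helper below is a hypothetical sketch: `upload_if_absent` is not part of the module, and it takes the upload function as a parameter so that `upload_dataset` (or a test double) can be passed in. Note that it swallows any `ValueError` raised by the call, not only the collision case.

```python
from typing import Any, Callable, Optional


def upload_if_absent(
    upload_fn: Callable[..., Any],
    dataset: Any,
    dataset_name: str,
    dataset_desc: str = "",
) -> Optional[Any]:
    """Call upload_fn; return None instead of raising on a ValueError."""
    try:
        return upload_fn(
            dataset=dataset,
            dataset_name=dataset_name,
            dataset_desc=dataset_desc,
        )
    except ValueError:
        # Per the table above, an existing dataset name is reported this way.
        return None
```

With the real function this would be called as `upload_if_absent(upload_dataset, testset, "my-rag-eval-dataset")`, assuming `testset` is an existing Ragas `Testset`.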

Usage Examples

Uploading a Ragas Test Dataset to LangSmith

from ragas.integrations.langsmith import upload_dataset

# Assume `testset` is a Ragas Testset produced by Ragas test set generation
langsmith_ds = upload_dataset(
    dataset=testset,
    dataset_name="my-rag-eval-dataset",
    dataset_desc="Generated test dataset for RAG evaluation",
)

print(f"Dataset available at: {langsmith_ds.url}")

Running Evaluation with Default Metrics

from ragas.integrations.langsmith import evaluate

# Evaluate an LLM chain against a LangSmith dataset
# (`my_rag_chain` is assumed to be defined elsewhere)
results = evaluate(
    dataset_name="my-rag-eval-dataset",
    llm_or_chain_factory=my_rag_chain,
    experiment_name="baseline-rag-v1",
    verbose=True,
)

Running Evaluation with Custom Metrics

from ragas.integrations.langsmith import evaluate
from ragas.metrics import faithfulness, context_precision

# Evaluate with an explicit subset of metrics (`my_rag_chain` is assumed to be defined elsewhere)
results = evaluate(
    dataset_name="my-rag-eval-dataset",
    llm_or_chain_factory=my_rag_chain,
    experiment_name="custom-metrics-run",
    metrics=[faithfulness, context_precision],
    verbose=True,
)

Related Pages

- EvaluatorChain Class - The chain wrapper used internally to adapt Ragas metrics for LangSmith
- LlamaIndex Integration - Alternative evaluation integration for LlamaIndex query engines
- OpikTracer Class - Alternative observability integration for logging evaluation traces
