Implementation:Explodinggradients Ragas LangSmith Integration
| Metadata | Value |
|---|---|
| Source | src/ragas/integrations/langsmith.py (Lines 23-180) |
| Domains | Integration, LangSmith |
| Last Updated | 2026-02-10 |
Overview
Provides functions to upload Ragas test datasets to LangSmith and run evaluations using Ragas metrics on LangSmith-hosted datasets with LLM chains or language models.
Description
This module contains two primary functions:
- upload_dataset converts a Ragas Testset to a pandas DataFrame and uploads it to LangSmith using the LangSmith client. It checks whether a dataset with the given name already exists and raises a ValueError if so. The upload specifies question as the input key and ground_truth as the output key.
- evaluate runs a full evaluation pipeline on a LangSmith dataset. It:
  - Validates that the specified dataset exists in LangSmith.
  - Wraps each Ragas metric in an EvaluatorChain for LangChain/LangSmith compatibility.
  - Configures the evaluation using LangChain's RunEvalConfig with the wrapped metrics as custom evaluators.
  - Executes the evaluation using the LangSmith client's run_on_dataset method.
  - Returns the evaluation results dictionary.
When no metrics are provided, the function defaults to four standard metrics: answer_relevancy, context_precision, faithfulness, and context_recall.
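The control flow described above can be sketched without any network access. This is a minimal illustration under stated assumptions, not the module's actual code: FakeClient, WrappedMetric, and sketch_evaluate are hypothetical stand-ins for the LangSmith client, EvaluatorChain, and evaluate, and metrics are represented by plain strings rather than Ragas metric objects.

```python
from typing import Any, Dict, List, Optional

# Default metric names as listed in the description above.
DEFAULT_METRICS = [
    "answer_relevancy",
    "context_precision",
    "faithfulness",
    "context_recall",
]


class FakeClient:
    """Hypothetical in-memory stand-in for the LangSmith client."""

    def __init__(self, datasets: Dict[str, List[dict]]) -> None:
        self.datasets = datasets

    def has_dataset(self, name: str) -> bool:
        return name in self.datasets


class WrappedMetric:
    """Hypothetical stand-in for EvaluatorChain: holds one metric."""

    def __init__(self, metric: str) -> None:
        self.metric = metric


def sketch_evaluate(
    client: FakeClient,
    dataset_name: str,
    metrics: Optional[List[str]] = None,
) -> Dict[str, Any]:
    # Fail early if the dataset is missing.
    if not client.has_dataset(dataset_name):
        raise ValueError(f"Dataset {dataset_name!r} not found")
    # Fall back to the four defaults when no metrics are given.
    if metrics is None:
        metrics = list(DEFAULT_METRICS)
    # Wrap each metric so the evaluation framework can call it.
    evaluators = [WrappedMetric(m) for m in metrics]
    # The real function would now build a config, run it, and return
    # the results; this sketch only reports what would be run.
    return {"dataset": dataset_name, "evaluators": [e.metric for e in evaluators]}
```

Passing metrics=None selects all four defaults, while an explicit list replaces them entirely rather than extending them.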
Usage
Use this integration when you want to:
- Store Ragas-generated test datasets in LangSmith for collaborative access and versioning.
- Run Ragas evaluation metrics on datasets hosted in LangSmith.
- Compare different LLM chains or models using standardized Ragas metrics within the LangSmith platform.
Code Reference
Source Location
| Item | Detail |
|---|---|
| File | src/ragas/integrations/langsmith.py |
| Lines | 23-180 |
| Module | ragas.integrations.langsmith |
Signatures
def upload_dataset(
    dataset: Testset,
    dataset_name: str,
    dataset_desc: str = "",
) -> LangsmithDataset

def evaluate(
    dataset_name: str,
    llm_or_chain_factory: Any,
    experiment_name: Optional[str] = None,
    metrics: Optional[list] = None,
    verbose: bool = False,
) -> Dict[str, Any]
Import
from ragas.integrations.langsmith import upload_dataset, evaluate
I/O Contract
upload_dataset
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | dataset | Testset | Ragas test dataset to upload |
| Input | dataset_name | str | Name for the dataset in LangSmith |
| Input | dataset_desc | str | Optional description (default: "") |
| Output | (return) | LangsmithDataset | The created LangSmith dataset object |
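Because the upload uses question as the input key and ground_truth as the output key, it can help to confirm that both columns are present before uploading. The helper below is a hypothetical pre-check, not part of the module; it works on plain row dictionaries so it does not depend on pandas.

```python
from typing import Iterable, List, Mapping

# Column names taken from the description of upload_dataset.
REQUIRED_COLUMNS = ("question", "ground_truth")


def missing_columns(rows: Iterable[Mapping[str, object]]) -> List[str]:
    """Return the required columns that are absent from at least one row."""
    rows = list(rows)
    return [
        col
        for col in REQUIRED_COLUMNS
        if any(col not in row for row in rows)
    ]
```

Assuming the testset exposes a to_pandas() method (the description says it is converted to a DataFrame, but does not name the method), the rows could be obtained with `testset.to_pandas().to_dict("records")` and passed to missing_columns before calling upload_dataset.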
evaluate
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | dataset_name | str | Name of an existing LangSmith dataset |
| Input | llm_or_chain_factory | Any | LLM or chain factory to evaluate |
| Input | experiment_name | Optional[str] | Name for the experiment run (default: None) |
| Input | metrics | Optional[list] | List of Ragas metrics (default: answer_relevancy, context_precision, faithfulness, context_recall) |
| Input | verbose | bool | Enable verbose output (default: False) |
| Output | (return) | Dict[str, Any] | Evaluation results from LangSmith |
Exceptions
| Exception | Function | Condition |
|---|---|---|
| ValueError | upload_dataset | Dataset with the given name already exists in LangSmith |
| ValueError | evaluate | Dataset with the given name does not exist in LangSmith |
| ImportError | (module) | The langsmith package is not installed |
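Since upload_dataset raises ValueError when the name is already taken, a caller that reruns a script may want to treat that case as "nothing to do". The wrapper below is one possible pattern, not something the module provides; upload_unless_exists is a hypothetical name, and it takes the upload as a zero-argument callable so the sketch stays independent of LangSmith.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def upload_unless_exists(upload: Callable[[], T]) -> Optional[T]:
    """Run `upload` and return its result, or None if it raises ValueError."""
    try:
        return upload()
    except ValueError as exc:
        # Per the table above, upload_dataset signals a name clash this way.
        print(f"Skipping upload: {exc}")
        return None
```

A call might look like `upload_unless_exists(lambda: upload_dataset(dataset=testset, dataset_name="my-rag-eval-dataset"))`. Note that this swallows every ValueError raised inside the callable, so it is only appropriate when a name clash is the sole expected cause.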
Usage Examples
Uploading a Ragas Test Dataset to LangSmith
from ragas.integrations.langsmith import upload_dataset

# Assume `testset` is a Ragas Testset produced by test data generation
langsmith_ds = upload_dataset(
    dataset=testset,
    dataset_name="my-rag-eval-dataset",
    dataset_desc="Generated test dataset for RAG evaluation",
)
print(f"Dataset available at: {langsmith_ds.url}")
Running Evaluation with Default Metrics
from ragas.integrations.langsmith import evaluate

# Evaluate an LLM chain against a LangSmith dataset.
# Assume `my_rag_chain` is a chain (or chain factory) defined elsewhere.
results = evaluate(
    dataset_name="my-rag-eval-dataset",
    llm_or_chain_factory=my_rag_chain,
    experiment_name="baseline-rag-v1",
    verbose=True,
)
Running Evaluation with Custom Metrics
from ragas.integrations.langsmith import evaluate
from ragas.metrics import faithfulness, context_precision

results = evaluate(
    dataset_name="my-rag-eval-dataset",
    llm_or_chain_factory=my_rag_chain,
    experiment_name="custom-metrics-run",
    metrics=[faithfulness, context_precision],
    verbose=True,
)
Related Pages
- EvaluatorChain Class - The chain wrapper used internally to adapt Ragas metrics for LangSmith
- LlamaIndex Integration - Alternative evaluation integration for LlamaIndex query engines
- OpikTracer Class - Alternative observability integration for logging evaluation traces