Implementation:Vibrantlabsai Ragas Annotated Data Sample
| Knowledge Sources | |
|---|---|
| Domains | LLM Evaluation, Sample Data, Metric Annotation |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
A sample annotated evaluation dataset in JSON format used for demonstrating and testing the Ragas helpfulness metric with human-reviewed annotations.
Description
This file contains a collection of annotated evaluation samples organized under the key helpfulness. Each sample represents a complete evaluation trace for the AspectCritic metric (specifically configured as a helpfulness judge). The dataset includes:
- metric_input: The original user input and the LLM response being evaluated.
- metric_output: The binary verdict (1 for helpful, 0 for not helpful).
- prompts: The full prompt trace, including single_turn_aspect_critic_prompt with the prompt input, the prompt output (containing a reason and verdict), and optionally an edited_output with human-corrected reasoning.
- is_accepted: A boolean flag indicating whether the annotated sample was accepted as a valid training/evaluation example.
The dataset covers diverse evaluation scenarios including text improvement requests, anagram solving, factual questions, vacation recommendations, and text editing tasks. It contains both positive examples (verdict = 1, response is helpful) and negative examples (verdict = 0, response is unhelpful), making it suitable for training or aligning an LLM-as-a-judge metric.
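Because the dataset mixes positive and negative verdicts, a quick way to sanity-check its balance is to count samples per verdict. The sketch below is illustrative only: the inline `sample_json` is a hypothetical two-sample stand-in for the real file, which in practice you would load from docs/_static/annotated_data.json.

```python
import json

# Hypothetical inline sample mirroring the annotated_data.json structure;
# in practice, load docs/_static/annotated_data.json instead.
sample_json = """
{
  "helpfulness": [
    {"metric_input": {"user_input": "q1", "response": "a1"},
     "metric_output": 1, "prompts": {}, "is_accepted": true},
    {"metric_input": {"user_input": "q2", "response": "a2"},
     "metric_output": 0, "prompts": {}, "is_accepted": false}
  ]
}
"""

def verdict_counts(data: dict) -> dict:
    """Count positive (helpful) and negative (unhelpful) samples."""
    samples = data["helpfulness"]
    positives = sum(1 for s in samples if s["metric_output"] == 1)
    return {"helpful": positives, "not_helpful": len(samples) - positives}

data = json.loads(sample_json)
counts = verdict_counts(data)
print(counts)  # {'helpful': 1, 'not_helpful': 1}
```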
Usage
This data file is used in the Ragas documentation to illustrate how annotated evaluation data is structured for the metric alignment and training workflow. It serves as a reference for users who want to create their own annotated datasets for fine-tuning or calibrating Ragas metrics. The file is also referenced in tutorials that demonstrate how to align an LLM as a judge using human-annotated feedback.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: docs/_static/annotated_data.json
Data Schema
{
"helpfulness": [
{
"metric_input": {
"user_input": "can you fix this up better?...",
"response": "Dear Sir,..."
},
"metric_output": 1,
"prompts": {
"single_turn_aspect_critic_prompt": {
"prompt_input": {
"user_input": "...",
"response": "...",
"retrieved_contexts": null,
"reference_contexts": null,
"reference": null
},
"prompt_output": {
"reason": "The response effectively addresses...",
"verdict": 1
},
"edited_output": {
"reason": "The response is helpful because...",
"verdict": 1
}
}
},
"is_accepted": true
}
]
}
I/O Contract
Structure
| Field | Type | Description |
|---|---|---|
| helpfulness | Array | Top-level key containing all annotated samples for the helpfulness metric |
| metric_input | Object | Contains user_input (string) and response (string) representing the evaluation pair |
| metric_output | Integer (0 or 1) | The final binary verdict for the metric evaluation |
| prompts | Object | Contains the prompt trace under single_turn_aspect_critic_prompt |
| prompts.single_turn_aspect_critic_prompt.prompt_input | Object | The full input sent to the LLM judge, including user_input, response, retrieved_contexts, reference_contexts, and reference |
| prompts.single_turn_aspect_critic_prompt.prompt_output | Object | The original LLM judge output with reason (string) and verdict (integer) |
| prompts.single_turn_aspect_critic_prompt.edited_output | Object or null | Human-edited correction of the judge output, containing reason and verdict, or null if no edit was made |
| is_accepted | Boolean | Whether the annotated sample was accepted as valid for training or evaluation |
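The contract above can be enforced with a small validator before using a sample downstream. This is a minimal sketch, not part of Ragas itself: `validate_sample` is a hypothetical helper that returns a list of violations (empty means the sample conforms to the fields documented in the table).

```python
def validate_sample(sample: dict) -> list[str]:
    """Return a list of schema violations for one annotated sample (empty = valid)."""
    errors = []
    mi = sample.get("metric_input")
    if not isinstance(mi, dict) or not all(
        isinstance(mi.get(k), str) for k in ("user_input", "response")
    ):
        errors.append("metric_input must contain string user_input and response")
    if sample.get("metric_output") not in (0, 1):
        errors.append("metric_output must be 0 or 1")
    prompt = sample.get("prompts", {}).get("single_turn_aspect_critic_prompt")
    if not isinstance(prompt, dict):
        errors.append("prompts.single_turn_aspect_critic_prompt is required")
    else:
        edited = prompt.get("edited_output")
        if edited is not None and not {"reason", "verdict"} <= set(edited):
            errors.append("edited_output must contain reason and verdict")
    if not isinstance(sample.get("is_accepted"), bool):
        errors.append("is_accepted must be a boolean")
    return errors

# Hypothetical well-formed sample for illustration
valid_sample = {
    "metric_input": {"user_input": "u", "response": "r"},
    "metric_output": 1,
    "prompts": {"single_turn_aspect_critic_prompt": {
        "prompt_input": {},
        "prompt_output": {"reason": "ok", "verdict": 1},
        "edited_output": None,
    }},
    "is_accepted": True,
}
print(validate_sample(valid_sample))  # []
```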
Usage Examples
Loading the Data
import json

with open("docs/_static/annotated_data.json") as f:
    data = json.load(f)

# Access all helpfulness samples
helpfulness_samples = data["helpfulness"]

# Filter accepted samples only
accepted = [s for s in helpfulness_samples if s["is_accepted"]]

# Filter samples with human edits
edited = [
    s for s in helpfulness_samples
    if s["prompts"]["single_turn_aspect_critic_prompt"]["edited_output"] is not None
]

print(f"Total samples: {len(helpfulness_samples)}")
print(f"Accepted samples: {len(accepted)}")
print(f"Samples with edits: {len(edited)}")
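For alignment work, the most informative samples are those where the human annotator flipped the judge's verdict in edited_output. A minimal sketch for extracting them follows; `edit_disagreements` is a hypothetical helper, and the inline `samples` list is illustrative stand-in data, not the real file contents.

```python
def edit_disagreements(samples: list[dict]) -> list[dict]:
    """Return samples where the human-edited verdict differs from the judge's original verdict."""
    flipped = []
    for s in samples:
        p = s["prompts"]["single_turn_aspect_critic_prompt"]
        edited = p.get("edited_output")
        if edited is not None and edited["verdict"] != p["prompt_output"]["verdict"]:
            flipped.append(s)
    return flipped

# Illustrative stand-in data: one flipped verdict, one unedited sample
samples = [
    {"prompts": {"single_turn_aspect_critic_prompt": {
        "prompt_output": {"reason": "seems helpful", "verdict": 1},
        "edited_output": {"reason": "actually unhelpful", "verdict": 0},
    }}},
    {"prompts": {"single_turn_aspect_critic_prompt": {
        "prompt_output": {"reason": "helpful", "verdict": 1},
        "edited_output": None,
    }}},
]

flipped = edit_disagreements(samples)
print(len(flipped))  # 1
```

Disagreement cases like these are exactly where an aligned judge should change its behavior, so they are worth reviewing first.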
Using with Ragas Metric Training
import json

from ragas.metrics import AspectCritic  # the metric these annotations were produced for

# Load annotated data for metric alignment
with open("docs/_static/annotated_data.json") as f:
    annotated_data = json.load(f)

# The structure under "helpfulness" matches the trace format Ragas
# produces for AspectCritic, so this data can be used when training
# or aligning an LLM-as-a-judge metric.