

Implementation:Vibrantlabsai Ragas Annotated Data Sample

From Leeroopedia
Knowledge Sources
Domains: LLM Evaluation, Sample Data, Metric Annotation
Last Updated: 2026-02-12 00:00 GMT

Overview

A sample annotated evaluation dataset in JSON format used for demonstrating and testing the Ragas helpfulness metric with human-reviewed annotations.

Description

This file contains a collection of annotated evaluation samples organized under the key helpfulness. Each sample represents a complete evaluation trace for the AspectCritic metric (specifically configured as a helpfulness judge). The dataset includes:

  • metric_input: The original user input and the LLM response being evaluated.
  • metric_output: The binary verdict (1 for helpful, 0 for not helpful).
  • prompts: The full prompt trace including single_turn_aspect_critic_prompt with the prompt input, prompt output (containing a reason and verdict), and optionally an edited_output with human-corrected reasoning.
  • is_accepted: A boolean flag indicating whether the annotated sample was accepted as a valid training/evaluation example.

The dataset covers diverse evaluation scenarios including text improvement requests, anagram solving, factual questions, vacation recommendations, and text editing tasks. It contains both positive examples (verdict = 1, response is helpful) and negative examples (verdict = 0, response is unhelpful), making it suitable for training or aligning an LLM-as-a-judge metric.
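A quick way to see this split in practice is to partition the samples by verdict. The snippet below is a minimal sketch: the in-memory records are placeholders that mimic the file's schema, not real entries from the dataset.

```python
# Illustrative in-memory samples mimicking the file's schema
# (placeholder values, not real entries from annotated_data.json).
data = {
    "helpfulness": [
        {"metric_input": {"user_input": "q1", "response": "r1"},
         "metric_output": 1, "is_accepted": True},
        {"metric_input": {"user_input": "q2", "response": "r2"},
         "metric_output": 0, "is_accepted": False},
    ]
}

# Partition into positive (helpful) and negative (unhelpful) examples
positives = [s for s in data["helpfulness"] if s["metric_output"] == 1]
negatives = [s for s in data["helpfulness"] if s["metric_output"] == 0]
```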

Usage

This data file is used in the Ragas documentation to illustrate how annotated evaluation data is structured for the metric alignment and training workflow. It serves as a reference for users who want to create their own annotated datasets for fine-tuning or calibrating Ragas metrics. The file is also referenced in tutorials that demonstrate how to align an LLM as a judge using human-annotated feedback.

Code Reference

Source Location

docs/_static/annotated_data.json

Data Schema

{
  "helpfulness": [
    {
      "metric_input": {
        "user_input": "can you fix this up better?...",
        "response": "Dear Sir,..."
      },
      "metric_output": 1,
      "prompts": {
        "single_turn_aspect_critic_prompt": {
          "prompt_input": {
            "user_input": "...",
            "response": "...",
            "retrieved_contexts": null,
            "reference_contexts": null,
            "reference": null
          },
          "prompt_output": {
            "reason": "The response effectively addresses...",
            "verdict": 1
          },
          "edited_output": {
            "reason": "The response is helpful because...",
            "verdict": 1
          }
        }
      },
      "is_accepted": true
    }
  ]
}

I/O Contract

Structure

Field | Type | Description
helpfulness | Array | Top-level key containing all annotated samples for the helpfulness metric
metric_input | Object | Contains user_input (string) and response (string), the evaluation pair
metric_output | Integer (0 or 1) | The final binary verdict for the metric evaluation
prompts | Object | Contains the prompt trace under single_turn_aspect_critic_prompt
prompts.single_turn_aspect_critic_prompt.prompt_input | Object | The full input sent to the LLM judge: user_input, response, retrieved_contexts, reference_contexts, and reference
prompts.single_turn_aspect_critic_prompt.prompt_output | Object | The original LLM judge output, with reason (string) and verdict (integer)
prompts.single_turn_aspect_critic_prompt.edited_output | Object or null | Human-edited correction of the judge output (reason and verdict), or null if no edit was made
is_accepted | Boolean | Whether the annotated sample was accepted as valid for training or evaluation
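Before using the file for training, it can be worth checking each record against this contract. The function below is a hypothetical validation helper, not part of Ragas; the field names it checks follow the schema described above.

```python
def validate_sample(sample: dict) -> list[str]:
    """Return a list of schema problems for one annotated sample
    (hypothetical helper; field names follow the I/O contract above)."""
    errors = []
    if not isinstance(sample.get("metric_input"), dict):
        errors.append("metric_input must be an object")
    if sample.get("metric_output") not in (0, 1):
        errors.append("metric_output must be 0 or 1")
    trace = sample.get("prompts", {}).get("single_turn_aspect_critic_prompt")
    if not isinstance(trace, dict):
        errors.append("missing single_turn_aspect_critic_prompt trace")
    elif trace.get("edited_output") is not None:
        # Human edits must still carry a binary verdict
        if trace["edited_output"].get("verdict") not in (0, 1):
            errors.append("edited_output.verdict must be 0 or 1")
    if not isinstance(sample.get("is_accepted"), bool):
        errors.append("is_accepted must be a boolean")
    return errors
```

An empty list means the sample conforms; otherwise the messages describe each violation.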

Usage Examples

Loading the Data

import json

with open("docs/_static/annotated_data.json") as f:
    data = json.load(f)

# Access all helpfulness samples
helpfulness_samples = data["helpfulness"]

# Filter accepted samples only
accepted = [s for s in helpfulness_samples if s["is_accepted"]]

# Filter samples with human edits
edited = [
    s for s in helpfulness_samples
    if s["prompts"]["single_turn_aspect_critic_prompt"]["edited_output"] is not None
]

print(f"Total samples: {len(helpfulness_samples)}")
print(f"Accepted samples: {len(accepted)}")
print(f"Samples with edits: {len(edited)}")
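The human edits are the most valuable part of the file for alignment, since they show where the judge's reasoning was corrected. A small sketch for pulling out (original, edited) reason pairs — `edit_pairs` is a hypothetical helper, with key names taken from the schema above:

```python
def edit_pairs(samples):
    """Yield (original_reason, edited_reason) for samples that carry a
    human edit (hypothetical helper; keys follow the schema above)."""
    for s in samples:
        trace = s["prompts"]["single_turn_aspect_critic_prompt"]
        edited = trace.get("edited_output")
        if edited is not None:
            yield trace["prompt_output"]["reason"], edited["reason"]
```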

Using with Ragas Metric Training

import json

from ragas.metrics import AspectCritic

# Load the annotated data for metric alignment
with open("docs/_static/annotated_data.json") as f:
    annotated_data = json.load(f)

# Define the judge metric that produced the annotated traces.
# The definition string here is illustrative, not taken from the file.
helpfulness = AspectCritic(
    name="helpfulness",
    definition="Is the response helpful and does it address the user's request?",
)

# annotated_data["helpfulness"] matches the structure Ragas expects
# for training and aligning LLM-as-a-judge metrics
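If you prepare few-shot demonstrations yourself rather than relying on a built-in alignment routine, accepted samples can be converted into input/output pairs, preferring the human-edited output when one exists. `to_demonstrations` is a hypothetical helper; the key names follow the schema above.

```python
def to_demonstrations(samples):
    """Build demonstration pairs from accepted annotations, preferring
    the human-edited output when present (hypothetical helper)."""
    demos = []
    for s in samples:
        if not s["is_accepted"]:
            continue  # skip samples rejected by the annotator
        trace = s["prompts"]["single_turn_aspect_critic_prompt"]
        # edited_output is None when no human correction was made
        output = trace.get("edited_output") or trace["prompt_output"]
        demos.append({"input": trace["prompt_input"], "output": output})
    return demos
```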
