Implementation:Vibrantlabsai Ragas Annotated Summary Sample

Knowledge Sources	Vibrantlabsai_Ragas
Domains	LLM Evaluation, Sample Data, Summary Accuracy
Last Updated	2026-02-12 00:00 GMT

Overview

A sample annotated summary dataset in JSON format used for demonstrating and testing the Ragas summary_accuracy metric with human-reviewed annotations.

Description

This file contains annotated evaluation samples organized under the key summary_accuracy. Each sample evaluates whether an LLM-generated summary accurately captures the key information from an original text passage. The dataset focuses on business and financial content including earnings reports, market analyses, supply chain discussions, marketing strategies, and regional growth trends.

Key characteristics of this data:

Summarization-focused evaluation: Each sample contains a user_input field with the instruction "summarise given text" followed by the source text, and a response field with the generated summary.
No reference answer: Unlike the answer correctness data, these samples evaluate summaries without a separate reference, relying on the source text itself as the ground truth.
Business domain content: The passages cover topics such as Q2 earnings reports, European market expansion, supply chain challenges, marketing campaign shifts, logistics investments, and market share analysis.
Acceptance filtering: A notable proportion of samples have is_accepted set to false, marking them as rejected during the annotation process. The schema has no field for the rejection reason; quality concerns or annotator disagreement are the likely causes.

The dataset demonstrates how summary accuracy evaluation differs from other metric types: the judge must determine whether the summary faithfully represents the source text without omitting critical details such as specific percentages, time periods, or geographic regions.

Usage

This data file is used in the Ragas documentation to illustrate how annotated data for the summary accuracy metric is structured. It provides reference examples for users who want to build their own annotated datasets for evaluating and aligning summarization quality metrics.
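For readers building their own annotated dataset, the following is a minimal sketch of assembling one sample in the same layout as this file. The helper name make_summary_sample and the example text are hypothetical and used only for illustration; they are not part of Ragas or of the sample file.

```python
import json


def make_summary_sample(source_text, summary, verdict, reason, is_accepted=True):
    """Build one annotated sample in this file's layout (hypothetical helper)."""
    # The instruction prefix mirrors the one used by the samples in this file.
    user_input = f"summarise given text\n{source_text}"
    return {
        "metric_input": {"user_input": user_input, "response": summary},
        "metric_output": verdict,
        "prompts": {
            "single_turn_aspect_critic_prompt": {
                "prompt_input": {
                    "user_input": user_input,
                    "response": summary,
                    # Context and reference fields stay null for summarization.
                    "retrieved_contexts": None,
                    "reference_contexts": None,
                    "reference": None,
                },
                "prompt_output": {"reason": reason, "verdict": verdict},
                "edited_output": None,  # no human correction recorded yet
            }
        },
        "is_accepted": is_accepted,
    }


# Made-up example content, not taken from the sample file.
dataset = {
    "summary_accuracy": [
        make_summary_sample(
            source_text="Revenue rose 15% in Q2, driven by the European market.",
            summary="Q2 revenue grew 15%, led by Europe.",
            verdict=1,
            reason="The summary keeps the percentage, period, and region.",
        )
    ]
}

serialized = json.dumps(dataset, indent=2)
```

Python's None serializes to JSON null, so the written file matches the null fields shown in the schema.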

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: docs/_static/sample_annotated_summary.json

Data Schema

{
  "summary_accuracy": [
    {
      "metric_input": {
        "user_input": "summarise given text\nThe Q2 earnings report revealed...",
        "response": "The Q2 earnings report showed a 15% revenue increase..."
      },
      "metric_output": 1,
      "prompts": {
        "single_turn_aspect_critic_prompt": {
          "prompt_input": {
            "user_input": "summarise given text\n...",
            "response": "...",
            "retrieved_contexts": null,
            "reference_contexts": null,
            "reference": null
          },
          "prompt_output": {
            "reason": "The summary accurately captures the key points...",
            "verdict": 1
          },
          "edited_output": null
        }
      },
      "is_accepted": true
    }
  ]
}
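The schema above can also be written down as Python type hints, which lets editors and type checkers flag misspelled keys. This is a sketch only: the class names are hypothetical, and the element types assumed for retrieved_contexts and reference_contexts when they are not null (lists of strings) are a guess, since this file only ever shows them as null.

```python
from typing import Dict, List, Optional, TypedDict


class MetricInput(TypedDict):
    user_input: str  # instruction plus source text
    response: str    # generated summary


class PromptInput(TypedDict):
    user_input: str
    response: str
    retrieved_contexts: Optional[List[str]]  # assumed element type; null here
    reference_contexts: Optional[List[str]]  # assumed element type; null here
    reference: Optional[str]                 # null for summarization


class PromptOutput(TypedDict):
    reason: str
    verdict: int  # 0 or 1


class PromptTrace(TypedDict):
    prompt_input: PromptInput
    prompt_output: PromptOutput
    edited_output: Optional[PromptOutput]  # human correction, or null


class AnnotatedSample(TypedDict):
    metric_input: MetricInput
    metric_output: int
    prompts: Dict[str, PromptTrace]  # keyed by prompt name
    is_accepted: bool


class AnnotatedSummaryFile(TypedDict):
    summary_accuracy: List[AnnotatedSample]
```

TypedDict classes describe plain dictionaries, so the result of json.load can be annotated with AnnotatedSummaryFile without any conversion step.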

I/O Contract

Structure

Field	Type	Description
summary_accuracy	Array	Top-level key containing all annotated samples for the summary accuracy metric
metric_input	Object	Contains `user_input` (string with summarization instruction plus source text) and `response` (string with the generated summary)
metric_output	Integer (0 or 1)	The final binary verdict indicating whether the summary is accurate (1) or inaccurate (0)
prompts	Object	Contains the prompt trace with `single_turn_aspect_critic_prompt`
prompts.single_turn_aspect_critic_prompt.prompt_input	Object	The full input sent to the LLM judge, including `user_input`, `response`, `retrieved_contexts`, `reference_contexts`, and `reference` (all context/reference fields are null for summarization)
prompts.single_turn_aspect_critic_prompt.prompt_output	Object	The LLM judge output with `reason` (string explaining the assessment) and `verdict` (integer)
prompts.single_turn_aspect_critic_prompt.edited_output	Object or null	Human-edited correction with `reason` and `verdict`, or null if no correction was needed
is_accepted	Boolean	Whether the annotated sample was accepted as a valid training or evaluation example
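The contract in the table can be checked at runtime before a file is used for evaluation or alignment. The function below is an illustrative sketch, not a Ragas API: validate_sample is a hypothetical name, and it checks only the fields and types listed in the table.

```python
PROMPT_KEY = "single_turn_aspect_critic_prompt"


def validate_sample(sample):
    """Return a list of contract violations for one sample (empty if none)."""
    problems = []

    metric_input = sample.get("metric_input")
    if not isinstance(metric_input, dict):
        problems.append("metric_input must be an object")
    else:
        for key in ("user_input", "response"):
            if not isinstance(metric_input.get(key), str):
                problems.append(f"metric_input.{key} must be a string")

    # bool is a subclass of int in Python, so compare the exact type.
    verdict = sample.get("metric_output")
    if type(verdict) is not int or verdict not in (0, 1):
        problems.append("metric_output must be the integer 0 or 1")

    if not isinstance(sample.get("is_accepted"), bool):
        problems.append("is_accepted must be a boolean")

    prompts = sample.get("prompts")
    trace = prompts.get(PROMPT_KEY) if isinstance(prompts, dict) else None
    if not isinstance(trace, dict):
        problems.append(f"prompts.{PROMPT_KEY} must be an object")
    else:
        if not isinstance(trace.get("prompt_input"), dict):
            problems.append("prompt_input must be an object")
        output = trace.get("prompt_output")
        if not isinstance(output, dict) or not {"reason", "verdict"} <= output.keys():
            problems.append("prompt_output must contain reason and verdict")
        edited = trace.get("edited_output")
        if edited is not None and not isinstance(edited, dict):
            problems.append("edited_output must be an object or null")

    return problems
```

Returning a list of messages rather than raising on the first failure makes it possible to report every problem in a sample at once.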

Usage Examples

Loading the Data

import json

# Path is relative to the repository root; JSON files are read as UTF-8
with open("docs/_static/sample_annotated_summary.json", encoding="utf-8") as f:
    data = json.load(f)

# Access all summary accuracy samples
summary_samples = data["summary_accuracy"]

# Separate accepted and rejected samples
accepted = [s for s in summary_samples if s["is_accepted"]]
rejected = [s for s in summary_samples if not s["is_accepted"]]

# Count pass/fail verdicts
passing = [s for s in summary_samples if s["metric_output"] == 1]
failing = [s for s in summary_samples if s["metric_output"] == 0]

print(f"Total samples: {len(summary_samples)}")
print(f"Accepted: {len(accepted)}, Rejected: {len(rejected)}")
print(f"Passing (accurate): {len(passing)}, Failing (inaccurate): {len(failing)}")

Examining Failed Summaries

import json

# Instruction that precedes the source text in user_input
INSTRUCTION_PREFIX = "summarise given text\n"
PROMPT_KEY = "single_turn_aspect_critic_prompt"

with open("docs/_static/sample_annotated_summary.json", encoding="utf-8") as f:
    data = json.load(f)

# Print the judge's reasoning for every summary judged inaccurate
for sample in data["summary_accuracy"]:
    if sample["metric_output"] == 0:
        reason = sample["prompts"][PROMPT_KEY]["prompt_output"]["reason"]
        # Remove only the first occurrence of the instruction so the source text is left intact
        source_text = sample["metric_input"]["user_input"].replace(INSTRUCTION_PREFIX, "", 1)
        summary = sample["metric_input"]["response"]
        print(f"Source: {source_text[:80]}...")
        print(f"Summary: {summary[:80]}...")
        print(f"Failure reason: {reason}")
        print()

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
