Implementation:Vibrantlabsai Ragas RougeScoreV2

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

RougeScore is a class-based v2 metric that calculates ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores between reference and response texts using the rouge_score library, with configurable ROUGE type and scoring mode.

Description

The RougeScore metric is a class-based implementation of ROUGE scoring for evaluating summarization and text generation quality. It inherits from BaseMetric and requires no LLM or embedding model.

The metric supports two ROUGE types:

rouge1 -- Measures unigram (single word) overlap between the reference and response. This captures word-level content similarity.
rougeL -- Measures the Longest Common Subsequence (LCS) of words between the reference and response. Matched words must appear in the same order but need not be adjacent, so this captures word-order and sentence-structure similarity. This is the default.

The metric also supports three scoring modes:

fmeasure (default) -- The harmonic mean of precision and recall, providing a balanced measure.
precision -- The matched units (overlapping unigrams for rouge1, LCS length for rougeL) divided by the number of tokens in the response.
recall -- The same matched units divided by the number of tokens in the reference.
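The two ROUGE types and three scoring modes above can be made concrete with a small pure-Python sketch. It is not the rouge_score implementation: the tokenizer is a simplified assumption, stemming is omitted, and the helper names tokenize, lcs_length, and rouge are illustrative only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (simplified, no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge(reference, response, rouge_type="rougeL", mode="fmeasure"):
    """Illustrative ROUGE-1 / ROUGE-L score in the requested mode."""
    if mode not in ("fmeasure", "precision", "recall"):
        raise ValueError(f"unsupported mode: {mode}")
    ref, resp = tokenize(reference), tokenize(response)
    if not ref or not resp:
        return 0.0
    if rouge_type == "rouge1":
        # Clipped unigram overlap: a token counts at most as often as it occurs in both texts.
        matches = sum((Counter(ref) & Counter(resp)).values())
    elif rouge_type == "rougeL":
        matches = lcs_length(ref, resp)
    else:
        raise ValueError(f"unsupported rouge_type: {rouge_type}")
    precision = matches / len(resp)
    recall = matches / len(ref)
    fmeasure = 2 * precision * recall / (precision + recall) if matches else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}[mode]


reference = "The capital of France is Paris."
response = "Paris is the capital of France."
print(round(rouge(reference, response, "rouge1"), 3))  # 1.0: same words, order ignored
print(round(rouge(reference, response, "rougeL"), 3))  # 0.667: LCS is "the capital of france"
```

The reordered sentence pair shows why the two types differ: every word matches under rouge1, while rougeL only credits the four words that keep their relative order.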

Internally, the metric uses Google's rouge_score library with stemming enabled (use_stemmer=True) to normalize word forms before comparison.
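As a rough illustration of what stemming changes, the sketch below calls rouge_score directly. The wrapper name library_rouge is hypothetical, and that RougeScore invokes the library in exactly this way is an assumption; the source only states that use_stemmer=True is set. The import is guarded because rouge_score is an optional dependency.

```python
try:
    from rouge_score import rouge_scorer  # optional dependency
except ImportError:
    rouge_scorer = None


def library_rouge(reference, response, rouge_type="rougeL",
                  mode="fmeasure", use_stemmer=True):
    """Hypothetical helper: score one pair with rouge_score and pick one mode."""
    if rouge_scorer is None:
        raise RuntimeError("rouge_score is not installed")
    scorer = rouge_scorer.RougeScorer([rouge_type], use_stemmer=use_stemmer)
    # RougeScorer.score takes (target, prediction) and returns a dict of
    # Score tuples with precision, recall, and fmeasure fields.
    scores = scorer.score(reference, response)
    return getattr(scores[rouge_type], mode)


if rouge_scorer is not None:
    ref = "The dog jumps over fences."
    resp = "The dog jumped over a fence."
    # With stemming, inflected forms such as "jumps"/"jumped" can match.
    print("stemmed:  ", library_rouge(ref, resp, "rouge1", use_stemmer=True))
    print("unstemmed:", library_rouge(ref, resp, "rouge1", use_stemmer=False))
```

With stemming enabled, inflectional variants of the same word are counted as matches, so scores on such pairs are typically higher than without it.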

The metric returns a MetricResult object whose value attribute holds the score, a float between 0.0 and 1.0.

Usage

Use RougeScore as a standard lexical overlap metric for evaluating summarization or text generation quality when a reference text is available. It needs no LLM or embedding model, so it is inexpensive to run on every sample. The rouge_score library must be installed separately (pip install rouge_score). Choose rouge1 for word-level overlap that ignores word order, and rougeL for overlap that respects word order.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/collections/_rouge_score.py

Signature

class RougeScore(BaseMetric):
    def __init__(
        self,
        name: str = "rouge_score",
        rouge_type: t.Literal["rouge1", "rougeL"] = "rougeL",
        mode: t.Literal["fmeasure", "precision", "recall"] = "fmeasure",
        **kwargs,
    ):

Import

from ragas.metrics.collections import RougeScore

I/O Contract

Inputs

Name	Type	Required	Description
reference	str	Yes	The reference/ground truth text
response	str	Yes	The response text to evaluate against the reference
rouge_type	Literal["rouge1", "rougeL"]	No	ROUGE variant to use; set on the constructor (default: "rougeL")
mode	Literal["fmeasure", "precision", "recall"]	No	Scoring mode; set on the constructor (default: "fmeasure")

Outputs

Name	Type	Description
result	MetricResult	A MetricResult object with a `value` attribute containing the ROUGE score between 0.0 and 1.0

Usage Examples

Basic Usage

import asyncio

from ragas.metrics.collections import RougeScore


async def main():
    # Defaults: rouge_type="rougeL", mode="fmeasure"
    metric = RougeScore()

    result = await metric.ascore(
        reference="The capital of France is Paris.",
        response="Paris is the capital of France.",
    )
    print(f"ROUGE-L F-measure: {result.value}")


asyncio.run(main())

Using ROUGE-1 with Recall Mode

import asyncio

from ragas.metrics.collections import RougeScore


async def main():
    # Unigram overlap, reported as recall against the reference
    metric = RougeScore(rouge_type="rouge1", mode="recall")

    result = await metric.ascore(
        reference="The quick brown fox jumps over the lazy dog.",
        response="A quick brown fox leaps over the lazy dog.",
    )
    print(f"ROUGE-1 Recall: {result.value}")


asyncio.run(main())

Batch Evaluation

import asyncio

from ragas.metrics.collections import RougeScore


async def main():
    metric = RougeScore(rouge_type="rougeL", mode="fmeasure")

    # Each item supplies the same keyword inputs that ascore expects
    results = await metric.abatch_score([
        {"reference": "The cat sat on the mat.", "response": "A cat was on a mat."},
        {"reference": "It is raining outside.", "response": "It is raining outside."},
    ])

    for i, result in enumerate(results):
        print(f"Sample {i}: ROUGE-L = {result.value}")


asyncio.run(main())

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
