Implementation:Vibrantlabsai Ragas RougeScoreV2

From Leeroopedia
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-12 00:00 GMT

Overview

RougeScore is a class-based v2 metric that calculates ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores between reference and response texts using the rouge_score library, with configurable ROUGE type and scoring mode.

Description

The RougeScore metric provides a modern, class-based implementation of ROUGE scoring for evaluating text summarization and generation quality. It inherits from BaseMetric and does not require any LLM or embedding model.

The metric supports two ROUGE types:

  • rouge1 -- Measures unigram (single word) overlap between the reference and response. This captures word-level content similarity.
  • rougeL -- Measures the Longest Common Subsequence (LCS) between the reference and response. This captures sentence-level structure similarity and is the default.
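To make the difference between the two types concrete, here is a minimal pure-Python sketch of the quantities behind them. It is an illustration only, not the rouge_score implementation: the `tokenize`, `unigram_overlap`, and `lcs_length` helpers are hypothetical names, and the tokenizer is simplified and does no stemming.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer: lowercase and keep alphanumeric runs (no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_overlap(reference: list[str], response: list[str]) -> int:
    # Clipped count of shared words: each word counts at most as often as it
    # appears in both texts. This is the match count behind rouge1.
    shared = Counter(reference) & Counter(response)
    return sum(shared.values())


def lcs_length(reference: list[str], response: list[str]) -> int:
    # Dynamic-programming longest common subsequence over tokens.
    # This is the match count behind rougeL; word order matters here.
    prev = [0] * (len(response) + 1)
    for ref_tok in reference:
        curr = [0]
        for j, resp_tok in enumerate(response, start=1):
            if ref_tok == resp_tok:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


ref = tokenize("The capital of France is Paris.")
resp = tokenize("Paris is the capital of France.")
print(unigram_overlap(ref, resp))  # 6: every word is shared
print(lcs_length(ref, resp))       # 4: "the capital of france"
```

The two sentences contain the same words, so the unigram match count is maximal, while the reordering shortens the common subsequence. This is why rougeL tends to score reordered text lower than rouge1 does.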

The metric also supports three scoring modes:

  • fmeasure (default) -- The harmonic mean of precision and recall, providing a balanced measure.
  • precision -- The fraction of response n-grams or subsequences that appear in the reference.
  • recall -- The fraction of reference n-grams or subsequences that appear in the response.
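The three modes relate to each other as in the sketch below. The `rouge_scores` helper and the counts passed to it are hypothetical, chosen only to show the arithmetic; the match count would be the unigram overlap for rouge1 or the LCS length for rougeL.

```python
def rouge_scores(matches: int, reference_len: int, response_len: int) -> dict[str, float]:
    # precision: share of the response that is matched in the reference.
    precision = matches / response_len if response_len else 0.0
    # recall: share of the reference that is matched in the response.
    recall = matches / reference_len if reference_len else 0.0
    # fmeasure: harmonic mean of the two, defined as 0 when both are 0.
    if precision + recall == 0:
        fmeasure = 0.0
    else:
        fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


# Hypothetical counts: 4 matching tokens, 6 reference tokens, 8 response tokens.
scores = rouge_scores(matches=4, reference_len=6, response_len=8)
print(scores["precision"])  # 0.5
print(scores["recall"])     # 4/6
print(scores["fmeasure"])   # 4/7
```

A long response that contains the whole reference gets high recall but low precision, which is why fmeasure is the balanced default.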

Internally, the metric uses Google's rouge_score library with stemming enabled (use_stemmer=True) to normalize word forms before comparison.
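For orientation, the underlying library can also be called directly. The sketch below shows one plausible way to combine rouge_type, mode, and use_stemmer=True; the `rouge_value` helper is a hypothetical name, and the actual Ragas wrapper code is not shown on this page and may differ.

```python
def rouge_value(
    reference: str,
    response: str,
    rouge_type: str = "rougeL",
    mode: str = "fmeasure",
) -> float:
    # Imported lazily because rouge_score is a separately installed package.
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer([rouge_type], use_stemmer=True)
    # RougeScorer.score takes (target, prediction) and returns a dict mapping
    # each ROUGE type to a Score tuple with precision, recall, and fmeasure.
    scores = scorer.score(reference, response)
    return getattr(scores[rouge_type], mode)
```

With stemming enabled, inflected forms such as "jumps" and "jumping" are expected to reduce to the same stem and therefore count as a match.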

The metric returns a MetricResult object with the score as a float between 0.0 and 1.0.

Usage

Use RougeScore as a standard text overlap metric for evaluating summarization or text generation quality. It is particularly appropriate for measuring recall-oriented content overlap. The rouge_score library must be installed separately (pip install rouge_score). Choose rouge1 for word-level overlap and rougeL for structure-preserving overlap.

Code Reference

Source Location

Signature

class RougeScore(BaseMetric):
    def __init__(
        self,
        name: str = "rouge_score",
        rouge_type: t.Literal["rouge1", "rougeL"] = "rougeL",
        mode: t.Literal["fmeasure", "precision", "recall"] = "fmeasure",
        **kwargs,
    ):

Import

from ragas.metrics.collections import RougeScore

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| reference | str | Yes | The reference (ground truth) text |
| response | str | Yes | The response text to evaluate against the reference |
| rouge_type | Literal["rouge1", "rougeL"] | No | ROUGE variant to use, set on the constructor (default: "rougeL") |
| mode | Literal["fmeasure", "precision", "recall"] | No | Scoring mode, set on the constructor (default: "fmeasure") |

Outputs

| Name | Type | Description |
|------|------|-------------|
| result | MetricResult | A MetricResult object with a value attribute containing the ROUGE score between 0.0 and 1.0 |

Usage Examples

Basic Usage

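import asyncio

from ragas.metrics.collections import RougeScore


async def main() -> None:
    # Defaults: rouge_type="rougeL", mode="fmeasure"
    metric = RougeScore()
    result = await metric.ascore(
        reference="The capital of France is Paris.",
        response="Paris is the capital of France.",
    )
    print(f"ROUGE-L F-measure: {result.value}")


# ascore is a coroutine, so it needs an event loop outside a notebook.
asyncio.run(main())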

Using ROUGE-1 with Recall Mode

import asyncio

from ragas.metrics.collections import RougeScore


async def main() -> None:
    # Unigram overlap, reported as the share of reference words recovered.
    metric = RougeScore(rouge_type="rouge1", mode="recall")
    result = await metric.ascore(
        reference="The quick brown fox jumps over the lazy dog.",
        response="A quick brown fox leaps over the lazy dog.",
    )
    print(f"ROUGE-1 Recall: {result.value}")


asyncio.run(main())

Batch Evaluation

import asyncio

from ragas.metrics.collections import RougeScore


async def main() -> None:
    metric = RougeScore(rouge_type="rougeL", mode="fmeasure")
    # Each dict holds the reference/response pair for one sample.
    results = await metric.abatch_score([
        {"reference": "The cat sat on the mat.", "response": "A cat was on a mat."},
        {"reference": "It is raining outside.", "response": "It is raining outside."},
    ])
    for i, result in enumerate(results):
        print(f"Sample {i}: ROUGE-L = {result.value}")


asyncio.run(main())
