Implementation:Explodinggradients Ragas Collections RougeScore Metric

From Leeroopedia


Field | Value
source | Repo
domains | Metrics, NLP
last_updated | 2026-02-10 00:00 GMT

Overview

RougeScore is a v2 class-based metric that calculates the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score between reference and response texts using the rouge_score library.

Description

RougeScore extends BaseMetric to provide an async-first ROUGE score implementation. ROUGE is a recall-oriented metric family widely used in summarization and text generation evaluation. This metric supports two ROUGE variants -- rouge1 (unigram overlap) and rougeL (longest common subsequence) -- and three scoring modes: fmeasure, precision, and recall. It uses the rouge_score.rouge_scorer.RougeScorer with stemming enabled. No LLM or embedding components are required.
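
To make the two variants concrete, the following is a minimal pure-Python sketch of how ROUGE-1 and ROUGE-L precision, recall, and F-measure can be computed. It is not the rouge_score library and not the Ragas implementation: the names tokenize, lcs_length, prf, and toy_rouge are hypothetical, the tokenizer is a simplifying assumption, and stemming is omitted, so values may differ from what RougeScore returns.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs (assumption: no stemming here)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def prf(match: int, n_ref: int, n_resp: int) -> dict[str, float]:
    """Turn a match count into precision, recall, and F-measure."""
    precision = match / n_resp if n_resp else 0.0
    recall = match / n_ref if n_ref else 0.0
    total = precision + recall
    fmeasure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


def toy_rouge(reference: str, response: str, rouge_type: str = "rougeL") -> dict[str, float]:
    """Hypothetical helper: rouge1 counts clipped unigram overlap, rougeL uses the LCS."""
    ref, resp = tokenize(reference), tokenize(response)
    if rouge_type == "rouge1":
        match = sum((Counter(ref) & Counter(resp)).values())
    elif rouge_type == "rougeL":
        match = lcs_length(ref, resp)
    else:
        raise ValueError(f"unsupported rouge_type: {rouge_type!r}")
    return prf(match, len(ref), len(resp))


if __name__ == "__main__":
    ref = "The capital of France is Paris."
    resp = "Paris is the capital of France."
    print(toy_rouge(ref, resp, "rouge1"))
    print(toy_rouge(ref, resp, "rougeL"))
```

In this sketch the two sentences share all six unigrams, so every ROUGE-1 value is 1.0, while the longest common subsequence ("the capital of france") has length 4, giving ROUGE-L precision and recall of 4/6. This shows why rougeL is sensitive to word order and rouge1 is not.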

Usage

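Instantiate the metric with the optional rouge_type and mode parameters. Await ascore(reference, response) for a single evaluation, or abatch_score(inputs) for a batch, where inputs is a list of dicts with reference and response keys. Requires the rouge_score package (published on PyPI as rouge-score).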
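
The relationship between single and batch evaluation can be pictured with a small asyncio sketch. This is only an illustration of the calling pattern, under the assumption that each batch item is a dict with reference and response keys; toy_ascore and toy_abatch_score are hypothetical stand-ins, and how Ragas actually schedules batch work is not confirmed by this page.

```python
import asyncio


async def toy_ascore(reference: str, response: str) -> float:
    """Hypothetical stand-in scorer: 1.0 on an exact match, otherwise 0.0."""
    await asyncio.sleep(0)  # yield control, as a real async scorer might
    return 1.0 if reference == response else 0.0


async def toy_abatch_score(inputs: list[dict[str, str]]) -> list[float]:
    """Score every item concurrently; gather preserves the input order."""
    return list(await asyncio.gather(*(toy_ascore(**item) for item in inputs)))


results = asyncio.run(
    toy_abatch_score(
        [
            {"reference": "Text one.", "response": "Text one."},
            {"reference": "Text two.", "response": "Response two."},
        ]
    )
)
print(results)
```

Because both entry points are coroutines, a synchronous script needs an event loop (here asyncio.run) to drive them.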

Code Reference

Property | Value
Source Location | src/ragas/metrics/collections/_rouge_score.py, L1--86
Signature | class RougeScore(BaseMetric)
Import | from ragas.metrics.collections import RougeScore

I/O Contract

Inputs

Parameter | Type | Required | Description
reference | str | Yes | The reference / ground truth text
response | str | Yes | The response text to evaluate

Constructor Parameters

Parameter | Type | Default | Description
name | str | "rouge_score" | Metric name
rouge_type | Literal["rouge1", "rougeL"] | "rougeL" | ROUGE variant to compute
mode | Literal["fmeasure", "precision", "recall"] | "fmeasure" | Scoring mode
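
The mode parameter can be read as choosing one field from a precision/recall/F-measure triple. The sketch below illustrates that idea with a hypothetical Score tuple and a hypothetical select_mode helper; it is an assumption about the shape of the selection, not code taken from Ragas.

```python
from typing import Literal, NamedTuple, get_args

Mode = Literal["fmeasure", "precision", "recall"]


class Score(NamedTuple):
    """Hypothetical container for the three values a ROUGE variant yields."""
    precision: float
    recall: float
    fmeasure: float


def select_mode(score: Score, mode: str = "fmeasure") -> float:
    """Return the field named by mode, rejecting anything outside the Literal."""
    if mode not in get_args(Mode):
        raise ValueError(f"mode must be one of {get_args(Mode)}, got {mode!r}")
    return getattr(score, mode)


if __name__ == "__main__":
    example = Score(precision=0.5, recall=0.25, fmeasure=1 / 3)
    print(select_mode(example, "recall"))
```

Validating the string against the Literal up front turns a typo such as "f1" into an immediate error rather than an attribute lookup failure later.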

Outputs

Field | Type | Description
MetricResult.value | float | ROUGE score in range 0.0--1.0

Usage Examples

import asyncio

from ragas.metrics.collections import RougeScore


async def main() -> None:
    # Default: rougeL with fmeasure
    metric = RougeScore()
    result = await metric.ascore(
        reference="The capital of France is Paris.",
        response="Paris is the capital of France.",
    )
    print(f"ROUGE-L F1: {result.value}")

    # ROUGE-1 with recall mode
    metric_r1 = RougeScore(rouge_type="rouge1", mode="recall")
    result = await metric_r1.ascore(
        reference="The quick brown fox jumps over the lazy dog.",
        response="A quick brown fox jumped over the lazy dog.",
    )
    print(f"ROUGE-1 Recall: {result.value}")

    # Batch evaluation: one dict per reference/response pair
    results = await metric.abatch_score([
        {"reference": "Text one.", "response": "Response one."},
        {"reference": "Text two.", "response": "Response two."},
    ])
    print(f"Batch results: {results}")


# ascore and abatch_score are coroutines, so they must run inside an event loop
asyncio.run(main())
