Implementation:Explodinggradients Ragas Collections RougeScore Metric

Field	Value
source	Repo
domains	Metrics, NLP
last_updated	2026-02-10 00:00 GMT

Overview

RougeScore is a v2 class-based metric that calculates the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score between reference and response texts using the rouge_score library.

Description

RougeScore extends BaseMetric to provide an async-first ROUGE score implementation. ROUGE is a recall-oriented metric family widely used to evaluate summarization and text generation. This metric supports two ROUGE variants -- rouge1 (unigram overlap) and rougeL (longest common subsequence) -- and three scoring modes: fmeasure, precision, and recall. Scoring is delegated to rouge_score.rouge_scorer.RougeScorer with stemming enabled, so no LLM or embedding components are required.
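To make the two variants and three modes concrete, here is a minimal pure-Python sketch of the underlying arithmetic. It is an illustration only, not the code used by RougeScore or the rouge_score library: the names `rouge_sketch`, `_tokenize`, and `_lcs_length` are hypothetical, the tokenizer is a simple lowercase alphanumeric split, and no stemming is applied, so its values can differ from what the metric returns.

```python
import re
from collections import Counter


def _tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric runs, no stemming (the real metric stems).
    return re.findall(r"[a-z0-9]+", text.lower())


def _lcs_length(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming longest common subsequence, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_sketch(
    reference: str,
    response: str,
    rouge_type: str = "rougeL",
    mode: str = "fmeasure",
) -> float:
    ref, resp = _tokenize(reference), _tokenize(response)

    if rouge_type == "rouge1":
        # Clipped unigram overlap: each token counts at most as often as it
        # appears in both texts.
        matches = sum((Counter(ref) & Counter(resp)).values())
    elif rouge_type == "rougeL":
        matches = _lcs_length(ref, resp)
    else:
        raise ValueError(f"unsupported rouge_type: {rouge_type!r}")

    precision = matches / len(resp) if resp else 0.0
    recall = matches / len(ref) if ref else 0.0
    total = precision + recall
    fmeasure = 2 * precision * recall / total if total else 0.0

    scores = {"precision": precision, "recall": recall, "fmeasure": fmeasure}
    if mode not in scores:
        raise ValueError(f"unsupported mode: {mode!r}")
    return scores[mode]
```

For the pair "The capital of France is Paris." and "Paris is the capital of France.", this sketch gives a rouge1 fmeasure of 1.0 (all six unigrams match) and a rougeL fmeasure of about 0.667, since the longest common subsequence, "the capital of france", covers four of the six tokens on each side.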

Usage

Instantiate RougeScore with the optional rouge_type and mode parameters, then call ascore(reference, response) for a single evaluation or abatch_score(inputs) for a batch of evaluations. The rouge_score package must be installed.

Code Reference

Property	Value
Source Location	`src/ragas/metrics/collections/_rouge_score.py` L1--86
Signature	`class RougeScore(BaseMetric)`
Import	`from ragas.metrics.collections import RougeScore`

I/O Contract

Inputs

Parameter	Type	Required	Description
`reference`	`str`	Yes	The reference / ground truth text
`response`	`str`	Yes	The response text to evaluate

Constructor Parameters

Parameter	Type	Default	Description
`name`	`str`	`"rouge_score"`	Metric name
`rouge_type`	`Literal["rouge1", "rougeL"]`	`"rougeL"`	ROUGE variant to compute
`mode`	`Literal["fmeasure", "precision", "recall"]`	`"fmeasure"`	Scoring mode

Outputs

Field	Type	Description
`MetricResult.value`	`float`	ROUGE score in range 0.0--1.0

Usage Examples

import asyncio

from ragas.metrics.collections import RougeScore


async def main() -> None:
    # Default: rougeL with fmeasure
    metric = RougeScore()
    result = await metric.ascore(
        reference="The capital of France is Paris.",
        response="Paris is the capital of France.",
    )
    print(f"ROUGE-L F1: {result.value}")

    # ROUGE-1 with recall mode
    metric_r1 = RougeScore(rouge_type="rouge1", mode="recall")
    result = await metric_r1.ascore(
        reference="The quick brown fox jumps over the lazy dog.",
        response="A quick brown fox jumped over the lazy dog.",
    )
    print(f"ROUGE-1 Recall: {result.value}")

    # Batch evaluation: one dict per reference/response pair
    results = await metric.abatch_score([
        {"reference": "Text one.", "response": "Response one."},
        {"reference": "Text two.", "response": "Response two."},
    ])


# ascore and abatch_score are coroutines, so run them inside an event loop
asyncio.run(main())

Related Pages

Explodinggradients_Ragas_Collections_BaseMetric_Class -- Base class for all v2 metrics
Explodinggradients_Ragas_Collections_BleuScore_Metric -- BLEU score metric (precision-based counterpart)
Explodinggradients_Ragas_Collections_CHRFScore_Metric -- CHRF character n-gram metric
Explodinggradients_Ragas_Collections_StringMetrics_Module -- Other non-LLM string comparison metrics
