Implementation:Explodinggradients Ragas SummarizationScore Metric

Field	Value
source	Repo
domains	Metrics, Evaluation
last_updated	2026-02-10

Overview

SummarizationScore evaluates the quality of text summaries by extracting keyphrases from the source, generating closed-ended questions, and scoring how well the summary answers those questions.

Description

The SummarizationScore class uses a three-step LLM pipeline to evaluate summaries:

Keyphrase Extraction -- Extracts named entities and keyphrases (persons, organizations, locations, dates, monetary values, percentages) from the source text using the ExtractKeyphrasePrompt.
Question Generation -- Generates closed-ended (yes/no) questions based on the source text and extracted keyphrases using the GenerateQuestionsPrompt.
Answer Generation -- Evaluates whether the summary contains sufficient information to answer each question using the GenerateAnswersPrompt.
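
The three steps chain together: each step's output feeds the next. The sketch below uses plain callables as stand-ins for the LLM-backed prompts. The name run_pipeline, the callables' signatures, the toy helper functions, and the "1"/"0" answer encoding are all hypothetical and chosen for illustration only; they are not the library's API.

```python
from typing import Callable, List


def run_pipeline(
    source: str,
    summary: str,
    extract_keyphrases: Callable[[str], List[str]],
    generate_questions: Callable[[str, List[str]], List[str]],
    generate_answers: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Chain the three steps; each callable stands in for one LLM prompt."""
    keyphrases = extract_keyphrases(source)             # step 1: source -> keyphrases
    questions = generate_questions(source, keyphrases)  # step 2: yes/no questions
    return generate_answers(summary, questions)         # step 3: answer from the summary


# Toy, deterministic stand-ins (hypothetical; a real run would call an LLM).
QUESTION_PREFIX = "Does the text mention "


def toy_extract(text: str) -> List[str]:
    # Treat capitalized tokens and numbers as a crude proxy for entities.
    tokens = [t.strip(".,") for t in text.split()]
    return [t for t in tokens if t and (t[0].isupper() or t.isdigit())]


def toy_questions(text: str, keyphrases: List[str]) -> List[str]:
    return [f"{QUESTION_PREFIX}{k}?" for k in keyphrases]


def toy_answers(summary: str, questions: List[str]) -> List[str]:
    # "1" if the phrase asked about appears in the summary, else "0".
    phrases = [q[len(QUESTION_PREFIX):-1] for q in questions]
    return ["1" if p in summary else "0" for p in phrases]


answers = run_pipeline(
    source="Apple was founded by Steve Jobs in 1976.",
    summary="Apple was founded in 1976.",
    extract_keyphrases=toy_extract,
    generate_questions=toy_questions,
    generate_answers=toy_answers,
)
print(answers)
```

The questions are generated from the source but answered from the summary alone, so facts the summary omits show up as unanswerable questions.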

The final score combines a QA score (the fraction of generated questions that the summary can answer) with an optional conciseness score, which penalizes summaries that are long relative to the source text. The class inherits from MetricWithLLM and SingleTurnMetric.
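
One way to picture the combination is as a weighted average. This is a minimal sketch, not the library's code. The function names, the character-length measure of conciseness, the weighted-average form, and the handling of the disabled-penalty case are assumptions, chosen to be consistent with coeff being the weight on the conciseness score.

```python
from typing import List, Optional


def qa_score(answers: List[str]) -> float:
    """Fraction of questions marked answerable ("1") from the summary."""
    if not answers:
        return 0.0
    return sum(1 for a in answers if a.strip() == "1") / len(answers)


def conciseness_score(source: str, summary: str) -> float:
    """Assumed measure: 1 minus the summary-to-source character-length ratio.

    The ratio is capped at 1, so a summary longer than the source scores 0.
    """
    if not source:
        return 0.0
    return 1.0 - min(len(summary), len(source)) / len(source)


def combined_score(qa: float, conciseness: Optional[float], coeff: float = 0.5) -> float:
    """Weighted average of the two parts; coeff weights the conciseness part."""
    if conciseness is None:
        # Assumption: with the penalty disabled, the sketch returns the QA score as is.
        return qa
    return qa * (1.0 - coeff) + conciseness * coeff


qa = qa_score(["1", "0", "1", "1"])               # 3 of 4 questions answerable
concise = conciseness_score("a" * 100, "a" * 25)  # summary is a quarter of the source
print(combined_score(qa, concise, coeff=0.5))
```

Under this form, raising coeff rewards shorter summaries more heavily, while coeff=0 reduces the result to the QA score alone.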

Key attributes:

length_penalty -- Whether to apply a conciseness penalty (default True).
coeff -- Weight for the conciseness score in the final combination (default 0.5).
extract_keyphrases_prompt -- Prompt for keyphrase extraction.
question_generation_prompt -- Prompt for question generation.
answer_generation_prompt -- Prompt for answer evaluation.

Usage

The metric requires reference_contexts (the source text as a list of context strings) and response (the summary). An LLM must be configured.
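
The input requirements above can be checked before any LLM call is made. The helper below is a hypothetical sketch: prepare_inputs is not part of Ragas, and joining the passages with newlines to form a single source text is an assumption made for illustration.

```python
from typing import List, Tuple


def prepare_inputs(reference_contexts: List[str], response: str) -> Tuple[str, str]:
    """Validate the two required fields and return (source_text, summary)."""
    if not reference_contexts or not any(c.strip() for c in reference_contexts):
        raise ValueError("reference_contexts must contain at least one non-empty passage")
    if not response or not response.strip():
        raise ValueError("response (the summary) must be a non-empty string")
    # Assumption: passages are concatenated with newlines to form the source text.
    source_text = "\n".join(reference_contexts)
    return source_text, response


source_text, summary = prepare_inputs(
    ["Apple Inc. is based in Cupertino.", "It was founded in 1976."],
    "Apple, founded in 1976, is based in Cupertino.",
)
print(source_text)
```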

Code Reference

Property	Value
Source Location	`src/ragas/metrics/_summarization.py` L143-241
Class Signature	`class SummarizationScore(MetricWithLLM, SingleTurnMetric)`
Import	`from ragas.metrics import SummarizationScore`

I/O Contract

Inputs

Parameter	Type	Required	Description
reference_contexts	List[str]	Yes	The source text passages to be summarized
response	str	Yes	The generated summary to evaluate

Outputs

Output	Type	Description
score	float	Combined QA score and optional conciseness score (0.0 to 1.0)

Usage Examples

from ragas.metrics import SummarizationScore
from ragas.dataset_schema import SingleTurnSample

metric = SummarizationScore(length_penalty=True, coeff=0.5)
# metric.llm = ...  # Assign an LLM before scoring; the metric cannot run without one

sample = SingleTurnSample(
    reference_contexts=[
        "Apple Inc. is a technology company based in Cupertino, California. Founded by Steve Jobs in 1976, it reached a market capitalization of $3 trillion in 2023."
    ],
    response="Apple Inc., founded in 1976, is a major tech company based in California."
)
# score = await metric.single_turn_ascore(sample)  # Call from within an async function

A pre-configured instance is available:

from ragas.metrics._summarization import summarization_score

Related Pages

Explodinggradients_Ragas_Faithfulness_Metric -- Statement-level faithfulness evaluation
Explodinggradients_Ragas_ContextRecall_Metric -- Context recall based on statement attribution
Explodinggradients_Ragas_FactualCorrectness_Metric -- Claim decomposition-based factual evaluation
