Implementation:Vibrantlabsai Ragas SummarizationScore

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

SummarizationScore is an LLM-based metric that evaluates the quality of a generated summary by measuring how well it preserves key information from the source text and how concise it is.

Description

The SummarizationScore metric implements a multi-step, LLM-driven evaluation pipeline for assessing summarization quality. The algorithm operates through three sequential LLM-prompting stages:

Keyphrase Extraction -- The ExtractKeyphrasePrompt identifies key entities from the source text, including persons, organizations, locations, dates/times, monetary values, and percentages.

Question Generation -- The GenerateQuestionsPrompt creates closed-ended (yes/no) questions based on the extracted keyphrases and source text. These questions are designed so that the answer is always "1" (yes) when evaluated against the original source.

Answer Generation -- The GenerateAnswersPrompt evaluates whether the generated summary contains enough information to answer each question, producing "1" or "0" for each.
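The three stages can be sketched as a plain-Python pipeline to show how each stage's output feeds the next. This is an illustration, not the Ragas code. run_pipeline, toy_extract, toy_questions and toy_answers are hypothetical names. The toy stubs use simple string matching where the real metric sends each stage to an LLM prompt.

```python
from typing import Callable, List

# Hypothetical stand-ins for the three LLM prompts (each is an LLM call in Ragas)
ExtractFn = Callable[[str], List[str]]
QuestionFn = Callable[[str, List[str]], List[str]]
AnswerFn = Callable[[str, List[str]], List[str]]


def run_pipeline(
    text: str,
    summary: str,
    extract_keyphrases: ExtractFn,
    generate_questions: QuestionFn,
    answer_questions: AnswerFn,
) -> List[str]:
    """Run the three stages in order; return one "1"/"0" verdict per question."""
    # Stage 1: pull key entities (names, dates, amounts, ...) out of the source
    keyphrases = extract_keyphrases(text)
    # Stage 2: build yes/no questions whose answer is "1" against the source
    questions = generate_questions(text, keyphrases)
    # Stage 3: judge each question against the summary only, not the source
    return answer_questions(summary, questions)


# Toy rule-based stubs so the sketch runs without an LLM
PREFIX = "Does the text mention "


def toy_extract(text: str) -> List[str]:
    # Treat capitalized words and numbers as "keyphrases"
    return [w.strip(".,") for w in text.split() if w[:1].isupper() or w[:1].isdigit()]


def toy_questions(text: str, keyphrases: List[str]) -> List[str]:
    return [f"{PREFIX}{k}?" for k in keyphrases]


def toy_answers(summary: str, questions: List[str]) -> List[str]:
    # "1" if the keyphrase embedded in the question appears in the summary
    return ["1" if q[len(PREFIX):-1] in summary else "0" for q in questions]


verdicts = run_pipeline(
    "Apple was founded in 1976 by Steve Jobs.",
    "Apple was founded in 1976.",
    toy_extract,
    toy_questions,
    toy_answers,
)
print(verdicts)  # ['1', '1', '0', '0']
```

The summary drops the founder, so the two questions about "Steve" and "Jobs" cannot be answered from it.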

The final score is a weighted combination of two components:

QA Score -- The fraction of questions that can be answered from the summary (correct_answers / total_questions). This measures information retention.

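Conciseness Score -- Computed as 1 - min(len(summary), len(text)) / (len(text) + 1e-10), where lengths are character counts and text is the joined source. It rewards shorter summaries. The score falls linearly from 1 for an empty summary to roughly 0 once the summary is as long as or longer than the source text. It is computed only when length_penalty is enabled (default: True).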

The combined score is qa_score * (1 - coeff) + conciseness_score * coeff, where coeff (default: 0.5) is the weight given to conciseness.
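The three formulas above translate directly into Python. The function names below are hypothetical, and the sketch assumes at least one question was generated, since the QA fraction is otherwise undefined. The worked example uses a 100-character source, a 20-character summary, and 3 of 4 questions answered.

```python
import math
from typing import List


def qa_score(answers: List[str]) -> float:
    # Fraction of "1" verdicts; assumes at least one question was generated
    return sum(1 for a in answers if a.strip() == "1") / len(answers)


def conciseness_score(text: str, summary: str) -> float:
    # 1 minus the summary's length relative to the source (character counts).
    # min() keeps the ratio at or below 1, and 1e-10 avoids dividing by zero.
    return 1 - min(len(summary), len(text)) / (len(text) + 1e-10)


def combined_score(qa: float, conciseness: float, coeff: float = 0.5) -> float:
    # coeff is the weight on conciseness; (1 - coeff) goes to the QA fraction
    return qa * (1 - coeff) + conciseness * coeff


# Worked example: 3 of 4 questions answered, summary is 20% of the source length
qa = qa_score(["1", "1", "0", "1"])            # 0.75
conc = conciseness_score("x" * 100, "y" * 20)  # about 0.8
final = combined_score(qa, conc)
print(round(final, 3))  # 0.775
```

With equal weights, the example lands halfway between the QA fraction (0.75) and the conciseness score (0.8).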

Usage

Use SummarizationScore to evaluate whether LLM-generated summaries capture the essential facts from the original text. This metric is particularly useful for assessing abstractive or extractive summarization pipelines. It requires an LLM to be configured (via the MetricWithLLM mixin) and expects reference_contexts (the source documents) and response (the summary) in the input sample.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/_summarization.py

Signature

@dataclass
class SummarizationScore(MetricWithLLM, SingleTurnMetric):
    name: str = "summary_score"
    max_retries: int = 1
    length_penalty: bool = True
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "reference_contexts",
                "response",
            }
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.CONTINUOUS
    coeff: float = 0.5
    question_generation_prompt: PydanticPrompt = field(
        default_factory=GenerateQuestionsPrompt
    )
    answer_generation_prompt: PydanticPrompt = field(
        default_factory=GenerateAnswersPrompt
    )
    extract_keyphrases_prompt: PydanticPrompt = field(
        default_factory=ExtractKeyphrasePrompt
    )

Import

from ragas.metrics._summarization import SummarizationScore, summarization_score

I/O Contract

Inputs

Name	Type	Required	Description
reference_contexts	List[str]	Yes	Sample field: the source text passages that were summarized (joined with newlines into one text internally)
response	str	Yes	Sample field: the generated summary to evaluate
length_penalty	bool	No	Constructor parameter: whether to include the conciseness score in the final combination (default: True)
coeff	float	No	Constructor parameter: weight of the conciseness score in the final combination (default: 0.5)
max_retries	int	No	Constructor parameter: maximum number of retries for LLM calls (default: 1)
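Because every passage in reference_contexts is joined into a single text, the conciseness ratio is measured against the combined length of all passages. The sketch below shows this. It reuses the conciseness formula from the Description. joined_source and conciseness_against are hypothetical helper names, not Ragas functions. The same summary scores as more concise when a second passage is added.

```python
from typing import List


def joined_source(reference_contexts: List[str]) -> str:
    # Passages are concatenated with newlines into a single source text
    return "\n".join(reference_contexts)


def conciseness_against(reference_contexts: List[str], summary: str) -> float:
    # Conciseness formula from the Description, applied to the joined source
    text = joined_source(reference_contexts)
    return 1 - min(len(summary), len(text)) / (len(text) + 1e-10)


summary = "Apple was founded in 1976."  # 26 characters
one = ["Apple was founded in 1976 by Steve Jobs."]  # 40 characters
two = one + ["It is headquartered in Cupertino, California."]

print(round(conciseness_against(one, summary), 2))  # 0.35
print(round(conciseness_against(two, summary), 2))  # 0.7
```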

Outputs

Name	Type	Description
score	float	A continuous score between 0.0 and 1.0 representing summarization quality. Higher scores indicate better information retention and conciseness.

Internal Prompts

The metric uses three Pydantic-based prompt classes:

Prompt Class	Purpose	Input	Output
ExtractKeyphrasePrompt	Extracts key entities from source text	StringIO(text)	ExtractedKeyphrases(keyphrases)
GenerateQuestionsPrompt	Generates yes/no questions from text and keyphrases	GenerateQuestionsPromptInput(text, keyphrases)	QuestionsGenerated(questions)
GenerateAnswersPrompt	Evaluates whether summary can answer questions	SummaryAndQuestions(summary, questions)	AnswersGenerated(answers)
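The table implies a simple data contract: each prompt's output feeds the next prompt's input. The sketch below mirrors those shapes with standard-library dataclasses, using the field names from the table. In Ragas these are Pydantic models, and the first prompt's StringIO input (which wraps the text) is omitted here. check_answers is a hypothetical sanity check, not part of the library. It confirms there is exactly one "0"/"1" verdict per question.

```python
from dataclasses import dataclass
from typing import List


# Dataclass mirrors of the prompt I/O shapes listed in the table above
@dataclass
class ExtractedKeyphrases:
    keyphrases: List[str]


@dataclass
class GenerateQuestionsPromptInput:
    text: str
    keyphrases: List[str]


@dataclass
class QuestionsGenerated:
    questions: List[str]


@dataclass
class SummaryAndQuestions:
    summary: str
    questions: List[str]


@dataclass
class AnswersGenerated:
    answers: List[str]


def check_answers(inp: SummaryAndQuestions, out: AnswersGenerated) -> bool:
    # Hypothetical check: one "0"/"1" verdict per question, nothing else
    return len(out.answers) == len(inp.questions) and all(
        a in ("0", "1") for a in out.answers
    )


# Wiring: stage-1 output feeds stage 2, stage-2 output feeds stage 3
text = "Apple was founded in 1976."
extracted = ExtractedKeyphrases(keyphrases=["Apple", "1976"])
q_in = GenerateQuestionsPromptInput(text=text, keyphrases=extracted.keyphrases)
generated = QuestionsGenerated(questions=["Was Apple founded in 1976?"])
a_in = SummaryAndQuestions(summary="Apple dates from 1976.", questions=generated.questions)
print(check_answers(a_in, AnswersGenerated(answers=["1"])))  # True
```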

Usage Examples

Basic Usage

import asyncio

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics._summarization import SummarizationScore

# Initialize the metric; an LLM must be assigned before scoring
metric = SummarizationScore()
# metric.llm = your_llm_instance

sample = SingleTurnSample(
    reference_contexts=[
        "Apple Inc. is a technology company based in Cupertino, California. "
        "Founded by Steve Jobs in 1976, it reached a market capitalization "
        "of $3 trillion in 2023."
    ],
    response="Apple Inc., founded by Steve Jobs in 1976, is a Cupertino-based "
             "tech company valued at $3 trillion as of 2023.",
)

# single_turn_ascore is the public async entry point (it wraps _single_turn_ascore);
# asyncio.run drives it from a plain script. Inside a notebook or another running
# event loop, use: score = await metric.single_turn_ascore(sample)
score = asyncio.run(metric.single_turn_ascore(sample))
print(f"Summarization score: {score}")

Using the Pre-instantiated Instance

from ragas.metrics._summarization import summarization_score

# summarization_score is a pre-instantiated SummarizationScore()
# Set the LLM before using
# summarization_score.llm = your_llm_instance

Disabling Length Penalty

from ragas.metrics._summarization import SummarizationScore

# Skip the conciseness term; only the QA (information-retention) component is computed
metric = SummarizationScore(length_penalty=False)
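Check one consequence of the documented formula against the source. This sketch assumes a disabled length penalty contributes 0 rather than having the QA score re-weighted to 1. Under that assumption, length_penalty=False caps the score at 1 - coeff, which is 0.5 by default. Setting coeff=0.0 instead, for example SummarizationScore(coeff=0.0), makes the documented formula reduce to the QA score alone. final_score is a hypothetical helper.

```python
from typing import Optional


def final_score(qa: float, conciseness: Optional[float], coeff: float = 0.5) -> float:
    # ASSUMPTION: when the length penalty is disabled (conciseness is None), the
    # term contributes 0 and the QA part keeps its (1 - coeff) weight
    term = 0.0 if conciseness is None else conciseness
    return qa * (1 - coeff) + term * coeff


perfect_qa = 1.0
print(final_score(perfect_qa, None))            # 0.5  (length_penalty=False, default coeff)
print(final_score(perfect_qa, 0.8, coeff=0.0))  # 1.0  (conciseness weight removed)
```

If the assumption holds, coeff=0.0 is the setting that scores information retention on a full 0 to 1 scale.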

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
