Implementation:Vibrantlabsai Ragas SummarizationScore

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Metrics
Last Updated: 2026-02-12 00:00 GMT

Overview

SummarizationScore is an LLM-based metric that evaluates the quality of a generated summary by measuring how well it preserves key information from the source text and how concise it is.

Description

The SummarizationScore metric implements a multi-step, LLM-driven evaluation pipeline for assessing summarization quality. The algorithm operates through three sequential LLM-prompting stages:

  1. Keyphrase Extraction -- The ExtractKeyphrasePrompt identifies key entities from the source text, including persons, organizations, locations, dates/times, monetary values, and percentages.
  2. Question Generation -- The GenerateQuestionsPrompt creates closed-ended (yes/no) questions based on the extracted keyphrases and source text. These questions are designed so that the answer is always "1" (yes) when evaluated against the original source.
  3. Answer Generation -- The GenerateAnswersPrompt evaluates whether the generated summary contains enough information to answer each question, producing "1" or "0" for each.
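
The data flow through the three stages can be sketched without an LLM. The snippet below is a toy illustration only: the toy_* functions are hypothetical stand-ins that use simple string checks where the real metric prompts an LLM, and they are not part of Ragas.

```python
from typing import List


def toy_extract_keyphrases(text: str) -> List[str]:
    # Stand-in for stage 1: treat capitalized or numeric tokens as keyphrases.
    tokens = [tok.strip(".,") for tok in text.split()]
    return [tok for tok in tokens if tok and (tok[0].isupper() or tok[0].isdigit())]


def toy_generate_questions(text: str, keyphrases: List[str]) -> List[str]:
    # Stand-in for stage 2: one yes/no question per keyphrase found in the
    # source, so every question is answered "1" against the source itself.
    return [f'Does the text mention "{kp}"?' for kp in keyphrases if kp in text]


def toy_generate_answers(summary: str, questions: List[str]) -> List[str]:
    # Stand-in for stage 3: "1" if the summary supports the question, else "0".
    answers = []
    for question in questions:
        keyphrase = question.split('"')[1]
        answers.append("1" if keyphrase in summary else "0")
    return answers


source = "Apple was founded in 1976 in Cupertino."
summary = "Apple started in 1976."

keyphrases = toy_extract_keyphrases(source)
questions = toy_generate_questions(source, keyphrases)
answers = toy_generate_answers(summary, questions)
print(keyphrases, answers)
```

Each stage consumes the output of the previous one, and only the last stage looks at the summary; the first two depend on the source text alone.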

The final score is a weighted combination of two components:

  • QA Score -- The fraction of questions that can be answered from the summary (correct_answers / total_questions). This measures information retention.
  • Conciseness Score -- Computed as 1 - min(len(summary), len(text)) / (len(text) + 1e-10), which penalizes summaries that are as long as or longer than the source text. This is optionally applied when length_penalty is enabled (default: True).

The combined score is: qa_score * (1 - coeff) + conciseness_score * coeff where coeff defaults to 0.5.
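
As a worked example, the formulas above can be written as plain functions. This is a minimal sketch that mirrors the expressions given in this section rather than the library's code; the function names are illustrative, and returning 0.0 for an empty answer list is an assumption.

```python
from typing import List


def qa_score(answers: List[str]) -> float:
    # Fraction of questions the summary could answer ("1" means yes).
    if not answers:
        return 0.0  # assumption: no questions means no evidence of retention
    return sum(1 for a in answers if a == "1") / len(answers)


def conciseness_score(text: str, summary: str) -> float:
    # 1 - min(len(summary), len(text)) / (len(text) + 1e-10)
    return 1 - min(len(summary), len(text)) / (len(text) + 1e-10)


def combined_score(qa: float, conciseness: float, coeff: float = 0.5) -> float:
    # qa_score * (1 - coeff) + conciseness_score * coeff
    return qa * (1 - coeff) + conciseness * coeff


text = "x" * 200                   # 200-character source
summary = "y" * 40                 # 40-character summary
answers = ["1", "1", "0", "1"]     # 3 of 4 questions answerable

qa = qa_score(answers)                   # 3 / 4 = 0.75
con = conciseness_score(text, summary)   # 1 - 40 / 200 = 0.8
score = combined_score(qa, con)          # 0.75 * 0.5 + 0.8 * 0.5 = 0.775
print(round(qa, 3), round(con, 3), round(score, 3))
```

With the default coeff of 0.5, retention and brevity carry equal weight, so a summary that copies the source verbatim would have a conciseness score near 0 even if every question were answerable.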

Usage

Use SummarizationScore to evaluate whether LLM-generated summaries capture the essential facts from the original text. This metric is particularly useful for assessing abstractive or extractive summarization pipelines. It requires an LLM to be configured (via the MetricWithLLM mixin) and expects reference_contexts (the source documents) and response (the summary) in the input sample.

Code Reference

Source Location

Signature

@dataclass
class SummarizationScore(MetricWithLLM, SingleTurnMetric):
    name: str = "summary_score"
    max_retries: int = 1
    length_penalty: bool = True
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "reference_contexts",
                "response",
            }
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.CONTINUOUS
    coeff: float = 0.5
    question_generation_prompt: PydanticPrompt = field(
        default_factory=GenerateQuestionsPrompt
    )
    answer_generation_prompt: PydanticPrompt = field(
        default_factory=GenerateAnswersPrompt
    )
    extract_keyphrases_prompt: PydanticPrompt = field(
        default_factory=ExtractKeyphrasePrompt
    )

Import

from ragas.metrics._summarization import SummarizationScore, summarization_score

I/O Contract

Inputs

Name | Type | Required | Description
reference_contexts | List[str] | Yes | The source text passages to be summarized (joined with newlines internally)
response | str | Yes | The generated summary to evaluate
length_penalty | bool | No | Whether to apply a conciseness penalty (default: True)
coeff | float | No | Weight for the conciseness score in the final combination (default: 0.5)
max_retries | int | No | Maximum number of retries for LLM calls (default: 1)

Outputs

Name | Type | Description
score | float | A continuous score between 0.0 and 1.0 representing summarization quality. Higher scores indicate better information retention and conciseness.

Internal Prompts

The metric uses three Pydantic-based prompt classes:

Prompt Class | Purpose | Input | Output
ExtractKeyphrasePrompt | Extracts key entities from source text | StringIO(text) | ExtractedKeyphrases(keyphrases)
GenerateQuestionsPrompt | Generates yes/no questions from text and keyphrases | GenerateQuestionsPromptInput(text, keyphrases) | QuestionsGenerated(questions)
GenerateAnswersPrompt | Evaluates whether summary can answer questions | SummaryAndQuestions(summary, questions) | AnswersGenerated(answers)
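
The shape of these inputs and outputs can be sketched as follows. The class and field names come from the table above; the field types are assumptions, and standard-library dataclasses are used here in place of the Pydantic models so the snippet runs without dependencies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StringIO:                      # input to ExtractKeyphrasePrompt
    text: str


@dataclass
class ExtractedKeyphrases:           # output of ExtractKeyphrasePrompt
    keyphrases: List[str]            # assumed type


@dataclass
class GenerateQuestionsPromptInput:  # input to GenerateQuestionsPrompt
    text: str
    keyphrases: List[str]            # assumed type


@dataclass
class QuestionsGenerated:            # output of GenerateQuestionsPrompt
    questions: List[str]             # assumed type


@dataclass
class SummaryAndQuestions:           # input to GenerateAnswersPrompt
    summary: str
    questions: List[str]             # assumed type


@dataclass
class AnswersGenerated:              # output of GenerateAnswersPrompt
    answers: List[str]               # assumed: "1" or "0" per question


# Illustrative values showing how each output feeds the next input.
extracted = ExtractedKeyphrases(keyphrases=["Apple Inc.", "1976"])
question_input = GenerateQuestionsPromptInput(
    text="Apple Inc. was founded in 1976.",
    keyphrases=extracted.keyphrases,
)
generated = QuestionsGenerated(
    questions=["Is Apple Inc. mentioned?", "Was the company founded in 1976?"]
)
answer_input = SummaryAndQuestions(
    summary="Apple Inc. is a tech company.",
    questions=generated.questions,
)
result = AnswersGenerated(answers=["1", "0"])
```

The answers list is expected to line up one-to-one with the questions list, which is what lets the QA score be computed as a simple fraction.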

Usage Examples

Basic Usage

import asyncio

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics._summarization import SummarizationScore

# Initialize the metric; an LLM must be assigned before scoring
metric = SummarizationScore()
# metric.llm = your_llm_instance

sample = SingleTurnSample(
    reference_contexts=[
        "Apple Inc. is a technology company based in Cupertino, California. "
        "Founded by Steve Jobs in 1976, it reached a market capitalization "
        "of $3 trillion in 2023."
    ],
    response="Apple Inc., founded by Steve Jobs in 1976, is a Cupertino-based "
             "tech company valued at $3 trillion as of 2023."
)


async def main() -> None:
    # await is only valid inside a coroutine, so the call is wrapped here
    score = await metric._single_turn_ascore(sample, callbacks=None)
    print(f"Summarization score: {score}")


# In a notebook with a running event loop, use `await main()` instead
asyncio.run(main())

Using the Pre-instantiated Instance

from ragas.metrics._summarization import summarization_score

# summarization_score is a pre-instantiated SummarizationScore()
# Set the LLM before using
# summarization_score.llm = your_llm_instance

Disabling Length Penalty

from ragas.metrics._summarization import SummarizationScore

# Only evaluate information retention without conciseness penalty
metric = SummarizationScore(length_penalty=False)
