Implementation:Vibrantlabsai Ragas QuotedSpans
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
The quoted spans alignment metric measures citation accuracy by computing the fraction of quoted text spans in model-generated answers that can be found verbatim in the provided source passages.
Description
This module provides a functional (non-class-based) metric for evaluating citation alignment in model-generated answers. The core idea is that when a model quotes text within quotation marks, those quoted spans should be traceable back to the source documents.
The algorithm works as follows:
- Extract Quoted Spans -- A regular expression (_QUOTE_RE) matches text enclosed in straight quotes ("), curly quotes, or other common quotation mark characters. Spans shorter than a configurable minimum word count (min_len, default 3 words) are discarded to avoid spurious matches.
- Normalize Text -- Both the quoted spans and source passages undergo light normalization: whitespace is collapsed and text is lowercased (when casefold is True, the default).
- Substring Matching -- For each answer, all source passages are joined into a single string. Each extracted quoted span is then checked for substring membership in the normalized source text.
- Compute Score -- The final score is the fraction of matched spans over the total number of extracted spans: matched / total. If no quoted spans are found across the entire dataset, the score defaults to 0.0.
The function processes batches of answers and their corresponding source lists, making it suitable for evaluation pipelines.
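The four steps above can be sketched in a few lines of standalone Python. This is an illustrative approximation, not the library's source: the pattern QUOTE_RE, the helper names, the function name sketch_alignment, and the single-space separator used to join passages are all assumptions made for the example, and the real _QUOTE_RE may accept more quotation characters.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical pattern: a straight or curly double quote, the span, a closing quote.
QUOTE_RE = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def normalize(text: str, casefold: bool = True) -> str:
    """Collapse runs of whitespace and optionally lowercase the text."""
    text = " ".join(text.split())
    return text.lower() if casefold else text


def extract_quoted_spans(answer: str, min_len: int = 3) -> List[str]:
    """Return quoted spans that contain at least min_len words."""
    spans = (m.group(1).strip() for m in QUOTE_RE.finditer(answer))
    return [s for s in spans if len(s.split()) >= min_len]


def sketch_alignment(
    answers: Sequence[str],
    sources: Sequence[Sequence[str]],
    *,
    casefold: bool = True,
    min_len: int = 3,
) -> Dict[str, float]:
    """Fraction of quoted spans found verbatim in each answer's sources."""
    matched = total = 0
    for answer, passages in zip(answers, sources):
        # Assumption: passages are joined with a single space before matching.
        haystack = normalize(" ".join(passages), casefold)
        for span in extract_quoted_spans(answer, min_len):
            total += 1
            if normalize(span, casefold) in haystack:
                matched += 1
    score = matched / total if total else 0.0
    return {
        "citation_alignment_quoted_spans": score,
        "matched": float(matched),
        "total": float(total),
    }


result = sketch_alignment(
    ['He said "the cat sat on the mat" and "dogs never bark at night".'],
    [["The cat   sat on the mat.", "Dogs bark."]],
)
print(result)
# {'citation_alignment_quoted_spans': 0.5, 'matched': 1.0, 'total': 2.0}
```

In this sketch the first quote matches after whitespace and case are normalized, while the second is absent from the sources, so one of two spans is matched.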
Usage
Use this metric when evaluating models that produce answers with direct quotations or citations from source documents. It is particularly useful for retrieval-augmented generation (RAG) systems where the model is expected to cite passages verbatim. A high score indicates that the model accurately quotes from its sources, while a low score suggests hallucinated or inaccurate citations.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/quoted_spans.py
Signature
def quoted_spans_alignment(
    answers: Sequence[str],
    sources: Sequence[Sequence[str]],
    *,
    casefold: bool = True,
    min_len: int = 3,
) -> Dict[str, float]:
Import
from ragas.metrics.quoted_spans import quoted_spans_alignment
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| answers | Sequence[str] | Yes | List of model-generated answers (length N) potentially containing quoted spans |
| sources | Sequence[Sequence[str]] | Yes | List of lists (length N) of source passages corresponding to each answer |
| casefold | bool | No | Whether to normalize text by lowercasing before matching (default: True) |
| min_len | int | No | Minimum number of words in a quoted span for it to be considered (default: 3) |
Outputs
| Name | Type | Description |
|---|---|---|
| citation_alignment_quoted_spans | float | Fraction of quoted spans found verbatim in the sources (0.0 to 1.0) |
| matched | float | Number of quoted spans that were matched in the sources |
| total | float | Total number of quoted spans extracted from the answers |
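Because the score is 0.0 both when every quote fails to match and when no quotes were extracted at all, it can help to inspect total before interpreting the score. The helper below is a hypothetical convenience for reading the returned dictionary, not part of the module; it assumes only the three output keys listed in the table above.

```python
from typing import Mapping


def describe_alignment(result: Mapping[str, float]) -> str:
    """Summarize a result dict, separating 'no quotes' from 'no matches'."""
    total = int(result["total"])
    matched = int(result["matched"])
    if total == 0:
        # The score is 0.0 here by convention, not because any quote failed.
        return "no quoted spans to evaluate"
    score = result["citation_alignment_quoted_spans"]
    return f"{matched}/{total} quoted spans matched ({score:.0%})"


print(describe_alignment(
    {"citation_alignment_quoted_spans": 0.0, "matched": 0.0, "total": 0.0}
))
# no quoted spans to evaluate
print(describe_alignment(
    {"citation_alignment_quoted_spans": 0.5, "matched": 1.0, "total": 2.0}
))
# 1/2 quoted spans matched (50%)
```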
Internal Helper Functions
| Function | Purpose |
|---|---|
| _normalize(text) | Collapses whitespace and lowercases text for consistent matching |
| _extract_quoted_spans(answer, min_len=3) | Extracts quoted text spans from an answer using regex, filtering by minimum word count |
Usage Examples
Basic Usage
from ragas.metrics.quoted_spans import quoted_spans_alignment
answers = [
    'The report states "climate change is accelerating rapidly" according to the study.',
    'The author noted "economic growth remained steady throughout the quarter" in the analysis.',
]
sources = [
    ["Climate change is accelerating rapidly, with temperatures rising each year."],
    ["Economic growth remained steady throughout the quarter, exceeding expectations."],
]
result = quoted_spans_alignment(answers, sources)
print(result)
# {
#     "citation_alignment_quoted_spans": 1.0,
#     "matched": 2.0,
#     "total": 2.0,
# }
Case-Sensitive Matching
from ragas.metrics.quoted_spans import quoted_spans_alignment
answers = ['He said "The Quick Brown Fox jumped over the lazy dog" in his speech.']
sources = [["the quick brown fox jumped over the lazy dog"]]
# With casefold (default): matches
result_casefold = quoted_spans_alignment(answers, sources, casefold=True)
# citation_alignment_quoted_spans: 1.0
# Without casefold: no match, because the quote's capitalization differs from the source
result_exact = quoted_spans_alignment(answers, sources, casefold=False)
# citation_alignment_quoted_spans: 0.0