Implementation:Vibrantlabsai Ragas NoiseSensitivity

From Leeroopedia
Revision as of 11:55, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Vibrantlabsai_Ragas_NoiseSensitivity.md)
Knowledge Sources
Domains: Evaluation, Metrics
Last Updated: 2026-02-12 00:00 GMT

Overview

NoiseSensitivity is a metric that measures how susceptible an LLM system is to noise in retrieved contexts, detecting whether incorrect answer statements originate from relevant or irrelevant retrieved passages.

Description

This metric quantifies the degree to which noisy or misleading information in retrieved contexts causes the LLM to produce incorrect responses. It operates in two modes: relevant (default) and irrelevant, each measuring a different aspect of noise sensitivity.

The algorithm works through the following steps:

  1. Statement Decomposition: Both the reference answer and the generated response are decomposed into individual statements using a StatementGeneratorPrompt (reused from the Faithfulness metric).
  2. Faithfulness Evaluation: For each retrieved context, the metric uses an NLIStatementPrompt to determine which statements from both the reference and the response are supported by that context, producing binary verdict arrays.
  3. Cross-reference Matrix Construction: Three boolean matrices are built:
    • retrieved2ground_truth - which reference statements are supported by each retrieved context
    • retrieved2answer - which response statements are supported by each retrieved context
    • ground_truth2answer - which response statements are supported by the reference answer
  4. Score Computation: Response statements that are not supported by the reference are marked incorrect. Then, depending on the mode:
    • Relevant mode: Computes the fraction of response statements that are incorrect and faithful to at least one relevant retrieved context (a context that supports at least one ground truth statement).
    • Irrelevant mode: Computes the fraction of response statements that are incorrect and faithful to at least one irrelevant retrieved context (a context that supports no ground truth statement), excluding statements that are also faithful to a relevant context.
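
Steps 2 and 3 can be pictured with a small sketch that assembles the three arrays from binary verdicts. This is an illustration, not the library's code: build_matrices and keyword_supports are hypothetical names, the keyword check stands in for the LLM-based NLIStatementPrompt, and the layout (one row per statement, one column per retrieved context) is an assumption.

```python
from typing import Callable, Dict, List

import numpy as np


def keyword_supports(premise: str, statement: str) -> bool:
    """Toy stand-in for the NLI verdict (assumption): verbatim, case-insensitive containment."""
    return statement.lower() in premise.lower()


def build_matrices(
    reference: str,
    reference_statements: List[str],
    response_statements: List[str],
    retrieved_contexts: List[str],
    supports: Callable[[str, str], bool] = keyword_supports,
) -> Dict[str, np.ndarray]:
    """Hypothetical helper: collect support verdicts into boolean arrays."""

    def per_context(statements: List[str]) -> np.ndarray:
        # Assumed layout: rows are statements, columns are retrieved contexts.
        rows = [[supports(ctx, s) for ctx in retrieved_contexts] for s in statements]
        return np.array(rows, dtype=bool).reshape(
            len(statements), len(retrieved_contexts)
        )

    return {
        "retrieved2ground_truth": per_context(reference_statements),
        "retrieved2answer": per_context(response_statements),
        # One verdict per response statement, judged against the reference.
        "ground_truth2answer": np.array(
            [supports(reference, s) for s in response_statements], dtype=bool
        ),
    }


reference = "Paris is the capital of France."
reference_statements = ["Paris is the capital of France"]
response_statements = [
    "Paris is the capital of France",
    "Paris was founded in 1000 BC",
]
contexts = [
    "Paris is the capital of France and its largest city.",
    "Some say Paris was founded in 1000 BC.",
]

matrices = build_matrices(
    reference, reference_statements, response_statements, contexts
)
```

In this toy input the second context supports no reference statement, so it counts as irrelevant, and it is the only context that supports the unsupported "founded in 1000 BC" statement.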

A lower score indicates better performance, meaning the system is less sensitive to noise.
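
The score computation in step 4 can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than the library's implementation: noise_sensitivity_score is a hypothetical name, the arrays are laid out with one row per statement and one column per retrieved context, and the score is normalized by the total number of response statements.

```python
import numpy as np


def noise_sensitivity_score(
    retrieved2ground_truth,  # assumed shape: (n_reference_statements, n_contexts)
    retrieved2answer,        # assumed shape: (n_response_statements, n_contexts)
    ground_truth2answer,     # assumed shape: (n_response_statements,)
    mode: str = "relevant",
) -> float:
    """Hypothetical helper mirroring the two scoring modes described above."""
    if mode not in ("relevant", "irrelevant"):
        raise ValueError(f"unknown mode: {mode!r}")

    r2gt = np.asarray(retrieved2ground_truth, dtype=bool)
    r2a = np.asarray(retrieved2answer, dtype=bool)
    incorrect = ~np.asarray(ground_truth2answer, dtype=bool)

    # A context is relevant if it supports at least one reference statement.
    relevant_ctx = r2gt.any(axis=0)
    # A response statement is "relevant-faithful" if some relevant context supports it.
    relevant_faithful = (r2a & relevant_ctx).any(axis=1)

    if mode == "relevant":
        hits = relevant_faithful & incorrect
    else:
        irrelevant_faithful = (r2a & ~relevant_ctx).any(axis=1)
        # Exclude statements that a relevant context also explains.
        hits = irrelevant_faithful & ~relevant_faithful & incorrect

    # Assumption: normalize by the number of response statements.
    return float(hits.mean()) if hits.size else 0.0


# Toy verdicts: one reference statement, two contexts, three response statements.
r2gt = [[True, False]]                 # only the first context is relevant
r2a = [[True, False],                  # correct, supported by the relevant context
       [True, True],                   # incorrect, supported by both contexts
       [False, True]]                  # incorrect, supported only by the irrelevant context
gt2a = [True, False, False]

relevant_score = noise_sensitivity_score(r2gt, r2a, gt2a, mode="relevant")
irrelevant_score = noise_sensitivity_score(r2gt, r2a, gt2a, mode="irrelevant")
```

In the toy input, the second response statement is counted only in relevant mode because a relevant context also supports it, while the third is counted only in irrelevant mode, so each mode flags one of the three statements.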

Usage

Use this metric to measure how robust a RAG system is against noisy retrieval. The "relevant" mode identifies cases where relevant contexts mislead the model into producing incorrect statements, while the "irrelevant" mode identifies cases where irrelevant contexts introduce errors. The metric requires a reference answer for comparison.

Code Reference

Source Location

Signature

@dataclass
class NoiseSensitivity(MetricWithLLM, SingleTurnMetric):
    name: str = "noise_sensitivity"
    mode: t.Literal["relevant", "irrelevant"] = "relevant"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "user_input",
                "response",
                "reference",
                "retrieved_contexts",
            }
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.CONTINUOUS
    nli_statements_prompt: PydanticPrompt = field(default_factory=NLIStatementPrompt)
    statement_generator_prompt: PydanticPrompt = field(
        default_factory=StatementGeneratorPrompt
    )
    max_retries: int = 1

Import

from ragas.metrics import NoiseSensitivity

I/O Contract

Inputs

Name | Type | Required | Description
user_input | str | Yes | The original user query or question
response | str | Yes | The AI-generated response to evaluate
reference | str | Yes | The ground truth reference answer
retrieved_contexts | list[str] | Yes | The list of retrieved contexts (may contain both relevant and irrelevant passages)

Configuration

Name | Type | Default | Description
mode | Literal["relevant", "irrelevant"] | "relevant" | Whether to measure noise sensitivity from relevant or irrelevant contexts
max_retries | int | 1 | Maximum number of retries for LLM calls
nli_statements_prompt | PydanticPrompt | NLIStatementPrompt() | The prompt used for natural language inference evaluation
statement_generator_prompt | PydanticPrompt | StatementGeneratorPrompt() | The prompt used for decomposing text into statements

Outputs

Name | Type | Description
score | float | A value between 0.0 and 1.0 representing the proportion of incorrect statements attributable to noise; lower is better

Usage Examples

Basic Usage (Relevant Mode)

from ragas.metrics import NoiseSensitivity
from ragas.dataset_schema import SingleTurnSample

metric = NoiseSensitivity(mode="relevant")
# An evaluator LLM must be attached before scoring:
# metric.llm = your_llm

sample = SingleTurnSample(
    user_input="What is the capital of France?",
    response="The capital of France is Paris, and it was founded in 1000 BC.",
    reference="The capital of France is Paris.",
    retrieved_contexts=[
        "Paris is the capital and largest city of France.",
        "France is known for its wine and cheese production.",
    ],
)

# Inside an async function, once metric.llm is set:
# score = await metric.single_turn_ascore(sample)

Irrelevant Mode

from ragas.metrics import NoiseSensitivity

# Measure sensitivity to irrelevant context noise
metric = NoiseSensitivity(mode="irrelevant")
# metric.llm = your_llm
