Implementation:Vibrantlabsai Ragas NoiseSensitivityV2

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

Measures how often an LLM-based system produces incorrect claims when drawing on relevant or irrelevant retrieved documents. It does this with statement decomposition and natural language inference (NLI).

Description

The NoiseSensitivity metric (V2 collections implementation) evaluates how susceptible an LLM system is to producing incorrect answers when given noisy or irrelevant context. The metric operates in two modes: relevant (default) and irrelevant, each measuring a different aspect of noise sensitivity.

The evaluation follows a multi-step process:

Step 1 - Statement Decomposition: Both the reference (ground truth) and the response are decomposed into atomic statements using the StatementGeneratorPrompt. The LLM breaks each text into individual factual claims.

Step 2 - Faithfulness Evaluation: Each atomic statement from the reference and from the response is checked against each retrieved context using natural language inference (NLI) via the StatementFaithfulnessPrompt. The response statements are also checked against the reference itself. Each check yields a verdict for one statement-context pair: 1 if the context supports the statement (faithful) and 0 if it does not.
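Step 2 amounts to filling a grid of NLI verdicts, with one verdict per statement-context pair. The sketch below assumes a hypothetical `judge(statement, context)` callable that stands in for one StatementFaithfulnessPrompt call. The keyword-overlap `toy_judge` is only a placeholder so the code runs without an LLM. It is not how Ragas decides faithfulness.

```python
from typing import Callable, List

# Hypothetical stand-in for one StatementFaithfulnessPrompt call:
# returns 1 if `context` supports `statement`, else 0.
Judge = Callable[[str, str], int]


def verdict_grid(statements: List[str], contexts: List[str], judge: Judge) -> List[List[int]]:
    """Return one row per statement and one column per context (1 = supported)."""
    return [[judge(s, c) for c in contexts] for s in statements]


def toy_judge(statement: str, context: str) -> int:
    # Placeholder only: "supported" if every word of the statement occurs in the context.
    context_words = set(context.lower().split())
    return int(all(word in context_words for word in statement.lower().split()))


grid = verdict_grid(
    ["LIC offers insurance", "LIC trades stocks"],
    ["LIC offers insurance products", "The stock market is regulated by SEBI"],
    toy_judge,
)
print(grid)  # [[1, 0], [0, 0]]
```

Response statements are judged against the reference in the same way, by passing `[reference]` as the context list.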

Step 3 - Matrix Construction: The results are organized into boolean matrices:

retrieved2ground_truth: Which ground truth statements are supported by each retrieved context
retrieved2answer: Which answer statements are supported by each retrieved context
ground_truth2answer: Which answer statements are supported by the ground truth reference
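The three matrices can be sketched as boolean NumPy arrays built from the 0/1 verdicts. The function name `build_matrices` and the layout, in which rows are statements and columns are retrieved contexts, are assumptions for illustration. The V2 implementation may orient or store these matrices differently.

```python
from typing import Dict, List

import numpy as np


def build_matrices(
    gt_grid: List[List[int]],
    ans_grid: List[List[int]],
    ans_vs_reference: List[int],
) -> Dict[str, np.ndarray]:
    """Convert 0/1 NLI verdicts into the three boolean matrices.

    Assumed layout for this sketch:
      gt_grid           rows = ground-truth statements, columns = retrieved contexts
      ans_grid          rows = response statements,     columns = retrieved contexts
      ans_vs_reference  one verdict per response statement, judged against the reference
    """
    r2gt = np.asarray(gt_grid, dtype=bool)
    r2a = np.asarray(ans_grid, dtype=bool)
    gt2a = np.asarray(ans_vs_reference, dtype=bool)
    if r2gt.ndim != 2 or r2a.ndim != 2:
        raise ValueError("verdict grids must be 2-D (statements x contexts)")
    # Both context matrices need one column per retrieved context.
    if r2gt.shape[1] != r2a.shape[1]:
        raise ValueError("ground-truth and response grids cover different numbers of contexts")
    # Exactly one reference verdict per response statement.
    if gt2a.shape != (r2a.shape[0],):
        raise ValueError("need one reference verdict per response statement")
    return {
        "retrieved2ground_truth": r2gt,
        "retrieved2answer": r2a,
        "ground_truth2answer": gt2a,
    }
```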

Step 4 - Score Computation: The final score depends on the mode:

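Relevant mode: Measures incorrect claims that come from relevant retrieved contexts. A retrieved context is considered "relevant" if it supports at least one ground truth statement. A response statement is relevant_faithful if at least one relevant context supports it, and incorrect if the reference does not support it. The score is the mean, over all response statements, of (relevant_faithful AND incorrect).
Irrelevant mode: Measures incorrect claims that come from irrelevant retrieved contexts, which are contexts that do not support any ground truth statement. A response statement is irrelevant_faithful if at least one irrelevant context supports it. The score is the mean, over all response statements, of (irrelevant_faithful AND NOT relevant_faithful AND incorrect). The NOT relevant_faithful term keeps the two modes exclusive, so a claim supported by both kinds of context is counted only in relevant mode.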
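Both modes reduce to a few boolean operations over the matrices. The NumPy sketch below follows the definitions above. It assumes a layout in which rows are statements and columns are retrieved contexts, and it stores the reference verdicts as a 1-D vector. `noise_sensitivity_score` is a hypothetical name, not a Ragas function.

```python
import numpy as np


def noise_sensitivity_score(
    retrieved2ground_truth: np.ndarray,  # bool, (n_gt_statements, n_contexts)
    retrieved2answer: np.ndarray,        # bool, (n_answer_statements, n_contexts)
    ground_truth2answer: np.ndarray,     # bool, (n_answer_statements,)
    mode: str = "relevant",
) -> float:
    # A context is relevant if it supports at least one ground-truth statement.
    relevant_ctx = retrieved2ground_truth.any(axis=0)  # (n_contexts,)
    # A response statement is incorrect if the reference does not support it.
    incorrect = ~ground_truth2answer
    # Response statements supported by at least one relevant context.
    relevant_faithful = (retrieved2answer & relevant_ctx).any(axis=1)
    if mode == "relevant":
        return float(np.mean(relevant_faithful & incorrect))
    if mode == "irrelevant":
        # Supported by at least one irrelevant context...
        irrelevant_faithful = (retrieved2answer & ~relevant_ctx).any(axis=1)
        # ...and not already counted as relevant-faithful.
        return float(np.mean(irrelevant_faithful & ~relevant_faithful & incorrect))
    raise ValueError(f"unknown mode: {mode!r}")
```

The relevant-context mask is computed once, and the irrelevant contexts are simply its complement. That is why both modes can share the same inputs.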

A lower score is better for both modes, as a high score indicates the system is making more errors from noisy contexts.

Usage

Use this metric to evaluate the robustness of a RAG system against noisy or irrelevant retrieved contexts. In relevant mode, it measures how often the system generates incorrect statements from relevant contexts (perhaps due to misinterpretation). In irrelevant mode, it measures how often the system is misled by irrelevant contexts into generating incorrect statements.

This is the V2 collections version. It uses Instructor-based LLMs (InstructorBaseRagasLLM) with structured output for statement decomposition and NLI evaluation, and it replaces the legacy V1 implementation.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/collections/noise_sensitivity/metric.py

Signature

class NoiseSensitivity(BaseMetric):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        name: str = "noise_sensitivity",
        mode: Literal["relevant", "irrelevant"] = "relevant",
        **kwargs,
    ): ...

    async def ascore(
        self,
        user_input: str,
        response: str,
        reference: str,
        retrieved_contexts: List[str],
    ) -> MetricResult: ...

Import

from ragas.metrics.collections import NoiseSensitivity

I/O Contract

Constructor Parameters

Name	Type	Required	Description
llm	InstructorBaseRagasLLM	Yes	Modern instructor-based LLM used for statement generation and NLI evaluation
name	str	No	Metric name (default: "noise_sensitivity")
mode	Literal["relevant", "irrelevant"]	No	Evaluation mode (default: "relevant"). "relevant" measures errors from relevant contexts; "irrelevant" measures errors from irrelevant contexts

Inputs

Name	Type	Required	Description
user_input	str	Yes	The original question posed by the user. Must be non-empty
response	str	Yes	The generated response to evaluate. Must be non-empty
reference	str	Yes	The ground truth reference answer. Must be non-empty
retrieved_contexts	List[str]	Yes	List of retrieved context strings used to generate the response. Must be non-empty

Outputs

Name	Type	Description
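score	MetricResult (float value, read via result.value)	Noise sensitivity score between 0.0 and 1.0, where lower is better. It is the proportion of all response statements that are incorrect (not supported by the reference) and are supported by a retrieved context of the kind selected by mode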
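Because the mean runs over every response statement, the denominator is the full response, not just its incorrect statements. Below is a worked relevant-mode example. The per-statement flags are hypothetical and are not Ragas output.

```python
# Hypothetical flags for a response decomposed into 4 statements.
# incorrect: the reference does not support the statement
# relevant_faithful: at least one relevant retrieved context supports it
flags = [
    {"incorrect": False, "relevant_faithful": True},
    {"incorrect": True, "relevant_faithful": True},   # counted: wrong claim grounded in a relevant context
    {"incorrect": True, "relevant_faithful": False},  # wrong, but not grounded in a relevant context
    {"incorrect": False, "relevant_faithful": False},
]

hits = sum(f["incorrect"] and f["relevant_faithful"] for f in flags)
score = hits / len(flags)  # divide by all statements, not only the incorrect ones
print(score)  # 0.25
```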

Usage Examples

Basic Usage (Relevant Mode)

import asyncio

from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import NoiseSensitivity

# Setup dependencies (AsyncOpenAI reads OPENAI_API_KEY from the environment)
client = AsyncOpenAI()
llm = llm_factory(model="gpt-4o-mini", provider="openai", client=client)

# Create metric instance (default: relevant mode)
metric = NoiseSensitivity(llm=llm)


async def main() -> None:
    # Single evaluation
    result = await metric.ascore(
        user_input="What is LIC known for?",
        response="LIC is the largest insurance company in India, known for its wide range of policies.",
        reference="LIC is known for managing large-scale investments and providing insurance.",
        retrieved_contexts=[
            "LIC was established in 1956 by the Government of India.",
            "LIC offers a variety of insurance products including life, health, and pension plans.",
            "The stock market in India is regulated by SEBI.",
        ],
    )
    print(f"Noise Sensitivity (relevant): {result.value}")


# In a notebook with a running event loop, use `await main()` instead.
asyncio.run(main())

Irrelevant Mode

import asyncio

from ragas.metrics.collections import NoiseSensitivity

# Reuses the `llm` created in the basic example.
# Measure sensitivity to irrelevant contexts
metric = NoiseSensitivity(llm=llm, mode="irrelevant")


async def main() -> None:
    result = await metric.ascore(
        user_input="What is LIC known for?",
        response="LIC is the largest insurance company in India, also involved in stock trading.",
        reference="LIC is known for managing large-scale investments and providing insurance.",
        retrieved_contexts=[
            "LIC was established in 1956 by the Government of India.",
            "LIC offers a variety of insurance products.",
            "The stock market in India is regulated by SEBI.",
        ],
    )
    print(f"Noise Sensitivity (irrelevant): {result.value}")


asyncio.run(main())

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
