Implementation:Vibrantlabsai Ragas SimpleCriteriaScore

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

SimpleCriteriaScore is a user-defined metric that scores an LLM submission against a free-text criteria definition. It supports single-turn and multi-turn interactions, with optional majority-vote self-consistency.

Description

This metric lets users define custom evaluation criteria as free text. The LLM is prompted to evaluate the submission against that definition and returns both a score and a reason for the judgment. The metric implements both the SingleTurnMetric and MultiTurnMetric interfaces.

Key features:

Custom criteria: Users provide a text definition of the evaluation criteria at construction time. The definition is injected into the prompt instruction as: "Evaluate the input based on the criteria defined. Criteria Definition: {definition}".
Strictness via majority vote: The strictness parameter controls how many times the LLM evaluates the same input. When strictness > 1, the metric runs that many self-consistency checks and selects the final score by majority vote using Python's Counter.most_common(). An even strictness value is automatically raised to the next odd number to avoid ties.
Flexible inputs: For single-turn evaluation, all input fields (user_input, response, retrieved_contexts, reference, reference_contexts) are optional, allowing the metric to work with whatever data is available. For multi-turn evaluation, the conversation is serialized via sample.pretty_repr().
Custom prompts: Users can override the default SingleTurnSimpleCriteriaPrompt and MultiTurnSimpleCriteriaPrompt with custom prompts.
Discrete output: The default output type is MetricOutputType.DISCRETE, producing integer scores.
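The strictness behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the library's source: the helper names normalize_strictness and majority_vote are hypothetical, and the list of scores stands in for the results of repeated LLM evaluations.

```python
from collections import Counter
from typing import List


def normalize_strictness(strictness: int) -> int:
    """Raise an even strictness to the next odd number (hypothetical helper)."""
    return strictness if strictness % 2 != 0 else strictness + 1


def majority_vote(scores: List[int]) -> int:
    """Return the most frequent score among repeated evaluations (hypothetical helper)."""
    # most_common(1) yields a one-element list of (value, count) pairs.
    return Counter(scores).most_common(1)[0][0]


runs = normalize_strictness(4)          # an even request becomes 5 runs
final = majority_vote([1, 0, 1, 1, 0])  # stand-in for 5 LLM verdicts
```

An odd number of runs rules out ties only when the criteria produce two possible scores. With three or more distinct score values a tie remains possible, and Counter.most_common() then returns the tied value that was encountered first.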

The LLM returns a SimpleCriteriaOutput containing a reason (string) and a score (integer).
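As a rough sketch of the data flow, the snippet below builds the instruction from a criteria definition and parses a reply carrying a reason and a score. The instruction wording is quoted from the description above; everything else is an assumption for illustration: build_instruction, CriteriaVerdict, and parse_verdict are hypothetical names, the dataclass is only a stand-in for SimpleCriteriaOutput, and the JSON reply format is assumed rather than taken from the library.

```python
import json
from dataclasses import dataclass

# Instruction wording as quoted in the feature list above.
INSTRUCTION_TEMPLATE = (
    "Evaluate the input based on the criteria defined. "
    "Criteria Definition: {definition}"
)


def build_instruction(definition: str) -> str:
    """Hypothetical helper: inject a criteria definition into the instruction."""
    return INSTRUCTION_TEMPLATE.format(definition=definition)


@dataclass
class CriteriaVerdict:
    """Stand-in for SimpleCriteriaOutput: a reason string and an integer score."""
    reason: str
    score: int


def parse_verdict(raw_reply: str) -> CriteriaVerdict:
    """Hypothetical helper: parse an assumed JSON reply with 'reason' and 'score' keys."""
    payload = json.loads(raw_reply)
    return CriteriaVerdict(reason=str(payload["reason"]), score=int(payload["score"]))


instruction = build_instruction("Score 1 if the response is polite, 0 otherwise.")
verdict = parse_verdict('{"reason": "Courteous and direct.", "score": 1}')
```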

Usage

Use this metric when you need to evaluate LLM outputs against arbitrary, domain-specific criteria that are not covered by the built-in metrics. It is useful for custom quality checks, policy compliance evaluation, brand voice adherence, or any scenario where the evaluation criteria can be described in natural language.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/_simple_criteria.py

Signature

class SimpleCriteriaScore(MetricWithLLM, SingleTurnMetric, MultiTurnMetric):
    def __init__(
        self,
        name: str,
        definition: str,
        llm: t.Optional[BaseRagasLLM] = None,
        required_columns: t.Optional[t.Dict[MetricType, t.Set[str]]] = None,
        output_type: t.Optional[MetricOutputType] = MetricOutputType.DISCRETE,
        single_turn_prompt: t.Optional[PydanticPrompt] = None,
        multi_turn_prompt: t.Optional[PydanticPrompt] = None,
        strictness: int = 1,
    ):

Import

from ragas.metrics import SimpleCriteriaScore

I/O Contract

Inputs (Single-Turn)

Name	Type	Required	Description
user_input	str	No	The input to the LLM system
response	str	No	The response from the LLM system
retrieved_contexts	list[str]	No	The retrieved contexts from the LLM system
reference	str	No	The reference answer for evaluation
reference_contexts	list[str]	No	The reference contexts for the evaluation

Inputs (Multi-Turn)

Name	Type	Required	Description
user_input	str	No	The serialized multi-turn interaction (produced via pretty_repr())
reference	str	No	The reference response for evaluation
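To illustrate what "serialized" means for the multi-turn user_input field, the sketch below flattens a conversation into one string. It is an assumption-laden stand-in: Turn and serialize_conversation are hypothetical, and the "role: content" line format is chosen for illustration and is not claimed to match the output of pretty_repr().

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """Hypothetical message type: who spoke and what was said."""
    role: str
    content: str


def serialize_conversation(turns: List[Turn]) -> str:
    """Flatten a conversation into one string, one 'role: content' line per turn."""
    return "\n".join(f"{turn.role}: {turn.content}" for turn in turns)


conversation = [
    Turn(role="Human", content="How do I reset my password?"),
    Turn(role="AI", content="Click 'Forgot Password' on the login page."),
]
serialized = serialize_conversation(conversation)
```

The resulting string is what a criteria prompt would receive as a single text input, so the judge sees the whole exchange rather than only the last message.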

Configuration

Name	Type	Default	Description
name	str	(required)	The name for this metric instance
definition	str	(required)	The evaluation criteria definition in natural language
strictness	int	1	Number of self-consistency checks; automatically adjusted to next odd number if even
single_turn_prompt	PydanticPrompt	SingleTurnSimpleCriteriaPrompt()	Custom prompt for single-turn evaluation
multi_turn_prompt	PydanticPrompt	MultiTurnSimpleCriteriaPrompt()	Custom prompt for multi-turn evaluation
output_type	MetricOutputType	DISCRETE	The output type (discrete by default, producing integer scores)

Outputs

Name	Type	Description
score	float	The score assigned by the LLM under the criteria definition (an integer value by default, reported as a float); when strictness > 1, the majority-vote result across runs is returned

Usage Examples

Basic Usage

from ragas.metrics import SimpleCriteriaScore
from ragas.dataset_schema import SingleTurnSample

# Define a custom metric for evaluating politeness
politeness_metric = SimpleCriteriaScore(
    name="politeness",
    definition="Score 1 if the response is polite and professional, 0 otherwise.",
)
# Assign an evaluator LLM before scoring; your_llm is a placeholder for a BaseRagasLLM instance.
# politeness_metric.llm = your_llm

sample = SingleTurnSample(
    user_input="How do I reset my password?",
    response="Sure! You can reset your password by clicking 'Forgot Password' on the login page.",
)

# Scoring is asynchronous, so call this from inside an async function:
# score = await politeness_metric.single_turn_ascore(sample)

With Majority-Vote Strictness

from ragas.metrics import SimpleCriteriaScore

# Use strictness=3 for more reliable evaluation via majority vote
metric = SimpleCriteriaScore(
    name="accuracy_check",
    definition="Score 1 if the response is factually accurate, 0 otherwise.",
    strictness=3,  # Runs 3 evaluations and takes the majority vote
)
# metric.llm = your_llm

Multi-Turn Evaluation

from ragas.metrics import SimpleCriteriaScore
from ragas.dataset_schema import MultiTurnSample

metric = SimpleCriteriaScore(
    name="conversation_quality",
    definition="Score 1 if the assistant maintained context and coherence throughout the conversation, 0 otherwise.",
)
# metric.llm = your_llm

# sample = MultiTurnSample(...)
# score = await metric.multi_turn_ascore(sample)

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
