Implementation:Vibrantlabsai Ragas SimpleCriteriaScore

From Leeroopedia
Domains: Evaluation, Metrics
Last Updated: 2026-02-12 00:00 GMT

Overview

SimpleCriteriaScore is a flexible, user-defined metric that evaluates LLM submissions against a custom criteria definition, supporting both single-turn and multi-turn interactions with optional majority-vote self-consistency.

Description

This metric allows users to define custom evaluation criteria as free-text definitions. The LLM is then prompted to evaluate the submission against these criteria, returning both a score and a reason for the judgment. The metric supports both SingleTurnMetric and MultiTurnMetric interfaces.

Key features:

  • Custom criteria: Users provide a text definition of the evaluation criteria at construction time. The definition is injected into the prompt instruction as: "Evaluate the input based on the criteria defined. Criteria Definition: {definition}".
  • Strictness via majority vote: The strictness parameter sets how many times the LLM evaluates the same input. When strictness > 1, the metric runs that many self-consistency checks and takes the most frequent score as the final result, using Python's Counter.most_common(). An even strictness value is raised to the next odd number (for example, 2 becomes 3) so that a vote between two candidate scores cannot tie.
  • Flexible inputs: For single-turn evaluation, all input fields (user_input, response, retrieved_contexts, reference, reference_contexts) are optional, allowing the metric to work with whatever data is available. For multi-turn evaluation, the conversation is serialized via sample.pretty_repr().
  • Custom prompts: Users can override the default SingleTurnSimpleCriteriaPrompt and MultiTurnSimpleCriteriaPrompt with custom prompts.
  • Discrete output: The default output type is MetricOutputType.DISCRETE, producing integer scores.
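
The strictness behaviour described in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's source: the helper names effective_strictness and majority_score are hypothetical, and only Counter.most_common() is taken from the description.

```python
from collections import Counter
from typing import Sequence


def effective_strictness(strictness: int) -> int:
    """Hypothetical helper: raise an even strictness to the next odd number."""
    # Odd values pass through unchanged; even values gain one extra check.
    return strictness if strictness % 2 != 0 else strictness + 1


def majority_score(scores: Sequence[int]) -> int:
    """Hypothetical helper: return the most frequent score among repeated checks."""
    if not scores:
        raise ValueError("at least one score is required")
    # most_common(1) returns a one-element list of (score, count) pairs.
    return Counter(scores).most_common(1)[0][0]


if __name__ == "__main__":
    print(effective_strictness(2))    # 3
    print(majority_score([1, 0, 1]))  # 1
```

With an odd number of checks a binary criterion always has a clear winner. If the criterion allows more than two score values, a tie is still possible, and this sketch simply keeps whichever tied score Counter reports first.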

The LLM returns a SimpleCriteriaOutput containing a reason (string) and a score (integer).
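
To make the output shape concrete, here is a dependency-free sketch of a reason-plus-score record. It is an assumption-laden stand-in: CriteriaOutputSketch and parse_judgment are hypothetical names, a dataclass is used in place of whatever model class the library defines, and the JSON reply format is assumed for illustration only.

```python
import json
from dataclasses import dataclass


@dataclass
class CriteriaOutputSketch:
    """Hypothetical stand-in for SimpleCriteriaOutput: a reason and an integer score."""
    reason: str
    score: int


def parse_judgment(raw: str) -> CriteriaOutputSketch:
    """Parse an assumed JSON reply such as {"reason": "...", "score": 1}."""
    data = json.loads(raw)
    score = data["score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError(f"score must be an integer, got {score!r}")
    return CriteriaOutputSketch(reason=str(data["reason"]), score=score)


if __name__ == "__main__":
    judgment = parse_judgment('{"reason": "Polite and clear.", "score": 1}')
    print(judgment.score, judgment.reason)
```

Rejecting non-integer scores up front mirrors the discrete output type described above; a real integration would rely on the library's own parsing rather than this helper.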

Usage

Use this metric when you need to evaluate LLM outputs against arbitrary, domain-specific criteria that are not covered by the built-in metrics. It is useful for custom quality checks, policy compliance evaluation, brand voice adherence, or any scenario where the evaluation criteria can be described in natural language.

Code Reference


Signature

class SimpleCriteriaScore(MetricWithLLM, SingleTurnMetric, MultiTurnMetric):
    def __init__(
        self,
        name: str,
        definition: str,
        llm: t.Optional[BaseRagasLLM] = None,
        required_columns: t.Optional[t.Dict[MetricType, t.Set[str]]] = None,
        output_type: t.Optional[MetricOutputType] = MetricOutputType.DISCRETE,
        single_turn_prompt: t.Optional[PydanticPrompt] = None,
        multi_turn_prompt: t.Optional[PydanticPrompt] = None,
        strictness: int = 1,
    ):

Import

from ragas.metrics import SimpleCriteriaScore

I/O Contract

Inputs (Single-Turn)

Name | Type | Required | Description
user_input | str | No | The input to the LLM system
response | str | No | The response from the LLM system
retrieved_contexts | list[str] | No | The retrieved contexts from the LLM system
reference | str | No | The reference answer for evaluation
reference_contexts | list[str] | No | The reference contexts for the evaluation
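
Because every single-turn field is optional, a caller can populate only what is available. The sketch below mirrors the five fields listed above with a stand-in dataclass; SingleTurnInputSketch and provided_fields are hypothetical names used for illustration and are not part of Ragas.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SingleTurnInputSketch:
    """Hypothetical stand-in mirroring the optional single-turn inputs."""
    user_input: Optional[str] = None
    response: Optional[str] = None
    retrieved_contexts: Optional[List[str]] = None
    reference: Optional[str] = None
    reference_contexts: Optional[List[str]] = None


def provided_fields(sample: SingleTurnInputSketch) -> Dict[str, Any]:
    """Return only the fields the caller actually supplied (non-None values)."""
    return {key: value for key, value in asdict(sample).items() if value is not None}


if __name__ == "__main__":
    sample = SingleTurnInputSketch(response="Click 'Forgot Password' on the login page.")
    print(provided_fields(sample))
```

A criteria definition that only judges tone, for instance, needs nothing beyond response, while a factual-accuracy criterion would typically also supply reference.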

Inputs (Multi-Turn)

Name | Type | Required | Description
user_input | str | No | The serialized multi-turn interaction (produced via pretty_repr())
reference | str | No | The reference response for evaluation

Configuration

Name | Type | Default | Description
name | str | (required) | The name for this metric instance
definition | str | (required) | The evaluation criteria definition in natural language
strictness | int | 1 | Number of self-consistency checks; an even value is automatically raised to the next odd number
single_turn_prompt | PydanticPrompt | SingleTurnSimpleCriteriaPrompt() | Custom prompt for single-turn evaluation
multi_turn_prompt | PydanticPrompt | MultiTurnSimpleCriteriaPrompt() | Custom prompt for multi-turn evaluation
output_type | MetricOutputType | DISCRETE | The output type (discrete by default, producing integer scores)

Outputs

Name | Type | Description
score | float | The score the LLM assigns under the criteria definition. The value is an integer by default (output_type is DISCRETE) even though the field is typed as float. When strictness > 1, the majority-vote result is returned.

Usage Examples

Basic Usage

from ragas.metrics import SimpleCriteriaScore
from ragas.dataset_schema import SingleTurnSample

# Define a custom metric for evaluating politeness
politeness_metric = SimpleCriteriaScore(
    name="politeness",
    definition="Score 1 if the response is polite and professional, 0 otherwise.",
)
# politeness_metric.llm = your_llm

sample = SingleTurnSample(
    user_input="How do I reset my password?",
    response="Sure! You can reset your password by clicking 'Forgot Password' on the login page.",
)

# score = await politeness_metric.single_turn_ascore(sample)

With Majority-Vote Strictness

from ragas.metrics import SimpleCriteriaScore

# Use strictness=3 for more reliable evaluation via majority vote
metric = SimpleCriteriaScore(
    name="accuracy_check",
    definition="Score 1 if the response is factually accurate, 0 otherwise.",
    strictness=3,  # Will run 3 evaluations and take majority vote
)
# metric.llm = your_llm

Multi-Turn Evaluation

from ragas.metrics import SimpleCriteriaScore
from ragas.dataset_schema import MultiTurnSample

metric = SimpleCriteriaScore(
    name="conversation_quality",
    definition="Score 1 if the assistant maintained context and coherence throughout the conversation, 0 otherwise.",
)
# metric.llm = your_llm

# sample = MultiTurnSample(...)
# score = await metric.multi_turn_ascore(sample)
