Implementation:Vibrantlabsai Ragas SimpleCriteriaScore
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
SimpleCriteriaScore is a flexible, user-defined metric that evaluates LLM submissions against a custom criteria definition, supporting both single-turn and multi-turn interactions with optional majority-vote self-consistency.
Description
This metric lets users define custom evaluation criteria as free-text definitions. The LLM is then prompted to evaluate the submission against these criteria and returns both a score and a reason for the judgment. The metric implements both the SingleTurnMetric and MultiTurnMetric interfaces.
Key features:
- Custom criteria: Users provide a text definition of the evaluation criteria at construction time. The definition is injected into the prompt instruction as: "Evaluate the input based on the criteria defined. Criteria Definition: {definition}".
- Strictness via majority vote: The strictness parameter controls how many times the LLM evaluates the same input. When strictness > 1, the metric runs multiple self-consistency checks and selects the final score via majority vote using Python's Counter.most_common(). An even strictness value is automatically raised to the next odd number to avoid ties.
- Flexible inputs: For single-turn evaluation, all input fields (user_input, response, retrieved_contexts, reference, reference_contexts) are optional, allowing the metric to work with whatever data is available. For multi-turn evaluation, the conversation is serialized via sample.pretty_repr().
- Custom prompts: Users can override the default SingleTurnSimpleCriteriaPrompt and MultiTurnSimpleCriteriaPrompt with custom prompts.
- Discrete output: The default output type is MetricOutputType.DISCRETE, producing integer scores.
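The strictness behaviour described in the list above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the library's actual code: the helper names normalize_strictness and majority_vote are hypothetical, and only the use of Counter.most_common() and the odd-number adjustment come from the description.

```python
from collections import Counter


def normalize_strictness(strictness: int) -> int:
    """Raise an even strictness to the next odd number (hypothetical helper)."""
    return strictness if strictness % 2 != 0 else strictness + 1


def majority_vote(scores: list[int]) -> int:
    """Return the most frequent score among repeated evaluations (hypothetical helper)."""
    if not scores:
        raise ValueError("at least one score is required")
    # most_common(1) returns a one-element list holding the (score, count) pair
    # for the top-ranked score.
    return Counter(scores).most_common(1)[0][0]


# Three evaluations of the same input, two of which agree on a score of 1.
votes = [1, 0, 1]
print(normalize_strictness(2))  # 3
print(majority_vote(votes))     # 1
```

An odd number of votes rules out ties only when the score is binary. With more than two possible scores, Counter.most_common() resolves equal counts in favour of the score that was encountered first.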
The LLM returns a SimpleCriteriaOutput containing a reason (string) and a score (integer).
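As a rough illustration of the two pieces described in this section, the sketch below fills the quoted instruction template with a criteria definition and parses a JSON reply into a reason and an integer score. CriteriaOutputSketch, build_instruction, and parse_output are hypothetical stand-ins written with the standard library only; the real SimpleCriteriaOutput and prompt classes may differ in structure and validation.

```python
import json
from dataclasses import dataclass

# Instruction wording as quoted in the description; {definition} is filled per metric.
INSTRUCTION_TEMPLATE = (
    "Evaluate the input based on the criteria defined. "
    "Criteria Definition: {definition}"
)


@dataclass
class CriteriaOutputSketch:
    """Hypothetical stand-in for SimpleCriteriaOutput: a reason plus an integer score."""

    reason: str
    score: int


def build_instruction(definition: str) -> str:
    """Inject the user's criteria definition into the instruction text."""
    return INSTRUCTION_TEMPLATE.format(definition=definition)


def parse_output(raw_json: str) -> CriteriaOutputSketch:
    """Parse a JSON reply (assumed shape: {"reason": ..., "score": ...})."""
    data = json.loads(raw_json)
    return CriteriaOutputSketch(reason=str(data["reason"]), score=int(data["score"]))


instruction = build_instruction("Score 1 if the response is polite, 0 otherwise.")
result = parse_output('{"reason": "The tone is courteous.", "score": 1}')
print(instruction)
print(result.score, result.reason)
```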
Usage
Use this metric when you need to evaluate LLM outputs against arbitrary, domain-specific criteria that are not covered by the built-in metrics. It is useful for custom quality checks, policy compliance evaluation, brand voice adherence, or any scenario where the evaluation criteria can be described in natural language.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/_simple_criteria.py
Signature
class SimpleCriteriaScore(MetricWithLLM, SingleTurnMetric, MultiTurnMetric):
    def __init__(
        self,
        name: str,
        definition: str,
        llm: t.Optional[BaseRagasLLM] = None,
        required_columns: t.Optional[t.Dict[MetricType, t.Set[str]]] = None,
        output_type: t.Optional[MetricOutputType] = MetricOutputType.DISCRETE,
        single_turn_prompt: t.Optional[PydanticPrompt] = None,
        multi_turn_prompt: t.Optional[PydanticPrompt] = None,
        strictness: int = 1,
    ):
Import
from ragas.metrics import SimpleCriteriaScore
I/O Contract
Inputs (Single-Turn)
| Name | Type | Required | Description |
|---|---|---|---|
| user_input | str | No | The input to the LLM system |
| response | str | No | The response from the LLM system |
| retrieved_contexts | list[str] | No | The retrieved contexts from the LLM system |
| reference | str | No | The reference answer for evaluation |
| reference_contexts | list[str] | No | The reference contexts for the evaluation |
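Because every single-turn field is optional, it can help to check which fields a record actually provides before building a sample. The helper below is a hypothetical convenience and not part of the library; it keeps only the fields that were supplied, using the column names from the table above.

```python
from typing import Optional


def collect_available_fields(
    user_input: Optional[str] = None,
    response: Optional[str] = None,
    retrieved_contexts: Optional[list[str]] = None,
    reference: Optional[str] = None,
    reference_contexts: Optional[list[str]] = None,
) -> dict[str, object]:
    """Return only the fields that are not None (hypothetical helper)."""
    candidates = {
        "user_input": user_input,
        "response": response,
        "retrieved_contexts": retrieved_contexts,
        "reference": reference,
        "reference_contexts": reference_contexts,
    }
    # Empty strings and empty lists are kept; only missing (None) values are dropped.
    return {name: value for name, value in candidates.items() if value is not None}


fields = collect_available_fields(
    user_input="How do I reset my password?",
    response="Click 'Forgot Password' on the login page.",
)
print(sorted(fields))  # ['response', 'user_input']
```

The resulting dictionary could then be unpacked into a sample constructor, assuming the constructor accepts these keyword arguments.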
Inputs (Multi-Turn)
| Name | Type | Required | Description |
|---|---|---|---|
| user_input | str | No | The serialized multi-turn interaction (produced via pretty_repr()) |
| reference | str | No | The reference response for evaluation |
Configuration
| Name | Type | Default | Description |
|---|---|---|---|
| name | str | (required) | The name for this metric instance |
| definition | str | (required) | The evaluation criteria definition in natural language |
| strictness | int | 1 | Number of self-consistency checks; automatically adjusted to next odd number if even |
| single_turn_prompt | PydanticPrompt | SingleTurnSimpleCriteriaPrompt() | Custom prompt for single-turn evaluation |
| multi_turn_prompt | PydanticPrompt | MultiTurnSimpleCriteriaPrompt() | Custom prompt for multi-turn evaluation |
| output_type | MetricOutputType | DISCRETE | The output type (discrete by default, producing integer scores) |
Outputs
| Name | Type | Description |
|---|---|---|
| score | float | An integer score determined by the LLM based on the criteria definition; when strictness > 1, the majority-vote result is returned |
Usage Examples
Basic Usage
from ragas.metrics import SimpleCriteriaScore
from ragas.dataset_schema import SingleTurnSample
# Define a custom metric for evaluating politeness
politeness_metric = SimpleCriteriaScore(
    name="politeness",
    definition="Score 1 if the response is polite and professional, 0 otherwise.",
)
# politeness_metric.llm = your_llm
sample = SingleTurnSample(
    user_input="How do I reset my password?",
    response="Sure! You can reset your password by clicking 'Forgot Password' on the login page.",
)
# score = await politeness_metric.single_turn_ascore(sample)
With Majority-Vote Strictness
from ragas.metrics import SimpleCriteriaScore
# Use strictness=3 for more reliable evaluation via majority vote
metric = SimpleCriteriaScore(
    name="accuracy_check",
    definition="Score 1 if the response is factually accurate, 0 otherwise.",
    strictness=3,  # Will run 3 evaluations and take majority vote
)
# metric.llm = your_llm
Multi-Turn Evaluation
from ragas.metrics import SimpleCriteriaScore
from ragas.dataset_schema import MultiTurnSample
metric = SimpleCriteriaScore(
    name="conversation_quality",
    definition="Score 1 if the assistant maintained context and coherence throughout the conversation, 0 otherwise.",
)
# metric.llm = your_llm
# sample = MultiTurnSample(...)
# score = await metric.multi_turn_ascore(sample)