Implementation:Vibrantlabsai Ragas ChrfScore

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Metrics
Last Updated: 2026-02-12 00:00 GMT

Overview

ChrfScore computes the chrF (character n-gram F-score) between a generated response and a reference answer using the sacrebleu library.

Description

The ChrfScore metric evaluates a generated response by computing its chrF score against a reference answer. chrF measures the overlap of character-level n-grams between a hypothesis and a reference text. Unlike BLEU, which matches word n-grams, chrF matches character n-grams, so it gives partial credit for inflected or slightly misspelled words and is generally more tolerant of morphological variation, word-order differences, and other surface-level variation.
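To make the idea of character n-gram overlap concrete, here is a minimal pure-Python sketch. It is a simplified illustration, not the sacrebleu implementation: the names `char_ngrams` and `simple_chrf` are hypothetical, whitespace is dropped before counting, and precision and recall are averaged over the n-gram orders that fit the text. Its numbers are not expected to match sacrebleu exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of length n, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF in [0, 1]: F-beta over order-averaged precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the texts is too short for this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision
    return (1 + beta**2) * p * r / (beta**2 * p + r)


print(simple_chrf("The cat sat on the mat.", "The cat is sitting on the mat."))
```

With beta set to 2, recall counts more heavily than precision, so a response that covers most of the reference's character n-grams scores well even if it contains some extra material.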

The implementation uses the sacrebleu library's corpus_chrf function. The raw sacrebleu chrF score (which ranges from 0 to 100) is normalized by dividing by 100 to produce a score between 0.0 and 1.0.

The metric validates its inputs: it returns 0.0 if either the reference or the response is None, is not a string, or consists only of whitespace. As a result, invalid inputs do not raise exceptions; they receive 0.0, the same value as a response with no character overlap with the reference.

This metric does not require an LLM or embedding model; it is a purely statistical character-level comparison. Its only extra requirement is the sacrebleu package, which must be installed separately (pip install sacrebleu).

Additional keyword arguments can be passed through the kwargs dictionary to customize the underlying corpus_chrf call, for example the character n-gram order (char_order), the word n-gram order (word_order), or the beta parameter (beta).
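As a rough sketch of what such keyword arguments do, the snippet below calls sacrebleu's corpus_chrf directly with three configurations and divides each score by 100, mirroring the normalization described above. The keyword names char_order, word_order, and beta are sacrebleu parameters; the `configs` dictionary and its labels are illustrative assumptions, and the import is guarded so nothing is computed when sacrebleu is not installed.

```python
try:
    from sacrebleu import corpus_chrf
except ImportError:
    corpus_chrf = None  # sacrebleu not installed; nothing is computed below

hypotheses = ["The cat sat on the mat."]
# One reference stream containing one reference per hypothesis
references = [["The cat is sitting on the mat."]]

# Illustrative configurations; each dict plays the role of ChrfScore's kwargs
configs = {
    "chrF (defaults)": {},
    "chrF++ (word bigrams)": {"word_order": 2},
    "custom order and beta": {"char_order": 4, "beta": 1},
}

scores = {}
if corpus_chrf is not None:
    for label, kwargs in configs.items():
        # sacrebleu reports 0..100; divide by 100 to get 0.0..1.0
        scores[label] = corpus_chrf(hypotheses, references, **kwargs).score / 100
        print(f"{label}: {scores[label]:.3f}")
```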

Usage

Use this metric when you want a character-level evaluation that is more forgiving of minor word-level differences than BLEU. chrF is particularly useful for morphologically rich languages, or when evaluating responses where the exact word forms may differ but the character-level content is similar. It serves as a complement or alternative to BleuScore.

Code Reference

Source Location

Signature

@dataclass
class ChrfScore(SingleTurnMetric):
    name: str = "chrf_score"
    kwargs: t.Dict[str, t.Any] = field(default_factory=dict)

Import

from ragas.metrics import ChrfScore

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
reference | str | Yes | The ground truth reference answer
response | str | Yes | The generated response to evaluate
kwargs | dict | No | Additional keyword arguments passed to sacrebleu's corpus_chrf function

Outputs

Name | Type | Description
--- | --- | ---
score | float | chrF score normalized to the range 0.0 to 1.0 (returns 0.0 for invalid inputs)

Dependencies

This metric requires the sacrebleu package:

pip install sacrebleu

The dependency check is performed in __post_init__, and a descriptive ImportError is raised if the package is not available.
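A dependency check of this kind might look roughly like the following sketch. `ChrfLikeMetric` is a hypothetical stand-in rather than the ragas class, and the error message wording is an assumption; only the general pattern (import inside __post_init__, re-raise with an install hint, keep the function on the instance) is illustrated.

```python
import typing as t
from dataclasses import dataclass, field


@dataclass
class ChrfLikeMetric:
    """Hypothetical stand-in used only to illustrate the import check."""

    name: str = "chrf_score"
    kwargs: t.Dict[str, t.Any] = field(default_factory=dict)

    def __post_init__(self):
        try:
            # Import lazily so the module loads even without sacrebleu
            from sacrebleu import corpus_chrf
        except ImportError as exc:
            raise ImportError(
                "sacrebleu is required for chrF scoring. "
                "Install it with `pip install sacrebleu`."
            ) from exc
        # Keep a handle on the function for later scoring calls
        self.corpus_chrf = corpus_chrf
```

Failing at construction time surfaces the missing package as soon as the metric is created, rather than partway through an evaluation run.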

Internal Components

Input Validation

The metric performs thorough input validation before computing the score:

if reference is None or response is None:
    return 0.0
if not isinstance(reference, str) or not isinstance(response, str):
    return 0.0
if not reference.strip() or not response.strip():
    return 0.0
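To see which inputs short-circuit to 0.0, the three guards can be wrapped in a small helper. This is a sketch under assumptions: `guarded_score` and `exact_match` are hypothetical names, and `exact_match` is a placeholder scorer standing in for the real chrF computation.

```python
def guarded_score(reference, response, scorer):
    """Apply the three guards, then delegate to `scorer` for valid inputs."""
    if reference is None or response is None:
        return 0.0
    if not isinstance(reference, str) or not isinstance(response, str):
        return 0.0
    if not reference.strip() or not response.strip():
        return 0.0
    return scorer(response, reference)


def exact_match(response, reference):
    """Placeholder scorer: 1.0 on an exact match, otherwise 0.0."""
    return 1.0 if response == reference else 0.0


cases = [(None, "x"), ("x", 42), ("   ", "x"), ("x", "x")]
for ref, resp in cases:
    print(repr(ref), repr(resp), guarded_score(ref, resp, exact_match))
```

Only the last case reaches the scorer; the first three are rejected by the None, type, and whitespace checks respectively.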

Score Computation

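The sacrebleu corpus_chrf function expects a list of hypothesis strings and a list of reference lists, so the single response and the single reference are wrapped accordingly before the raw score is divided by 100: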

references = [[reference]]
hypotheses = [response]
score = self.corpus_chrf(hypotheses, references, **self.kwargs).score / 100

Usage Examples

Basic Usage

from ragas.metrics import ChrfScore
from ragas import evaluate
from datasets import Dataset

data = {
    "response": ["The cat sat on the mat."],
    "reference": ["The cat is sitting on the mat."],
}
dataset = Dataset.from_dict(data)

results = evaluate(dataset, metrics=[ChrfScore()])
print(results)

With Custom Parameters

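from ragas.metrics import ChrfScore
from ragas.dataset_schema import SingleTurnSample

# Customize chrF behavior (word_order=2 adds word bigrams, i.e. the chrF++ variant)
chrf = ChrfScore(kwargs={"word_order": 2})

sample = SingleTurnSample(
    reference="The sun is powered by nuclear fusion.",
    response="Nuclear fusion powers the sun.",
)

# Score the sample synchronously via the SingleTurnMetric interface
# (inside async code, use `await chrf.single_turn_ascore(sample)` instead)
score = chrf.single_turn_score(sample)
print(score)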
