Implementation:Vibrantlabsai Ragas ChrfScoreV2
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
Calculates the CHRF (character n-gram F-score) between a reference and a response text. CHRF is a character n-gram based evaluation metric that has been shown to correlate well with human judgments, particularly in machine translation evaluation.
Description
The CHRFScore metric computes the Character F-score (CHRF) between a reference text and a response text. Unlike BLEU, which operates on word-level n-grams, CHRF operates on character-level n-grams. This makes it more robust to morphological variation (for example, "child" and "children" share most of their character n-grams but count as different words for BLEU) and better suited to morphologically rich languages.
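For reference, the standard chrF definition can be written as follows. Here $\mathcal{H}_n$ and $\mathcal{R}_n$ are the multisets of character n-grams of the response and the reference, $\cap$ is multiset intersection (clipped counts), $N$ corresponds to sacrebleu's char_order (6 by default) and $\beta$ to beta (2 by default, so recall weighs more than precision). As we understand sacrebleu's default behaviour, the averages run only over orders for which both texts have n-grams; with eps_smoothing enabled it instead averages per-order F-scores. Treat this as a summary of the published metric, not a transcription of the library code.

$$
\mathrm{chrP} = \frac{1}{N}\sum_{n=1}^{N}\frac{\lvert \mathcal{H}_n \cap \mathcal{R}_n \rvert}{\lvert \mathcal{H}_n \rvert},
\qquad
\mathrm{chrR} = \frac{1}{N}\sum_{n=1}^{N}\frac{\lvert \mathcal{H}_n \cap \mathcal{R}_n \rvert}{\lvert \mathcal{R}_n \rvert},
\qquad
\mathrm{chrF}_\beta = \frac{(1+\beta^2)\,\mathrm{chrP}\cdot\mathrm{chrR}}{\beta^2\,\mathrm{chrP} + \mathrm{chrR}}
$$

sacrebleu reports $100 \cdot \mathrm{chrF}_\beta$; CHRFScore divides by 100, so its value is $\mathrm{chrF}_\beta$ itself.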
The implementation delegates to the sacrebleu library's corpus_chrf function for consistent and reproducible scoring. The raw sacrebleu score (on a 0-100 scale) is normalized to the 0.0 to 1.0 range by dividing by 100.
The metric handles two edge cases without raising an error. In both, it returns 0.0 with an explanatory reason:
- Either input is not a string.
- Either input is empty or contains only whitespace.
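The two checks amount to a short-circuit that runs before any n-gram counting. Below is a minimal sketch, assuming the checks run in the listed order. The function guard_inputs and its reason strings are hypothetical, not the actual ragas code or messages.

```python
from typing import Any, Optional, Tuple


def guard_inputs(reference: Any, response: Any) -> Optional[Tuple[float, str]]:
    """Hypothetical guard: return (0.0, reason) to short-circuit scoring, else None."""
    # Type check first, so .strip() below is only ever called on strings.
    if not isinstance(reference, str) or not isinstance(response, str):
        return 0.0, "reference and response must both be strings"
    # Empty or whitespace-only text has no characters to compare.
    if not reference.strip() or not response.strip():
        return 0.0, "reference and response must be non-empty"
    return None  # inputs are valid; proceed to chrF scoring


print(guard_inputs(None, "Paris"))
print(guard_inputs("   ", "Paris"))
print(guard_inputs("Paris", "Paris"))
```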
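Additional sacrebleu parameters (such as char_order, word_order, beta, and eps_smoothing) can be supplied as a dictionary through the kwargs constructor parameter. Note that kwargs is an ordinary dict argument, not Python's **kwargs; its entries are forwarded to corpus_chrf, giving fine-grained control over scoring.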
This metric does not require an LLM or embedding model, making it fast and deterministic.
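Because the score is pure n-gram counting, it can be reproduced without any model. Here is a minimal, dependency-free sketch of sentence-level chrF. It assumes character n-grams only (no word n-grams), with whitespace removed, and precision and recall averaged over the orders where both texts have n-grams, which is our understanding of sacrebleu's default. The function names are hypothetical, and this is an illustration, not the ragas or sacrebleu code; it may differ from sacrebleu in edge cases.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    s = "".join(text.split())
    return Counter(s[i : i + n] for i in range(len(s) - n + 1))


def chrf(reference: str, response: str, char_order: int = 6, beta: float = 2.0) -> float:
    """Sentence-level chrF on a 0.0-1.0 scale (character n-grams only)."""
    total_prec, total_rec, effective_order = 0.0, 0.0, 0
    for n in range(1, char_order + 1):
        hyp, ref = char_ngrams(response, n), char_ngrams(reference, n)
        n_hyp, n_ref = sum(hyp.values()), sum(ref.values())
        if n_hyp == 0 or n_ref == 0:
            continue  # skip orders longer than either text
        n_match = sum((hyp & ref).values())  # multiset intersection = clipped matches
        total_prec += n_match / n_hyp
        total_rec += n_match / n_ref
        effective_order += 1
    if effective_order == 0:
        return 0.0
    prec, rec = total_prec / effective_order, total_rec / effective_order
    if prec + rec == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * prec * rec / (b2 * prec + rec)


print(chrf("The capital of France is Paris.", "Paris is the capital of France."))
```

Running it twice on the same pair always gives the same number, which is the determinism the paragraph above refers to.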
Usage
Use CHRFScore when you need a character-level evaluation metric for comparing generated text against reference text. It is particularly useful for machine translation evaluation, text summarization, and any scenario where morphological variation matters. It provides a more fine-grained comparison than word-level metrics like BLEU.
This is the V2 collections version, providing automatic validation and a consistent async API via the BaseMetric base class.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/collections/chrf_score/metric.py
Signature
```python
class CHRFScore(BaseMetric):
    def __init__(
        self,
        name: str = "chrf_score",
        kwargs: t.Optional[t.Dict[str, t.Any]] = None,
        **base_kwargs,
    ): ...

    async def ascore(self, reference: str, response: str) -> MetricResult: ...
```
Import
```python
from ragas.metrics.collections import CHRFScore
```
I/O Contract
Constructor Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | No | Metric name (default: "chrf_score") |
| kwargs | Dict[str, Any] or None | No | Additional arguments passed to sacrebleu.corpus_chrf (e.g., char_order, word_order, beta, eps_smoothing) |
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| reference | str | Yes | The reference/ground truth text |
| response | str | Yes | The response/hypothesis text to evaluate |
Outputs
| Name | Type | Description |
|---|---|---|
| result | MetricResult | Its value is the CHRF score, a float between 0.0 and 1.0; higher means greater character n-gram overlap. When input validation fails, the value is 0.0 and a reason string explains why. |
Usage Examples
Basic Usage
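```python
import asyncio
from ragas.metrics.collections import CHRFScore

async def main():
    metric = CHRFScore()
    result = await metric.ascore(
        reference="The capital of France is Paris.",
        response="Paris is the capital of France.",
    )
    print(f"CHRF Score: {result.value}")  # float in [0.0, 1.0]

asyncio.run(main())  # in Jupyter/IPython, use `await main()` instead
```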
With Custom sacrebleu Parameters
```python
import asyncio
from ragas.metrics.collections import CHRFScore

async def main():
    # word_order=2 adds word n-grams (chrF++); char_order=6 and beta=2 match sacrebleu's defaults
    metric = CHRFScore(kwargs={"char_order": 6, "word_order": 2, "beta": 2})
    result = await metric.ascore(
        reference="Albert Einstein was born in 1879.",
        response="Einstein was born in the year 1879.",
    )
    print(f"CHRF Score: {result.value}")

asyncio.run(main())
```
Batch Scoring
```python
import asyncio
from ragas.metrics.collections import CHRFScore

async def main():
    metric = CHRFScore()
    # Each dict holds the reference/response pair for one sample
    results = await metric.abatch_score([
        {"reference": "The cat sat on the mat.", "response": "A cat was sitting on a mat."},
        {"reference": "Hello world.", "response": "Hi world."},
    ])
    for r in results:
        print(f"Score: {r.value}")

asyncio.run(main())
```