Principle:Unslothai Unsloth OCR Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Vision |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
An evaluation methodology that measures vision-language model quality on optical character recognition tasks using Word Error Rate and Character Error Rate metrics.
Description
OCR evaluation assesses how accurately a vision-language model can read and transcribe text from images. This is a critical benchmark for VLM fine-tuning, as it tests both visual perception (recognizing characters) and language generation (producing coherent text).
The evaluation uses two standard metrics:
- Word Error Rate (WER): Measures word-level transcription accuracy, computed as the edit distance between the predicted and reference word sequences divided by the number of words in the reference.
- Character Error Rate (CER): Measures character-level accuracy using the same calculation over characters. It is more granular than WER and more robust to tokenization differences.
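As a small worked example (the strings are made up for illustration), take the reference "total due 100" and the prediction "total due 1O0", where the model reads a zero as the letter O. One of the three reference words is wrong, and one of the thirteen reference characters (spaces included) is wrong:

$$\mathrm{WER} = \frac{1}{3} \approx 0.333, \qquad \mathrm{CER} = \frac{1}{13} \approx 0.077$$

A single misread character costs a whole word under WER but only one character under CER, which is why the two metrics are usually reported together.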
Usage
Use this principle to evaluate vision-language models after fine-tuning on OCR or document understanding tasks. It is also useful for validating that model merging and quantization preserve visual understanding quality.
Theoretical Basis
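WER and CER are both based on the Levenshtein (edit) distance between the reference and the predicted sequence:

$$\mathrm{WER} = \frac{S + D + I}{N}$$

where S = substitutions, D = deletions, I = insertions, and N = reference word count. CER uses the same formula with characters in place of words, so N becomes the reference character count.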
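The edit-distance ratio described here can be sketched in plain Python. This is a minimal illustration, not the implementation used by any particular library: the helper name `edit_distance` is hypothetical, words are assumed to be split on whitespace, and no text normalization (case folding, punctuation stripping) is applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference, prediction):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, prediction.split()) / len(ref_words)


def cer(reference, prediction):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, prediction) / len(reference)
```

With this sketch, `wer("the quick brown fox", "the quikc brown fox")` counts one wrong word out of four, while `cer` on the same pair counts two character edits out of nineteen. Because insertions are counted but the denominator is the reference length, both rates can exceed 1.0 when the prediction is much longer than the reference.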
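    # Abstract OCR evaluation (pseudocode: model, dataset, wer and cer are placeholders)
    wer_scores, cer_scores = [], []
    for sample in dataset:
        image = sample["image"]
        ground_truth = sample["text"]
        prediction = model.generate(image, prompt="Read the text in this image.")
        # Reference first, prediction second
        wer_scores.append(wer(ground_truth, prediction))
        cer_scores.append(cer(ground_truth, prediction))
    # One simple aggregate: the mean of the per-sample scores
    mean_wer = sum(wer_scores) / len(wer_scores)
    mean_cer = sum(cer_scores) / len(cer_scores)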
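Per-sample scores from the loop above still need to be combined into a single number for the whole dataset. The following is a minimal sketch of one option, pooled (corpus-level) aggregation, where edits and reference lengths are summed over all samples before dividing. The function name `corpus_error_rates` and the `(reference, prediction)` pair format are assumptions made for this example, and whitespace word splitting is assumed.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def corpus_error_rates(pairs):
    """Return (WER, CER) pooled over an iterable of (reference, prediction) strings."""
    word_edits = word_total = char_edits = char_total = 0
    for reference, prediction in pairs:
        ref_words = reference.split()
        word_edits += edit_distance(ref_words, prediction.split())
        word_total += len(ref_words)
        char_edits += edit_distance(reference, prediction)
        char_total += len(reference)
    if word_total == 0 or char_total == 0:
        raise ValueError("references must contain some text")
    return word_edits / word_total, char_edits / char_total


# Hypothetical predictions, for illustration only.
pairs = [("total due 100", "total due 1O0"), ("paid", "paid")]
pooled_wer, pooled_cer = corpus_error_rates(pairs)
```

Pooling weights each sample by its reference length, so long documents count for more than short ones. In the example, the pooled WER is 1/4, whereas averaging the two per-sample WER values (1/3 and 0) gives 1/6. Whichever convention is used should be stated alongside the reported numbers.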