Implementation:Unslothai Unsloth Evaluate OCR Model
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Vision |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool, provided by the Unsloth test utilities, for evaluating vision-language models on OCR tasks using WER and CER metrics.
Description
evaluate_ocr_model is a convenience wrapper around the OCRModelEvaluator class. It processes a dataset of images with ground-truth text, generates predictions using the VLM, and computes Word Error Rate (WER) and Character Error Rate (CER) using the jiwer library. Results include per-sample scores and aggregate metrics.
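To make the two metrics concrete, here is a minimal pure-Python sketch of how WER and CER are commonly defined: the Levenshtein edit distance between prediction and ground truth, divided by the reference length in words or characters. It is an illustration only, not the jiwer implementation or the code in OCRModelEvaluator. The helper names `edit_distance`, `word_error_rate`, and `char_error_rate` are hypothetical, and jiwer may apply text normalization that this sketch omits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "the quick brown fox"
hypothesis = "the quick brwn fox"  # one misspelled word, one missing character
print(word_error_rate(reference, hypothesis))  # 0.25 (1 wrong word out of 4)
print(char_error_rate(reference, hypothesis))  # 1 missing character out of 19
```

Because both rates are normalized by the reference length, a prediction with many inserted tokens can score above 1.0.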
Usage
Call after VLM fine-tuning or model merging to validate OCR quality. Requires the jiwer and qwen_vl_utils packages.
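A quick pre-flight check can confirm that both dependencies are importable before a long evaluation run starts. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical and not part of the Unsloth utilities, and it assumes the import names match the package names given above.

```python
import importlib.util

# Import names taken from the requirement stated above.
REQUIRED_PACKAGES = ("jiwer", "qwen_vl_utils")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Install before evaluating: " + ", ".join(missing))
else:
    print("All OCR evaluation dependencies are available.")
```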
Code Reference
Source Location
- Repository: unsloth
- File: tests/utils/ocr_eval.py
- Lines: L362-369 (evaluate_ocr_model convenience wrapper), L1-360 (OCRModelEvaluator class)
Signature
def evaluate_ocr_model(
    model,
    processor,
    dataset,
    output_dir = "ocr_evaluation_results",
    **kwargs,
) -> Tuple[Optional[float], Optional[float]]:
    """
    Convenience function for OCR model evaluation.
    Args:
        model: Vision-language model instance.
        processor: AutoProcessor for image/text preprocessing.
        dataset (List[Dict]): List of dicts with image and ground-truth text.
        output_dir (str): Directory for evaluation result files.
        **kwargs: Additional args passed to OCRModelEvaluator.evaluate_model.
    Returns:
        Tuple of (WER, CER) scores. None if evaluation fails.
    """
Import
from tests.utils.ocr_eval import evaluate_ocr_model
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | PreTrainedModel | Yes | Vision-language model instance |
| processor | AutoProcessor | Yes | VLM processor for image/text preprocessing |
| dataset | List[Dict] | Yes | OCR evaluation dataset with images and ground-truth text |
| output_dir | str | No | Directory for results (default: "ocr_evaluation_results") |
| max_new_tokens | int | No | Max generation tokens, passed via **kwargs (default: 1024) |
| temperature | float | No | Sampling temperature, passed via **kwargs (default: 1.5) |
Outputs
| Name | Type | Description |
|---|---|---|
| wer | Optional[float] | Word Error Rate (lower is better); None if evaluation fails |
| cer | Optional[float] | Character Error Rate (lower is better); None if evaluation fails |
Usage Examples
Evaluate Merged VLM
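from tests.utils.ocr_eval import evaluate_ocr_model

# After VLM fine-tuning and merging
wer, cer = evaluate_ocr_model(
    model=model,
    processor=processor,
    dataset=ocr_test_data,
    output_dir="./ocr_results",
)

# Either score is None when evaluation fails, so guard before formatting.
if wer is not None and cer is not None:
    print(f"WER: {wer:.4f}, CER: {cer:.4f}")
else:
    print("OCR evaluation failed; no scores returned.")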