Implementation:Unslothai Unsloth Evaluate OCR Model

Knowledge Sources	Unsloth jiwer
Domains	Evaluation, Vision
Last Updated	2026-02-07 00:00 GMT

Overview

Concrete tool, provided by the Unsloth test utilities, for evaluating vision-language models (VLMs) on OCR tasks using Word Error Rate (WER) and Character Error Rate (CER).

Description

evaluate_ocr_model is a convenience wrapper around the OCRModelEvaluator class. It processes a dataset of images with ground-truth text, generates predictions using the VLM, and computes Word Error Rate (WER) and Character Error Rate (CER) using the jiwer library. Results include per-sample scores and aggregate metrics.
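To make the two metrics concrete, here is a minimal pure-Python sketch of WER and CER as edit distance divided by reference length. It is not the jiwer implementation and not the code in ocr_eval.py; the function names (edit_distance, word_error_rate, char_error_rate) are hypothetical and exist only for illustration. For real evaluation, use the jiwer library as the evaluator does.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete r from the reference
                    curr[j - 1] + 1,     # insert h into the reference
                    prev[j - 1] + cost,  # substitute r with h, or match
                )
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    truth = "the quick brown fox"
    prediction = "the quick brown dog"
    # One of four words differs, so WER is 1/4.
    print(f"WER: {word_error_rate(truth, prediction):.4f}")
    print(f"CER: {char_error_rate(truth, prediction):.4f}")
```

Both metrics can exceed 1.0 when the prediction is much longer than the reference, which is why lower is better but 1.0 is not a hard ceiling.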

Usage

Call it after VLM fine-tuning or model merging to validate OCR quality. It requires the jiwer and qwen_vl_utils packages to be installed.
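Because the evaluator depends on two optional packages, it can help to check for them before loading a model. The sketch below uses only the standard library; the helper name missing_packages is hypothetical and is not part of the Unsloth test utilities.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names as stated in the usage note above.
    missing = missing_packages(["jiwer", "qwen_vl_utils"])
    if missing:
        print("Install before evaluating:", ", ".join(missing))
    else:
        print("All OCR evaluation dependencies are available.")
```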

Code Reference

Source Location

Repository: unsloth
File: tests/utils/ocr_eval.py
Lines: L362-369 (evaluate_ocr_model convenience wrapper), L1-360 (OCRModelEvaluator class)

Signature

def evaluate_ocr_model(
    model,
    processor,
    dataset,
    output_dir = "ocr_evaluation_results",
    **kwargs,
) -> Tuple[Optional[float], Optional[float]]:
    """
    Convenience function for OCR model evaluation.

    Args:
        model: Vision-language model instance.
        processor: AutoProcessor for image/text preprocessing.
        dataset (List[Dict]): List of dicts with image and ground-truth text.
        output_dir (str): Directory for evaluation result files.
        **kwargs: Additional args passed to OCRModelEvaluator.evaluate_model.

    Returns:
        Tuple of (WER, CER) scores. None if evaluation fails.
    """

Import

from tests.utils.ocr_eval import evaluate_ocr_model

I/O Contract

Inputs

Name	Type	Required	Description
model	PreTrainedModel	Yes	Vision-language model instance
processor	AutoProcessor	Yes	VLM processor for image/text preprocessing
dataset	List[Dict]	Yes	OCR evaluation dataset with images and ground-truth text
output_dir	str	No	Directory for results (default: "ocr_evaluation_results")
max_new_tokens	int	No	Max generation tokens (default: 1024)
temperature	float	No	Sampling temperature (default: 1.5)

Outputs

Name	Type	Description
wer	Optional[float]	Word Error Rate (lower is better); None if evaluation fails
cer	Optional[float]	Character Error Rate (lower is better); None if evaluation fails

Usage Examples

Evaluate Merged VLM

from tests.utils.ocr_eval import evaluate_ocr_model

# After VLM fine-tuning and merging
wer, cer = evaluate_ocr_model(
    model=model,
    processor=processor,
    dataset=ocr_test_data,
    output_dir="./ocr_results",
)

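# Both scores are None if evaluation fails, so check before formatting
if wer is not None and cer is not None:
    print(f"WER: {wer:.4f}, CER: {cer:.4f}")
else:
    print("OCR evaluation failed; no scores returned.")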

Related Pages

Implements Principle

Principle:Unslothai_Unsloth_OCR_Evaluation

Requires Environment

Environment:Unslothai_Unsloth_CUDA_BitsAndBytes
