
Implementation:Unslothai Unsloth Evaluate OCR Model

From Leeroopedia
Revision as of 17:02, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Unslothai_Unsloth_Evaluate_OCR_Model.md)


Knowledge Sources
Domains: Evaluation, Vision
Last Updated: 2026-02-07 00:00 GMT

Overview

Concrete tool, provided by the Unsloth test utilities, for evaluating vision-language models (VLMs) on OCR tasks using Word Error Rate (WER) and Character Error Rate (CER).

Description

evaluate_ocr_model is a convenience wrapper around the OCRModelEvaluator class. It processes a dataset of images with ground-truth text, generates predictions using the VLM, and computes Word Error Rate (WER) and Character Error Rate (CER) using the jiwer library. Results include per-sample scores and aggregate metrics.
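To make the two metrics concrete, the following is a minimal pure-Python sketch of WER and CER as edit distance divided by reference length. It is an illustration only: it is not the jiwer implementation and not the code in ocr_eval.py, and the function names (edit_distance, word_error_rate, character_error_rate) are hypothetical. It also assumes plain whitespace tokenization with no text normalization, which may differ from what the evaluator applies.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a reference item
                    curr[j - 1] + 1,     # insert a hypothesis item
                    prev[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Both rates are 0.0 for an exact match and can exceed 1.0 when the prediction contains many insertions, which is why lower values indicate better OCR quality.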

Usage

Call it after VLM fine-tuning or model merging to validate OCR quality. It requires the jiwer and qwen_vl_utils packages.
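Because the evaluator depends on two third-party packages, it can help to check that they are importable before loading a model. The sketch below uses only the standard library; missing_packages is a hypothetical helper name, not part of Unsloth.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names taken from the requirements stated above.
    missing = missing_packages(["jiwer", "qwen_vl_utils"])
    if missing:
        print("Missing packages: " + ", ".join(missing))
```

The check looks up import names only, so it does not verify package versions.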

Code Reference

Source Location

  • Repository: unsloth
  • File: tests/utils/ocr_eval.py
  • Lines: L362-369 (evaluate_ocr_model convenience wrapper), L1-360 (OCRModelEvaluator class)

Signature

def evaluate_ocr_model(
    model,
    processor,
    dataset,
    output_dir = "ocr_evaluation_results",
    **kwargs,
) -> Tuple[Optional[float], Optional[float]]:
    """
    Convenience function for OCR model evaluation.

    Args:
        model: Vision-language model instance.
        processor: AutoProcessor for image/text preprocessing.
        dataset (List[Dict]): List of dicts with image and ground-truth text.
        output_dir (str): Directory for evaluation result files.
        **kwargs: Additional args passed to OCRModelEvaluator.evaluate_model.

    Returns:
        Tuple of (WER, CER) scores. None if evaluation fails.
    """

Import

from tests.utils.ocr_eval import evaluate_ocr_model

I/O Contract

Inputs

Name | Type | Required | Description
model | PreTrainedModel | Yes | Vision-language model instance
processor | AutoProcessor | Yes | VLM processor for image/text preprocessing
dataset | List[Dict] | Yes | OCR evaluation dataset with images and ground-truth text
output_dir | str | No | Directory for results (default: "ocr_evaluation_results")
max_new_tokens | int | No | Max generation tokens, passed via **kwargs (default: 1024)
temperature | float | No | Sampling temperature, passed via **kwargs (default: 1.5)

Outputs

Name | Type | Description
wer | Optional[float] | Word Error Rate (lower is better); None if evaluation fails
cer | Optional[float] | Character Error Rate (lower is better); None if evaluation fails
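Since the docstring states that the scores are None when evaluation fails, callers that use the result as a pass/fail check need to handle that case explicitly. The following is a minimal sketch of such a gate; passes_ocr_gate and its threshold defaults are hypothetical values chosen for illustration, not recommendations from Unsloth.

```python
from typing import Optional


def passes_ocr_gate(
    wer: Optional[float],
    cer: Optional[float],
    max_wer: float = 0.10,  # hypothetical threshold, tune per dataset
    max_cer: float = 0.05,  # hypothetical threshold, tune per dataset
) -> bool:
    """Return True only if both scores exist and are within their thresholds."""
    if wer is None or cer is None:
        # A failed evaluation is treated as a failed gate, not a pass.
        return False
    return wer <= max_wer and cer <= max_cer
```

Treating a missing score as a failure keeps a crashed evaluation from being mistaken for a model that passed.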

Usage Examples

Evaluate Merged VLM

from tests.utils.ocr_eval import evaluate_ocr_model

# After VLM fine-tuning and merging
wer, cer = evaluate_ocr_model(
    model=model,
    processor=processor,
    dataset=ocr_test_data,
    output_dir="./ocr_results",
)

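# Both scores are None if evaluation fails, so check before formatting
if wer is not None and cer is not None:
    print(f"WER: {wer:.4f}, CER: {cer:.4f}")
else:
    print("OCR evaluation failed; no scores were returned.")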

Related Pages

Implements Principle

Requires Environment
