Implementation:Open compass VLMEvalKit CCOCR Doc Parsing Evaluator

From Leeroopedia
Field | Value
source | VLMEvalKit
domain | Vision, Evaluation, OCR, Document Parsing, Table Recognition

Overview

Evaluates document parsing quality in the CCOCR benchmark. Table structure recognition is scored with a tree edit distance computed by APTED, and document content is cleaned of LaTeX markup as a preprocessing step.

Description

This module implements `TableTree` and `CustomConfig` classes (adapted from IBM's work) for computing tree edit distance between predicted and ground-truth HTML table structures using the APTED algorithm. It includes LaTeX document preprocessing via regex pattern removal, NLTK-based tokenization, and TEDS (Tree Edit Distance based Similarity) scoring. The evaluator handles both table structure evaluation and general document parsing metrics.
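The LaTeX preprocessing step can be pictured with a minimal sketch like the one below. It is illustrative only: the pattern list, `strip_latex`, and `tokenize` are hypothetical names and values, not the module's own, and a plain whitespace split stands in for the NLTK tokenizer so the example runs with the standard library alone.

```python
import re

# Hypothetical patterns for illustration; the module's real pattern list
# is not reproduced here.
LATEX_PATTERNS = [
    r"\\documentclass(\[[^\]]*\])?\{[^}]*\}",  # \documentclass[opts]{cls}
    r"\\usepackage(\[[^\]]*\])?\{[^}]*\}",     # \usepackage[opts]{pkg}
    r"\\(begin|end)\{document\}",              # document environment markers
]


def strip_latex(text: str) -> str:
    """Remove each pattern in turn, then collapse runs of whitespace."""
    for pattern in LATEX_PATTERNS:
        text = re.sub(pattern, "", text)
    return " ".join(text.split())


def tokenize(text: str) -> list:
    """Whitespace split, used here as a stand-in for an NLTK tokenizer."""
    return text.split()


sample = r"""\documentclass[11pt]{article}
\usepackage{amsmath}
\begin{document}
Total revenue: 42
\end{document}"""

cleaned = strip_latex(sample)
print(cleaned)            # Total revenue: 42
print(tokenize(cleaned))  # ['Total', 'revenue:', '42']
```

Removing markup before tokenization keeps text-level metrics from penalizing a prediction for boilerplate commands that carry no document content.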

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/ccocr_evaluator/doc_parsing_evaluator.py, Lines: L1-256
  • Import: from vlmeval.dataset.utils.ccocr_evaluator.doc_parsing_evaluator import TableTree, CustomConfig

Key Functions:

class TableTree(Tree): ...
class CustomConfig(Config):
    def rename(self, node1, node2): ...
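
To illustrate what a `rename` cost of this kind typically does, here is a standalone sketch modeled on the rule used in the original TEDS reference code: a mismatch in tag or span costs 1, and matching `td` cells are charged a normalized edit distance over their content. `Cell`, `levenshtein`, and `rename_cost` are hypothetical names for this example, and whether `CustomConfig.rename` follows the same rule exactly should be checked against the source file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    """Hypothetical node holding the fields a rename cost would compare."""
    tag: str
    colspan: Optional[int] = None
    rowspan: Optional[int] = None
    content: Optional[List[str]] = None


def levenshtein(a, b) -> int:
    """Dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x with y
        prev = curr
    return prev[-1]


def rename_cost(n1: Cell, n2: Cell) -> float:
    """Cost in [0, 1] of relabeling n1 as n2 (assumed rule, see lead-in)."""
    if (n1.tag, n1.colspan, n1.rowspan) != (n2.tag, n2.colspan, n2.rowspan):
        return 1.0
    if n1.tag == "td" and (n1.content or n2.content):
        a, b = n1.content or [], n2.content or []
        return levenshtein(a, b) / max(len(a), len(b))
    return 0.0


same = rename_cost(Cell("td", content=list("2023")), Cell("td", content=list("2023")))
typo = rename_cost(Cell("td", content=list("2023")), Cell("td", content=list("2028")))
span = rename_cost(Cell("td", colspan=2), Cell("td", colspan=1))
print(same, typo, span)  # 0.0 0.25 1.0
```

Charging a fractional cost for cell text lets a single wrong character lower the score slightly, while a structural mismatch such as a different `colspan` counts as a full edit.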

I/O Contract

Direction | Description
Inputs | Predicted and ground-truth HTML table strings or LaTeX document content
Outputs | TEDS similarity score (0-1) for table structures; text-level metrics for document content
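
The 0-1 range of the TEDS output follows from how the score is usually normalized: one minus the tree edit distance divided by the node count of the larger tree. The sketch below shows only that normalization. `Node`, `count_nodes`, and `teds_score` are illustrative names that are not part of the module, and the edit distance is supplied by hand rather than computed with APTED.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a table tree node (not the module's TableTree)."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def count_nodes(node: Node) -> int:
    """Count the node itself plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children)


def teds_score(edit_distance: float, n_pred: int, n_true: int) -> float:
    """TEDS = 1 - distance / max(|T_pred|, |T_true|)."""
    largest = max(n_pred, n_true)
    if largest == 0:
        return 1.0  # assumption: two empty trees are treated as identical
    return 1.0 - edit_distance / largest


# Ground truth: <table><tr><td/><td/></tr></table>, 4 nodes.
truth = Node("table", [Node("tr", [Node("td"), Node("td")])])
# The prediction is missing one cell (3 nodes), so one insertion is needed.
pred = Node("table", [Node("tr", [Node("td")])])

score = teds_score(1.0, count_nodes(pred), count_nodes(truth))
print(score)  # 0.75
```

Dividing by the larger tree keeps the score comparable across tables of different sizes: identical trees score 1, and the score falls toward 0 as more edits are needed.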

Usage Examples

from vlmeval.dataset.utils.ccocr_evaluator.doc_parsing_evaluator import TableTree

# Create a root node representing an HTML <table> element
tree = TableTree(tag="table")
