Implementation:Open compass VLMEvalKit CCOCR Doc Parsing Evaluator
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, OCR, Document Parsing, Table Recognition |
Overview
Evaluates document parsing quality in the CCOCR benchmark: table structure recognition is scored with tree edit distance (APTED), and document content is scored after LaTeX cleanup.
Description
This module implements the `TableTree` and `CustomConfig` classes (adapted from IBM's work), which compute the tree edit distance between predicted and ground-truth HTML table structures using the APTED algorithm. It also provides LaTeX document preprocessing via regex pattern removal, NLTK-based tokenization, and TEDS (Tree-Edit-Distance-based Similarity) scoring. The evaluator covers both table structure evaluation and general document parsing metrics.
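TEDS is commonly defined as one minus the tree edit distance divided by the node count of the larger tree. The sketch below shows only that normalization step and assumes the edit distance has already been computed (for example by APTED). The function name `teds_score` and the handling of two empty trees are assumptions made for illustration, not names or behavior taken from the module.

```python
def teds_score(edit_distance: float, n_nodes_pred: int, n_nodes_true: int) -> float:
    """TEDS = 1 - edit_distance / max(|T_pred|, |T_true|)."""
    n_nodes = max(n_nodes_pred, n_nodes_true)
    if n_nodes == 0:
        # Assumption: two empty trees count as a perfect match.
        return 1.0
    return 1.0 - edit_distance / n_nodes


if __name__ == "__main__":
    # Identical trees: zero edits.
    print(teds_score(0, 12, 12))
    # Five edits against a ten-node reference tree.
    print(teds_score(5, 5, 10))
```

Dividing by the larger tree keeps the score in the 0-1 range as long as each edit operation costs at most 1.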
Usage
Called internally by the corresponding dataset class during evaluation.
Code Reference
- Source: `vlmeval/dataset/utils/ccocr_evaluator/doc_parsing_evaluator.py`, Lines: L1-256
- Import: `from vlmeval.dataset.utils.ccocr_evaluator.doc_parsing_evaluator import TableTree, CustomConfig`
Key Functions:
class TableTree(Tree): ...
class CustomConfig(Config):
    def rename(self, node1, node2): ...
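In TEDS-style metrics, the `rename` hook usually decides how expensive it is to relabel one tree node as another: a full cost of 1 when tag or span attributes differ, and a normalized string edit distance when two cells differ only in content. The following is a standard-library sketch of that idea, not the module's code. The `Cell` dataclass, `rename_cost`, and `levenshtein` are hypothetical names, and cell content is modeled as a plain string for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


@dataclass
class Cell:
    """Hypothetical stand-in for a table tree node."""
    tag: str
    colspan: Optional[int] = None
    rowspan: Optional[int] = None
    content: str = ""


def rename_cost(n1: Cell, n2: Cell) -> float:
    """Cost in [0, 1] of relabeling node n1 as node n2."""
    # Structural mismatch: different tag or span attributes.
    if (n1.tag, n1.colspan, n1.rowspan) != (n2.tag, n2.colspan, n2.rowspan):
        return 1.0
    # Same structure: compare cell text, normalized by the longer string.
    if n1.tag == "td" and (n1.content or n2.content):
        longest = max(len(n1.content), len(n2.content))
        return levenshtein(n1.content, n2.content) / longest
    return 0.0
```

Keeping every rename cost at or below 1 means a rename is never more expensive than a delete followed by an insert, which is what allows the final score to be normalized by node count.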
I/O Contract
| Direction | Description |
|---|---|
| Inputs | Predicted and ground-truth HTML table strings or LaTeX document content |
| Outputs | TEDS similarity score (0-1) for table structures; text-level metrics for document content |
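For LaTeX inputs, regex pattern removal can be pictured as stripping markup that carries no document content before text-level metrics are computed. The sketch below is illustrative only: the pattern list and the function name `strip_latex_boilerplate` are hypothetical, and the module's actual patterns may differ.

```python
import re

# Hypothetical patterns; the module's actual pattern list may differ.
PATTERNS = [
    r"\\documentclass(\[[^\]]*\])?\{[^}]*\}",
    r"\\usepackage(\[[^\]]*\])?\{[^}]*\}",
    r"\\(begin|end)\{document\}",
]


def strip_latex_boilerplate(text: str) -> str:
    """Remove preamble-style commands, then normalize whitespace."""
    for pattern in PATTERNS:
        text = re.sub(pattern, "", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())
```

Applying the same cleanup to both prediction and ground truth keeps the comparison focused on content rather than on preamble differences.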
Usage Examples
from vlmeval.dataset.utils.ccocr_evaluator.doc_parsing_evaluator import TableTree
tree = TableTree(tag="table")