Implementation:EvolvingLMMs Lab Lmms eval RefCOCO Plus Utils Rec

Task utility functions for the RefCOCO+ Recognition variant, which evaluates bounding box coordinate prediction from natural language descriptions.

Location

/tmp/kapso_repo_sslb_59s/lmms_eval/tasks/refcoco+/utils_rec.py

Overview

Provides dataset preprocessing (bbox normalization, answer explosion), visual processing, result parsing, and geometric metric computation (IoU, accuracy thresholds, center accuracy) for RefCOCO+ recognition tasks.

Metrics

COCO_REC_METRICS - List of recognition metrics:

IoU - Intersection over Union
ACC@0.1, ACC@0.3, ACC@0.5, ACC@0.7, ACC@0.9 - Accuracy at IoU thresholds
Center_ACC - Center point containment accuracy

Dataset Preprocessing

refcoco_bbox_rec_preprocess_dataset(dataset)

Prepares the dataset by normalizing bounding boxes and exploding multi-answer rows

Parameters: dataset - HuggingFace Dataset object

Process:

Add Image Dimensions:
- Maps dataset to add image_width and image_height from PIL images
Normalize Bounding Boxes:
- Original format: (top-left x, top-left y, width, height) in absolute pixels
- Converts to: (x1_norm, y1_norm, x2_norm, y2_norm)
- Normalizes by dividing by image dimensions (values 0-1)
Explode Answers:
- Each row has answer as list of strings
- Creates separate row for each answer
- Duplicates other columns
- Converts to new Dataset object

Returns: Preprocessed Dataset with one row per answer

Side Effect: Prints row count change
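
The normalization and explosion steps above can be sketched on plain Python dictionaries. This is only an illustration under stated assumptions: the real function operates on a HuggingFace Dataset, while the helper name normalize_and_explode and the sample row below are hypothetical.

```python
def normalize_and_explode(rows):
    """Sketch of the preprocessing steps on a list of dict rows.

    Assumes each row holds bbox as (x, y, width, height) in pixels,
    image_width, image_height, and answer as a list of strings.
    """
    out = []
    for row in rows:
        x, y, w, h = row["bbox"]
        iw, ih = row["image_width"], row["image_height"]
        # Convert (x, y, w, h) to corner format and scale into [0, 1].
        norm_bbox = [x / iw, y / ih, (x + w) / iw, (y + h) / ih]
        # One output row per answer; every other column is duplicated.
        for answer in row["answer"]:
            new_row = dict(row)
            new_row["bbox"] = list(norm_bbox)
            new_row["answer"] = answer
            out.append(new_row)
    return out


rows = [{
    "bbox": (100, 50, 200, 100),
    "image_width": 400,
    "image_height": 200,
    "answer": ["man in red", "left person"],
}]
processed = normalize_and_explode(rows)
# Mirrors the documented side effect of reporting the row count change.
print(f"{len(rows)} row(s) -> {len(processed)} row(s)")
```

Each exploded row carries a single answer string, which is what refcoco_bbox_rec_doc_to_text later asserts.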

Document Processing

refcoco_bbox_rec_doc_to_visual(doc)

Returns the image without visual modifications

Parameters: doc - Document with image key

Returns: List with a single RGB-converted image

refcoco_bbox_rec_doc_to_text(doc)

Constructs bbox prediction prompt with description

Parameters: doc - Document with answer (string, post-explosion)

Process:

Asserts answer is string
Constructs prompt with format explanation and description

Returns: Prompt string

Prompt Format:

"Bounding box coordinates are specified in the format (top-left x, top-left y, bottom-right x, bottom-right y).
All values are floating point numbers bounded between 0 and 1.
Please provide the bounding box coordinate of the region this sentence describes: {answer}"
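
A minimal sketch of how such a prompt could be assembled. The names PROMPT_TEMPLATE and build_prompt are hypothetical, and the exact whitespace between the three sentences is an assumption, since the page shows only the wording.

```python
# Wording taken from the prompt format above; sentence separators are assumed.
PROMPT_TEMPLATE = (
    "Bounding box coordinates are specified in the format "
    "(top-left x, top-left y, bottom-right x, bottom-right y). "
    "All values are floating point numbers bounded between 0 and 1. "
    "Please provide the bounding box coordinate of the region "
    "this sentence describes: {answer}"
)


def build_prompt(doc):
    # After preprocessing, answer is a single string rather than a list.
    assert isinstance(doc["answer"], str)
    return PROMPT_TEMPLATE.format(answer=doc["answer"])


prompt = build_prompt({"answer": "the woman holding an umbrella"})
print(prompt)
```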

Result Parsing

parse_float_sequence_within(input_str)

Extracts the first sequence of four floats within square brackets

Parameters: input_str - Model response string

Regex Pattern:

\[\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\s*\]

Process:

Searches for pattern in input string
If found: extracts four floats as list
If not found: returns [0, 0, 0, 0]

Returns: List of four floats
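
The parsing behaviour can be illustrated with the pattern shown above. The function name parse_four_floats is hypothetical; the sketch follows the documented process rather than reproducing the module's code.

```python
import re

# Pattern copied from this section: four comma-separated numbers inside [ ].
PATTERN = re.compile(
    r"\[\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),"
    r"\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\s*\]"
)


def parse_four_floats(text):
    match = PATTERN.search(text)
    if match is None:
        # Documented fallback when no bracketed sequence is found.
        return [0, 0, 0, 0]
    return [float(match.group(i)) for i in range(1, 5)]


good = parse_four_floats("The box is [0.1, 0.2, 0.5, 0.9].")
# Parentheses instead of square brackets do not match the pattern.
parens = parse_four_floats("(0.1, 0.2, 0.5, 0.9)")
# A number without a leading digit (".5") does not match \d+ either.
no_leading_digit = parse_four_floats("[.5, 0.2, 0.5, 0.9]")
```

Because the pattern requires square brackets, comma separators, and a leading digit in each number, responses that deviate from that shape fall back to the all-zero box.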

Result Processing

refcoco_bbox_rec_process_result(doc, result)

Packages parsed prediction with ground truth bbox

Parameters:

doc - Document with answer, question_id, bbox (normalized)
result - Model prediction list

Process:

Extracts prediction string
Parses to float sequence
Gets annotation ID
Creates data dict

Returns: Dictionary with one entry per metric in COCO_REC_METRICS, each containing:

answer: Description string
pred: Predicted bbox [x1, y1, x2, y2]
ann_id: Annotation ID
bbox: Ground truth normalized bbox
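
A sketch of the packaging step under several assumptions that the page does not confirm: the response string is taken to be the first element of result, ann_id is taken from question_id, and the output dictionary is keyed by the bare metric name. The helper names parse_box and package_result are hypothetical.

```python
import re

COCO_REC_METRICS = ["IoU", "ACC@0.1", "ACC@0.3", "ACC@0.5", "ACC@0.7", "ACC@0.9", "Center_ACC"]

BOX_RE = re.compile(
    r"\[\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),"
    r"\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\s*\]"
)


def parse_box(text):
    # Stand-in for parse_float_sequence_within.
    match = BOX_RE.search(text)
    return [float(g) for g in match.groups()] if match else [0, 0, 0, 0]


def package_result(doc, result):
    pred = parse_box(result[0])  # assumption: first element is the response string
    data = {
        "answer": doc["answer"],
        "pred": pred,
        "ann_id": doc["question_id"],  # assumption: annotation ID comes from question_id
        "bbox": doc["bbox"],
    }
    # Every metric receives the same payload; scoring happens at aggregation time.
    return {metric: data for metric in COCO_REC_METRICS}


doc = {"answer": "left person", "question_id": 17, "bbox": [0.25, 0.25, 0.75, 0.75]}
packaged = package_result(doc, ["I think it is [0.2, 0.3, 0.7, 0.8]"])
```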

Geometric Metrics

IoU Computation

compute_iou(box1, box2)

Computes Intersection over Union of two bounding boxes

Parameters:

box1: List [x_min, y_min, x_max, y_max]
box2: List [x_min, y_min, x_max, y_max]

Process:

Determines intersection rectangle coordinates
Computes intersection area (0 if no overlap)
Computes areas of both boxes
Computes union area: area1 + area2 - intersection
Divides intersection area by union area

Returns: IoU score (float, 0-1)
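
The steps above can be sketched as follows. The function name iou is hypothetical, and the zero-union guard is an addition in this sketch that the page does not describe.

```python
def iou(box1, box2):
    # Corners of the intersection rectangle.
    x_left = max(box1[0], box2[0])
    y_top = max(box1[1], box2[1])
    x_right = min(box1[2], box2[2])
    y_bottom = min(box1[3], box2[3])
    # Clamp to zero so non-overlapping boxes yield an empty intersection.
    inter = max(0.0, x_right - x_left) * max(0.0, y_bottom - y_top)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    # Guard for degenerate zero-area inputs (assumption, not from the source).
    return inter / union if union > 0 else 0.0


a = [0.0, 0.0, 0.5, 0.5]
b = [0.25, 0.25, 0.75, 0.75]
overlap = iou(a, b)          # 0.0625 / (0.25 + 0.25 - 0.0625) = 1/7
disjoint = iou(a, [0.5, 0.5, 1.0, 1.0])
identical = iou(a, a)
```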

Accuracy at Threshold

compute_accuracy(box1, box2, threshold=0.5)

Binary accuracy based on IoU threshold

Parameters:

box1, box2: Bounding boxes
threshold: IoU threshold (default 0.5)

Process: Computes IoU and checks if ≥ threshold

Returns: Boolean (True if IoU ≥ threshold)
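
As an illustration, the same box pair can pass a loose threshold and fail a strict one. The helper names below are hypothetical, and the compact IoU helper is a stand-in for compute_iou.

```python
def iou(a, b):
    # Compact stand-in for compute_iou on [x_min, y_min, x_max, y_max] boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def accuracy_at(box1, box2, threshold=0.5):
    # The comparison is inclusive: IoU equal to the threshold counts as a hit.
    return iou(box1, box2) >= threshold


gt = [0.0, 0.0, 0.5, 0.5]
pred = [0.25, 0.25, 0.75, 0.75]  # IoU with gt is 1/7, roughly 0.143
hits = {t: accuracy_at(gt, pred, t) for t in (0.1, 0.3, 0.5, 0.7, 0.9)}
```

Here only the 0.1 threshold is met, which shows why reporting several thresholds gives a fuller picture than ACC@0.5 alone.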

Center Accuracy

compute_center_accuracy(box1, box2)

Checks if the center of box2 is within box1

Parameters: box1, box2 - Bounding boxes

Process:

Computes center of box2:
- center_x = (box2[0] + box2[2]) / 2
- center_y = (box2[1] + box2[3]) / 2
Checks if center is within box1 bounds

Returns: Boolean (True if center within box1)
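
A minimal sketch of the check. The name center_within is hypothetical, and treating the bounds as inclusive is an assumption, since the page only says "within".

```python
def center_within(box1, box2):
    # Center point of box2.
    center_x = (box2[0] + box2[2]) / 2
    center_y = (box2[1] + box2[3]) / 2
    # Inclusive bounds check against box1 (inclusivity is assumed).
    return box1[0] <= center_x <= box1[2] and box1[1] <= center_y <= box1[3]


box1 = [0.0, 0.0, 0.5, 0.5]
inside = center_within(box1, [0.125, 0.125, 0.625, 0.625])   # center (0.375, 0.375)
outside = center_within(box1, [0.5, 0.5, 1.0, 1.0])          # center (0.75, 0.75)
```

The check is not symmetric: swapping the arguments tests a different center against a different box, so the argument order matters.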

Aggregation

Core Aggregation Function

refcoco_bbox_rec_aggregation_result(results, metric)

Aggregates results using specified geometric metric

Parameters:

results - List of result dicts
metric - Metric name (from COCO_REC_METRICS)

Scorer Dictionary:

{
  "IoU": compute_iou,
  "ACC@0.1": lambda x, y: compute_accuracy(x, y, 0.1),
  "ACC@0.3": lambda x, y: compute_accuracy(x, y, 0.3),
  "ACC@0.5": lambda x, y: compute_accuracy(x, y, 0.5),
  "ACC@0.7": lambda x, y: compute_accuracy(x, y, 0.7),
  "ACC@0.9": lambda x, y: compute_accuracy(x, y, 0.9),
  "Center_ACC": compute_center_accuracy
}

Process:

For each result:
- Extract ground truth bbox
- Extract predicted bbox
- Compute metric score
- Append the score to a list of scores
Compute mean across all results
Print aggregated score

Returns: Mean metric value (float)
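
The aggregation loop can be sketched end to end as below. This is an illustration, not the module's code: passing the ground truth as the first scorer argument, the printed message format, and the function name aggregate are all assumptions.

```python
def compute_iou(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def compute_accuracy(a, b, threshold=0.5):
    return compute_iou(a, b) >= threshold


def compute_center_accuracy(a, b):
    cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return a[0] <= cx <= a[2] and a[1] <= cy <= a[3]


SCORERS = {"IoU": compute_iou, "Center_ACC": compute_center_accuracy}
for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    # Bind t as a default argument so each lambda keeps its own threshold.
    SCORERS[f"ACC@{t}"] = lambda x, y, t=t: compute_accuracy(x, y, t)


def aggregate(results, metric):
    scorer = SCORERS[metric]
    # Assumed argument order: ground truth first, prediction second.
    scores = [scorer(r["bbox"], r["pred"]) for r in results]
    # Booleans sum as 0/1, so the mean of accuracy scores is a hit rate.
    mean = sum(scores) / len(scores)
    print(f"Aggregated {metric} score: {mean}")
    return mean


results = [
    {"bbox": [0.0, 0.0, 0.5, 0.5], "pred": [0.0, 0.0, 0.5, 0.5]},  # exact match
    {"bbox": [0.0, 0.0, 0.5, 0.5], "pred": [0.5, 0.5, 1.0, 1.0]},  # no overlap
]
mean_iou = aggregate(results, "IoU")
acc_05 = aggregate(results, "ACC@0.5")
center_acc = aggregate(results, "Center_ACC")
```

This sketch does not guard against an empty results list, which would raise a ZeroDivisionError when computing the mean.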

Metric-Specific Functions

Each metric has a dedicated aggregation function:

refcoco_bbox_rec_iou(results) - Returns mean IoU
refcoco_bbox_rec_acc01(results) - Returns accuracy at 0.1 IoU threshold
refcoco_bbox_rec_acc03(results) - Returns accuracy at 0.3 IoU threshold
refcoco_bbox_rec_acc05(results) - Returns accuracy at 0.5 IoU threshold
refcoco_bbox_rec_acc07(results) - Returns accuracy at 0.7 IoU threshold
refcoco_bbox_rec_acc09(results) - Returns accuracy at 0.9 IoU threshold
refcoco_bbox_rec_center_acc(results) - Returns center point accuracy

All functions call refcoco_bbox_rec_aggregation_result(results, metric_name).

Coordinate Format

Input Format (Original):

(top-left x, top-left y, width, height)
Absolute pixel coordinates

Normalized Format (Used):

(x1_norm, y1_norm, x2_norm, y2_norm)
Values between 0 and 1
Normalized by image dimensions

Expected Model Output:

[x1, y1, x2, y2] format
Within square brackets
Floating point numbers
Values should lie in the 0-1 range

Dependencies

logging - Logger setup
re - Regex for parsing
datasets.Dataset - Dataset manipulation

Evaluation Workflow

Preprocessing:
- Normalize bboxes to 0-1 range
- Explode multi-answer rows
Document Processing:
- Visual: Present unmodified image
- Text: Description with format instructions
Generation: Model predicts bbox coordinates
Result Processing:
- Parse bbox from response
- Package with ground truth
Aggregation:
- Compute geometric metrics
- Average across dataset
