Implementation:EvolvingLMMs Lab Lmms eval RefCOCO Plus Utils Rec

From Leeroopedia

Task utility functions for the RefCOCO+ Recognition variant, in which a model predicts the bounding box coordinates of the image region that a natural language description refers to.

Location

lmms_eval/tasks/refcoco+/utils_rec.py

Overview

Provides dataset preprocessing (bbox normalization, answer explosion), visual processing, result parsing, and geometric metric computation (IoU, accuracy thresholds, center accuracy) for RefCOCO+ recognition tasks.

Metrics

COCO_REC_METRICS - List of recognition metrics:

  • IoU - Intersection over Union
  • ACC@0.1, ACC@0.3, ACC@0.5, ACC@0.7, ACC@0.9 - Accuracy at IoU thresholds
  • Center_ACC - Center point containment accuracy

Dataset Preprocessing

refcoco_bbox_rec_preprocess_dataset(dataset)
Prepares dataset by normalizing bboxes and exploding multi-answer rows
Parameters: dataset - HuggingFace Dataset object

Process:

  1. Add Image Dimensions:
    • Maps dataset to add image_width and image_height from PIL images
  2. Normalize Bounding Boxes:
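    • Original format: (top_x, top_y, width, height), where (top_x, top_y) is the top-left corner in pixels
    • Converts to: (x1_norm, y1_norm, x2_norm, y2_norm), the top-left and bottom-right corners
    • Normalizes by dividing x values by image width and y values by image height (values 0-1)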
  3. Explode Answers:
    • Each row stores answer as a list of strings
    • Creates a separate row for each answer
    • Copies the other columns into each new row
    • Builds a new Dataset object from the exploded rows
Returns: Preprocessed Dataset with one row per answer
Side Effect: Prints row count change
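
The two transformations above can be sketched in plain Python. This is a minimal illustration, not the module's code: plain lists of dicts stand in for the HuggingFace Dataset, and the helper names normalize_bbox and explode_answers are hypothetical.

```python
def normalize_bbox(bbox, image_width, image_height):
    """Convert pixel (x, y, width, height) to normalized (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return [
        x / image_width,
        y / image_height,
        (x + w) / image_width,
        (y + h) / image_height,
    ]


def explode_answers(rows):
    """Return one row per answer string, copying all other columns."""
    exploded = []
    for row in rows:
        for answer in row["answer"]:
            new_row = dict(row)  # shallow copy of the other columns
            new_row["answer"] = answer
            exploded.append(new_row)
    return exploded


# Hypothetical sample row; real rows also carry the PIL image.
rows = [
    {
        "question_id": 1,
        "bbox": [100, 50, 200, 100],
        "image_width": 400,
        "image_height": 200,
        "answer": ["left dog", "brown dog"],
    },
]
for row in rows:
    row["bbox"] = normalize_bbox(row["bbox"], row["image_width"], row["image_height"])

exploded = explode_answers(rows)
print(f"{len(rows)} rows -> {len(exploded)} rows")
```

Exploding after normalization means every description of the same region shares one normalized ground truth box.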

Document Processing

refcoco_bbox_rec_doc_to_visual(doc)
Returns image without visual modifications
Parameters: doc - Document with image key
Returns: List with single RGB-converted image
refcoco_bbox_rec_doc_to_text(doc)
Constructs bbox prediction prompt with description
Parameters: doc - Document with answer (string, post-explosion)
Process:
  1. Asserts answer is string
  2. Constructs prompt with format explanation and description
Returns: Prompt string

Prompt Format:

"Bounding box coordinates are specified in the format (top-left x, top-left y, bottom-right x, bottom-right y).
All values are floating point numbers bounded between 0 and 1.
Please provide the bounding box coordinate of the region this sentence describes: {answer}"
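
A minimal sketch of how such a prompt could be assembled. The names PROMPT_TEMPLATE and build_prompt are hypothetical, and the whitespace joining the three sentences is an assumption, since the page does not show the exact separators.

```python
# Assumption: sentences are joined with single spaces; the real module may differ.
PROMPT_TEMPLATE = (
    "Bounding box coordinates are specified in the format "
    "(top-left x, top-left y, bottom-right x, bottom-right y). "
    "All values are floating point numbers bounded between 0 and 1. "
    "Please provide the bounding box coordinate of the region "
    "this sentence describes: {answer}"
)


def build_prompt(doc):
    """Build the bbox prompt for one exploded row (answer must be a string)."""
    assert isinstance(doc["answer"], str), "answer must be a string after explosion"
    return PROMPT_TEMPLATE.format(answer=doc["answer"])


print(build_prompt({"answer": "left dog"}))
```

The assertion catches rows that skipped the explosion step, where answer would still be a list.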

Result Parsing

parse_float_sequence_within(input_str)
Extracts first sequence of four floats within square brackets
Parameters: input_str - Model response string

Regex Pattern:

\[\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\s*\]

Process:

  1. Searches for pattern in input string
  2. If found: extracts four floats as list
  3. If not found: returns [0, 0, 0, 0]
Returns: List of four floats
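
The parsing behaviour described above can be sketched with the same regex. The function name parse_bbox is a hypothetical stand-in for parse_float_sequence_within, and the sample strings are illustrative.

```python
import re

# Same pattern as listed above, split across two raw strings for readability.
BBOX_PATTERN = re.compile(
    r"\[\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),"
    r"\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\s*\]"
)


def parse_bbox(text):
    """Return the first bracketed group of four numbers, or the zero box."""
    match = BBOX_PATTERN.search(text)
    if match is None:
        return [0, 0, 0, 0]  # fallback described in the process above
    return [float(group) for group in match.groups()]


print(parse_bbox("The region is at [0.1, 0.2, 0.5, 0.9]."))
print(parse_bbox("(0.1, 0.2, 0.5, 0.9)"))  # parentheses do not match
```

Note what the pattern rejects: numbers without a leading digit (such as .5), whitespace before a comma, and any delimiter other than square brackets all fall through to the [0, 0, 0, 0] fallback.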

Result Processing

refcoco_bbox_rec_process_result(doc, result)
Packages parsed prediction with ground truth bbox
Parameters:
  • doc - Document with answer, question_id, bbox (normalized)
  • result - Model prediction list
Process:
  1. Extracts prediction string
  2. Parses to float sequence
  3. Gets annotation ID
  4. Creates data dict
Returns: Dictionary with entries for each COCO_REC_METRICS, each containing:
  • answer: Description string
  • pred: Predicted bbox [x1, y1, x2, y2]
  • ann_id: Annotation ID
  • bbox: Ground truth normalized bbox
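
A minimal sketch of the packaging step, under several assumptions: the output keys are taken to be the bare metric names, ann_id is taken from question_id, and the parser is passed in as an argument so the example stays self-contained. The names process_result and METRICS are hypothetical.

```python
METRICS = ["IoU", "ACC@0.1", "ACC@0.3", "ACC@0.5", "ACC@0.7", "ACC@0.9", "Center_ACC"]


def process_result(doc, result, parse):
    """Pair the parsed prediction with the ground truth, once per metric."""
    pred_text = result[0] if result else ""  # first model response
    data = {
        "answer": doc["answer"],
        "pred": parse(pred_text),
        "ann_id": doc["question_id"],  # assumption: question_id serves as the annotation ID
        "bbox": doc["bbox"],
    }
    # Every metric receives the same payload; scoring happens at aggregation time.
    return {metric: data for metric in METRICS}


# Hypothetical document and a stand-in parser that ignores its input.
doc = {"answer": "left dog", "question_id": 7, "bbox": [0.25, 0.25, 0.75, 0.75]}
out = process_result(doc, ["[0.2, 0.2, 0.7, 0.7]"], parse=lambda s: [0.2, 0.2, 0.7, 0.7])
print(out["IoU"])
```

Deferring the metric computation keeps this step cheap and lets each aggregation function score the same payload differently.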

Geometric Metrics

IoU Computation

compute_iou(box1, box2)
Computes Intersection over Union of two bounding boxes
Parameters:
  • box1: List [x_min, y_min, x_max, y_max]
  • box2: List [x_min, y_min, x_max, y_max]

Process:

  1. Determines intersection rectangle coordinates
  2. Computes intersection area (0 if no overlap)
  3. Computes areas of both boxes
  4. Computes union area: area1 + area2 - intersection
  5. Divides intersection area by union area
Returns: IoU score (float, 0-1)
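
The five steps can be sketched as follows. The function name iou is a hypothetical stand-in for compute_iou, and the guard against a zero union is an assumption that the page does not describe.

```python
def iou(box1, box2):
    """Intersection over Union for two [x_min, y_min, x_max, y_max] boxes."""
    # 1. Intersection rectangle.
    x_left = max(box1[0], box2[0])
    y_top = max(box1[1], box2[1])
    x_right = min(box1[2], box2[2])
    y_bottom = min(box1[3], box2[3])

    # 2. Intersection area, clamped to 0 when the boxes do not overlap.
    intersection = max(0.0, x_right - x_left) * max(0.0, y_bottom - y_top)

    # 3. Areas of both boxes.
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])

    # 4. Union area.
    union = area1 + area2 - intersection

    # 5. Ratio. Assumption: return 0.0 when both boxes are degenerate.
    return intersection / union if union > 0 else 0.0


print(iou([0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 0.75, 0.75]))  # 0.0625 / 0.4375
print(iou([0.0, 0.0, 0.1, 0.1], [0.5, 0.5, 1.0, 1.0]))      # disjoint boxes
```

In the first call the overlap is a 0.25 by 0.25 square, each box has area 0.25, and the ratio is 1/7. A prediction that fell back to [0, 0, 0, 0] has zero area and therefore scores 0 against any ground truth box with positive area.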

Accuracy at Threshold

compute_accuracy(box1, box2, threshold=0.5)
Binary accuracy based on IoU threshold
Parameters:
  • box1, box2: Bounding boxes
  • threshold: IoU threshold (default 0.5)
Process: Computes IoU and checks if ≥ threshold
Returns: Boolean (True if IoU ≥ threshold)
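
A short sketch of how one prediction is judged at each of the five thresholds. The names iou and accuracy_at are hypothetical stand-ins for compute_iou and compute_accuracy, and the compact IoU helper (with an assumed zero-union guard) is included only to keep the example self-contained.

```python
def iou(a, b):
    """Compact IoU for [x_min, y_min, x_max, y_max] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0  # guard is an assumption


def accuracy_at(box1, box2, threshold=0.5):
    """True when the IoU of the two boxes reaches the threshold."""
    return iou(box1, box2) >= threshold


THRESHOLDS = [0.1, 0.3, 0.5, 0.7, 0.9]
gt = [0.0, 0.0, 0.5, 0.5]
pred = [0.25, 0.25, 0.75, 0.75]  # IoU with gt is 1/7, about 0.143

hits = {f"ACC@{t}": accuracy_at(gt, pred, t) for t in THRESHOLDS}
print(hits)
```

Here the prediction counts as correct only at the loosest threshold, which is why reporting several thresholds gives a fuller picture than a single ACC@0.5 figure.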

Center Accuracy

compute_center_accuracy(box1, box2)
Checks if center of box2 is within box1
Parameters: box1, box2 - Bounding boxes

Process:

  1. Computes center of box2:
    • center_x = (box2[0] + box2[2]) / 2
    • center_y = (box2[1] + box2[3]) / 2
  2. Checks if center is within box1 bounds
Returns: Boolean (True if center within box1)
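
A minimal sketch of the center check. The name center_in_box is a hypothetical stand-in for compute_center_accuracy, and treating the box1 boundary as inclusive is an assumption.

```python
def center_in_box(box1, box2):
    """True when the center point of box2 lies inside box1."""
    center_x = (box2[0] + box2[2]) / 2
    center_y = (box2[1] + box2[3]) / 2
    # Assumption: points on the boundary of box1 count as inside.
    return box1[0] <= center_x <= box1[2] and box1[1] <= center_y <= box1[3]


whole = [0.0, 0.0, 1.0, 1.0]
corner = [0.8, 0.8, 1.0, 1.0]

print(center_in_box(whole, corner))  # center of corner, about (0.9, 0.9), is inside whole
print(center_in_box(corner, whole))  # center of whole, (0.5, 0.5), is outside corner
```

The check is not symmetric, so the order in which the ground truth and predicted boxes are passed determines which box's center is tested.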

Aggregation

Core Aggregation Function

refcoco_bbox_rec_aggregation_result(results, metric)
Aggregates results using specified geometric metric
Parameters:
  • results - List of result dicts
  • metric - Metric name (from COCO_REC_METRICS)

Scorer Dictionary:

{
  "IoU": compute_iou,
  "ACC@0.1": lambda x, y: compute_accuracy(x, y, 0.1),
  "ACC@0.3": lambda x, y: compute_accuracy(x, y, 0.3),
  "ACC@0.5": lambda x, y: compute_accuracy(x, y, 0.5),
  "ACC@0.7": lambda x, y: compute_accuracy(x, y, 0.7),
  "ACC@0.9": lambda x, y: compute_accuracy(x, y, 0.9),
  "Center_ACC": compute_center_accuracy
}

Process:

  1. For each result:
    • Extract ground truth bbox
    • Extract predicted bbox
    • Compute metric score
    • Append the score to a list of scores
  2. Compute mean across all results
  3. Print aggregated score
Returns: Mean metric value (float)
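
The aggregation loop can be sketched as below. This is an illustration under assumptions: the scorer dictionary is reduced to a single entry to keep the example short, the scorer is called as scorer(ground_truth, prediction), an empty input yields 0.0, and the names aggregate, SCORERS and center_in_box are hypothetical.

```python
def center_in_box(box1, box2):
    """True when the center of box2 lies inside box1 (inclusive bounds assumed)."""
    cx = (box2[0] + box2[2]) / 2
    cy = (box2[1] + box2[3]) / 2
    return box1[0] <= cx <= box1[2] and box1[1] <= cy <= box1[3]


# Reduced stand-in for the seven-entry scorer dictionary listed above.
SCORERS = {"Center_ACC": center_in_box}


def aggregate(results, metric, scorers=SCORERS):
    """Score every result with the chosen metric and return the mean."""
    scorer = scorers[metric]
    scores = []
    for res in results:
        gt_bbox = res["bbox"]
        pred_bbox = res["pred"]
        # Assumption: ground truth is passed first, prediction second.
        scores.append(scorer(gt_bbox, pred_bbox))
    mean = sum(scores) / len(scores) if scores else 0.0  # booleans sum as 0 and 1
    print(f"Aggregated {metric} score: {mean}")
    return mean


# Hypothetical results: the first prediction is centered inside the ground truth.
results = [
    {"bbox": [0.0, 0.0, 0.5, 0.5], "pred": [0.2, 0.2, 0.6, 0.6]},
    {"bbox": [0.0, 0.0, 0.5, 0.5], "pred": [0.6, 0.6, 1.0, 1.0]},
]
score = aggregate(results, "Center_ACC")
```

Because boolean scorers return True or False, their mean is directly the fraction of correct predictions, while the IoU scorer averages to a mean overlap.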

Metric-Specific Functions

Each metric has a dedicated aggregation function:

refcoco_bbox_rec_iou(results)
Returns: Mean IoU
refcoco_bbox_rec_acc01(results)
Returns: Accuracy at 0.1 IoU threshold
refcoco_bbox_rec_acc03(results)
Returns: Accuracy at 0.3 IoU threshold
refcoco_bbox_rec_acc05(results)
Returns: Accuracy at 0.5 IoU threshold
refcoco_bbox_rec_acc07(results)
Returns: Accuracy at 0.7 IoU threshold
refcoco_bbox_rec_acc09(results)
Returns: Accuracy at 0.9 IoU threshold
refcoco_bbox_rec_center_acc(results)
Returns: Center point accuracy

All functions call refcoco_bbox_rec_aggregation_result(results, metric_name).

Coordinate Format

Input Format (Original):

  • (top_x, top_y, width, height)
  • Absolute pixel coordinates

Normalized Format (Used):

  • (x1_norm, y1_norm, x2_norm, y2_norm)
  • Values between 0 and 1
  • Normalized by image dimensions

Expected Model Output:

  • [x1, y1, x2, y2] format
  • Within square brackets
  • Floating point numbers
  • Values should be in the 0-1 range

Dependencies

  • logging - Logger setup
  • re - Regex for parsing
  • datasets.Dataset - Dataset manipulation

Evaluation Workflow

  1. Preprocessing:
    • Normalize bboxes to 0-1 range
    • Explode multi-answer rows
  2. Document Processing:
    • Visual: Present unmodified image
    • Text: Description with format instructions
  3. Generation: Model predicts bbox coordinates
  4. Result Processing:
    • Parse bbox from response
    • Package with ground truth
  5. Aggregation:
    • Compute geometric metrics
    • Average across dataset
