Implementation:EvolvingLMMs Lab Lmms eval RefCOCO Plus Utils Rec
Task utility functions for the RefCOCO+ Recognition variant, which evaluates bounding box coordinate prediction from natural language descriptions.
Location
/tmp/kapso_repo_sslb_59s/lmms_eval/tasks/refcoco+/utils_rec.py
Overview
Provides dataset preprocessing (bbox normalization, answer explosion), visual processing, result parsing, and geometric metric computation (IoU, accuracy thresholds, center accuracy) for RefCOCO+ recognition tasks.
Metrics
COCO_REC_METRICS - List of recognition metrics:
- IoU - Intersection over Union
- ACC@0.1, ACC@0.3, ACC@0.5, ACC@0.7, ACC@0.9 - Accuracy at IoU thresholds
- Center_ACC - Center point containment accuracy
Dataset Preprocessing
refcoco_bbox_rec_preprocess_dataset(dataset) - Prepares the dataset by normalizing bboxes and exploding multi-answer rows
- Parameters:
- dataset - HuggingFace Dataset object
Process:
- Add Image Dimensions:
- Maps dataset to add image_width and image_height from PIL images
- Normalize Bounding Boxes:
- Original format: (top_x, top_y, width, height)
- Converts to: (x1_norm, y1_norm, x2_norm, y2_norm)
- Normalizes by dividing x values by image width and y values by image height (values 0-1)
- Explode Answers:
- Each row has answer as a list of strings
- Creates a separate row for each answer
- Duplicates other columns
- Converts to a new Dataset object
- Returns: Preprocessed Dataset with one row per answer
- Side Effect: Prints row count change
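The preprocessing steps above can be sketched in plain Python. This is a minimal illustration that operates on a list of dicts rather than a HuggingFace Dataset; the helper names normalize_bbox and explode_answers and the sample row are hypothetical, not taken from utils_rec.py.

```python
from typing import Any, Dict, List


def normalize_bbox(bbox: List[float], image_width: float, image_height: float) -> List[float]:
    """Convert pixel (top_x, top_y, width, height) to normalized (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return [
        x / image_width,         # x1_norm
        y / image_height,        # y1_norm
        (x + w) / image_width,   # x2_norm
        (y + h) / image_height,  # y2_norm
    ]


def explode_answers(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Emit one row per answer string, copying every other column unchanged."""
    exploded = []
    for row in rows:
        for answer in row["answer"]:
            new_row = dict(row)          # shallow copy duplicates the other columns
            new_row["answer"] = answer   # answer becomes a single string
            exploded.append(new_row)
    return exploded


# Hypothetical sample row: a 400x200 image with one box and two descriptions.
rows = [
    {
        "question_id": 1,
        "bbox": [100, 50, 200, 100],
        "image_width": 400,
        "image_height": 200,
        "answer": ["left dog", "brown dog"],
    }
]

for row in rows:
    row["bbox"] = normalize_bbox(row["bbox"], row["image_width"], row["image_height"])

exploded = explode_answers(rows)
print(f"{len(rows)} rows -> {len(exploded)} rows")  # mirrors the row-count side effect
```

Each exploded row carries the same normalized bbox, so every description is scored against the same ground truth region.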
Document Processing
refcoco_bbox_rec_doc_to_visual(doc) - Returns image without visual modifications
- Parameters:
- doc - Document with image key
- Returns: List with single RGB-converted image
refcoco_bbox_rec_doc_to_text(doc) - Constructs bbox prediction prompt with description
- Parameters:
- doc - Document with answer (string, post-explosion)
- Process:
- Asserts answer is string
- Constructs prompt with format explanation and description
- Returns: Prompt string
Prompt Format:
"Bounding box coordinates are specified in the format (top-left x, top-left y, bottom-right x, bottom-right y).
All values are floating point numbers bounded between 0 and 1.
Please provide the bounding box coordinate of the region this sentence describes: {answer}"
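A minimal sketch of how such a prompt could be assembled is shown below. The function name build_rec_prompt is hypothetical, and the single-space separators between sentences are an assumption; the exact whitespace used by refcoco_bbox_rec_doc_to_text is not shown on this page.

```python
def build_rec_prompt(doc: dict) -> str:
    """Build the bbox prediction prompt for one post-explosion document."""
    answer = doc["answer"]
    # After answer explosion every row should carry exactly one description.
    assert isinstance(answer, str), "answer must be a single string"
    return (
        "Bounding box coordinates are specified in the format "
        "(top-left x, top-left y, bottom-right x, bottom-right y). "
        "All values are floating point numbers bounded between 0 and 1. "
        "Please provide the bounding box coordinate of the region "
        f"this sentence describes: {answer}"
    )


prompt = build_rec_prompt({"answer": "the dog on the left"})
print(prompt)
```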
Result Parsing
parse_float_sequence_within(input_str) - Extracts the first sequence of four floats within square brackets
- Parameters:
- input_str - Model response string
Regex Pattern:
\[\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\s*\]
Process:
- Searches for pattern in input string
- If found: extracts four floats as list
- If not found: returns [0, 0, 0, 0]
- Returns: List of four floats
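The parsing behaviour described above can be sketched with the documented regex. The function name parse_bbox is a hypothetical stand-in for parse_float_sequence_within, and the sample responses are made up for illustration.

```python
import re

# The pattern documented above: four comma-separated numbers inside square brackets.
BBOX_PATTERN = re.compile(
    r"\[\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\s*\]"
)


def parse_bbox(input_str: str) -> list:
    """Return the first bracketed group of four numbers, or the zero-box fallback."""
    match = BBOX_PATTERN.search(input_str)
    if match:
        return [float(group) for group in match.groups()]
    return [0, 0, 0, 0]


good = parse_bbox("The region is at [0.1, 0.2, 0.55, 0.9].")
missing = parse_bbox("I cannot find that object.")
parens = parse_bbox("(0.1, 0.2, 0.3, 0.4)")      # parentheses do not match
no_leading_digit = parse_bbox("[.5, 0.2, 0.3, 0.4]")  # ".5" lacks a leading digit
print(good, missing, parens, no_leading_digit)
```

Note that the pattern is strict: each number needs a digit before any decimal point, and no whitespace is allowed before a comma. Responses that deviate fall back to [0, 0, 0, 0], which has zero area and therefore scores an IoU of 0 against any real ground truth box.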
Result Processing
refcoco_bbox_rec_process_result(doc, result) - Packages parsed prediction with ground truth bbox
- Parameters:
- doc - Document with answer, question_id, bbox (normalized)
- result - Model prediction list
- Process:
- Extracts prediction string
- Parses to float sequence
- Gets annotation ID
- Creates data dict
- Returns: Dictionary with an entry for each metric in COCO_REC_METRICS, each containing:
- answer: Description string
- pred: Predicted bbox [x1, y1, x2, y2]
- ann_id: Annotation ID
- bbox: Ground truth normalized bbox
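The packaging step might look roughly like the sketch below. Several details are assumptions rather than facts from this page: the output is keyed by the bare metric name (the real key naming is not documented here), ann_id is taken from question_id, an empty result list is treated as an empty string, and process_result and _parse are hypothetical names.

```python
import re

COCO_REC_METRICS = ["IoU", "ACC@0.1", "ACC@0.3", "ACC@0.5", "ACC@0.7", "ACC@0.9", "Center_ACC"]

# Compact form of the documented bbox pattern: four numbers in square brackets.
_NUM = r"(-?\d+(?:\.\d+)?)"
_PATTERN = re.compile(r"\[\s*" + r",\s*".join([_NUM] * 4) + r"\s*\]")


def _parse(text: str) -> list:
    """Stand-in parser: first bracketed group of four numbers, else the zero box."""
    match = _PATTERN.search(text)
    return [float(g) for g in match.groups()] if match else [0, 0, 0, 0]


def process_result(doc: dict, result: list) -> dict:
    """Pair the parsed prediction with the ground truth under every metric name."""
    pred_text = result[0] if len(result) > 0 else ""  # assumption: empty -> no bbox
    data = {
        "answer": doc["answer"],
        "pred": _parse(pred_text),
        "ann_id": doc["question_id"],  # assumption: question_id serves as annotation ID
        "bbox": doc["bbox"],
    }
    # Assumption: keys are the bare metric names; every metric sees the same payload.
    return {metric: data for metric in COCO_REC_METRICS}


doc = {"answer": "left dog", "question_id": 7, "bbox": [0.25, 0.25, 0.75, 0.75]}
packaged = process_result(doc, ["[0.1, 0.2, 0.3, 0.4]"])
print(packaged["IoU"])
```

Returning the same payload under every metric lets each aggregation function receive the full list of predictions and compute its own score.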
Geometric Metrics
IoU Computation
compute_iou(box1, box2) - Computes Intersection over Union of two bounding boxes
- Parameters:
- box1: List [x_min, y_min, x_max, y_max]
- box2: List [x_min, y_min, x_max, y_max]
Process:
- Determines intersection rectangle coordinates
- Computes intersection area (0 if no overlap)
- Computes areas of both boxes
- Computes union area: area1 + area2 - intersection
- Returns intersection / union
- Returns: IoU score (float, 0-1)
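The IoU steps above can be sketched as follows. The function name iou is a hypothetical stand-in for compute_iou, and the guard against a zero union is an assumption added for safety; this page does not say how the original handles that case.

```python
from typing import Sequence


def iou(box1: Sequence[float], box2: Sequence[float]) -> float:
    """Intersection over Union for two [x_min, y_min, x_max, y_max] boxes."""
    # Corners of the intersection rectangle.
    x_left = max(box1[0], box2[0])
    y_top = max(box1[1], box2[1])
    x_right = min(box1[2], box2[2])
    y_bottom = min(box1[3], box2[3])

    # Clamp each side at 0 so disjoint boxes give zero intersection area.
    intersection = max(0.0, x_right - x_left) * max(0.0, y_bottom - y_top)

    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection

    # Assumption: two zero-area boxes score 0 instead of raising ZeroDivisionError.
    return intersection / union if union > 0 else 0.0


overlap = iou([0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 0.75, 0.75])
disjoint = iou([0.0, 0.0, 0.2, 0.2], [0.5, 0.5, 1.0, 1.0])
identical = iou([0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5])
print(overlap, disjoint, identical)
```

In the first example the intersection is 0.25 x 0.25 = 0.0625 and the union is 0.25 + 0.25 - 0.0625 = 0.4375, giving an IoU of 1/7.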
Accuracy at Threshold
compute_accuracy(box1, box2, threshold=0.5) - Binary accuracy based on IoU threshold
- Parameters:
- box1, box2: Bounding boxes
- threshold: IoU threshold (default 0.5)
- Process: Computes IoU and checks if ≥ threshold
- Returns: Boolean (True if IoU ≥ threshold)
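As a worked illustration of the inclusive threshold check, the sketch below sweeps the documented thresholds over one pair of boxes whose IoU is exactly 0.5. The names accuracy_at and _iou are hypothetical, and _iou is a compact stand-in for compute_iou that assumes both boxes have positive area.

```python
from typing import Sequence


def _iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Compact IoU stand-in for [x_min, y_min, x_max, y_max] boxes with positive area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def accuracy_at(box1: Sequence[float], box2: Sequence[float], threshold: float = 0.5) -> bool:
    """True when the IoU of the two boxes is at least the threshold."""
    return _iou(box1, box2) >= threshold


gt = [0.0, 0.0, 1.0, 1.0]
pred = [0.0, 0.0, 1.0, 0.5]  # covers the top half of gt, so IoU = 0.5 / 1.0 = 0.5

sweep = {t: accuracy_at(gt, pred, t) for t in (0.1, 0.3, 0.5, 0.7, 0.9)}
print(sweep)
```

Because the comparison is greater-than-or-equal, a prediction sitting exactly on the threshold counts as correct at 0.5 but not at 0.7 or 0.9.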
Center Accuracy
compute_center_accuracy(box1, box2) - Checks if center of box2 is within box1
- Parameters:
- box1, box2 - Bounding boxes
Process:
- Computes center of box2:
- center_x = (box2[0] + box2[2]) / 2
- center_y = (box2[1] + box2[3]) / 2
- Checks if center is within box1 bounds
- Returns: Boolean (True if center within box1)
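A minimal sketch of the center containment check follows. The function name center_accuracy is a hypothetical stand-in for compute_center_accuracy, and treating points on the edge of box1 as inside (inclusive bounds) is an assumption, since this page only says the center must be within box1.

```python
from typing import Sequence


def center_accuracy(box1: Sequence[float], box2: Sequence[float]) -> bool:
    """True when the center of box2 falls inside box1 (edges count as inside)."""
    center_x = (box2[0] + box2[2]) / 2
    center_y = (box2[1] + box2[3]) / 2
    # Assumption: inclusive comparison on all four sides of box1.
    return box1[0] <= center_x <= box1[2] and box1[1] <= center_y <= box1[3]


gt = [0.0, 0.0, 0.6, 0.6]
hit = center_accuracy(gt, [0.25, 0.25, 0.75, 0.75])  # center (0.5, 0.5) lies in gt
miss = center_accuracy(gt, [0.5, 0.5, 1.0, 1.0])     # center (0.75, 0.75) lies outside
print(hit, miss)
```

This metric is more forgiving than the IoU thresholds: a prediction with the wrong size still counts as long as its center lands on the target region.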
Aggregation
Core Aggregation Function
refcoco_bbox_rec_aggregation_result(results, metric) - Aggregates results using specified geometric metric
- Parameters:
- results - List of result dicts
- metric - Metric name (from COCO_REC_METRICS)
Scorer Dictionary:
{
"IoU": compute_iou,
"ACC@0.1": lambda x, y: compute_accuracy(x, y, 0.1),
"ACC@0.3": lambda x, y: compute_accuracy(x, y, 0.3),
"ACC@0.5": lambda x, y: compute_accuracy(x, y, 0.5),
"ACC@0.7": lambda x, y: compute_accuracy(x, y, 0.7),
"ACC@0.9": lambda x, y: compute_accuracy(x, y, 0.9),
"Center_ACC": compute_center_accuracy
}
Process:
- For each result:
- Extract ground truth bbox
- Extract predicted bbox
- Compute metric score
- Append to results list
- Compute mean across all results
- Print aggregated score
- Returns: Mean metric value (float)
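The aggregation loop can be sketched as below. To keep the example self-contained, the scorer dictionary is passed in as a parameter and holds only a single toy entry; the name aggregate, the argument order (ground truth first, prediction second), and the sample results are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[Sequence[float], Sequence[float]], float]


def aggregate(results: List[dict], metric: str, scorers: Dict[str, Scorer]) -> float:
    """Score every result with the chosen metric and return the mean."""
    scores = []
    for result in results:
        gt_bbox = result["bbox"]
        pred_bbox = result["pred"]
        # Assumption: ground truth is passed as box1 and the prediction as box2.
        scores.append(float(scorers[metric](gt_bbox, pred_bbox)))
    mean_score = sum(scores) / len(scores)  # raises ZeroDivisionError on empty input
    print(f"Aggregated {metric} score: {mean_score}")
    return mean_score


def _center_hit(box1: Sequence[float], box2: Sequence[float]) -> bool:
    """Toy scorer: is the center of box2 inside box1?"""
    cx, cy = (box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2
    return box1[0] <= cx <= box1[2] and box1[1] <= cy <= box1[3]


toy_scorers = {"Center_ACC": _center_hit}
results = [
    {"bbox": [0.0, 0.0, 0.6, 0.6], "pred": [0.25, 0.25, 0.75, 0.75]},  # hit
    {"bbox": [0.0, 0.0, 0.6, 0.6], "pred": [0.5, 0.5, 1.0, 1.0]},      # miss
]
mean_center_acc = aggregate(results, "Center_ACC", toy_scorers)
```

Boolean scores are converted to floats before averaging, so the accuracy metrics come out as the fraction of predictions that pass.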
Metric-Specific Functions
Each metric has a dedicated aggregation function:
refcoco_bbox_rec_iou(results) - Returns: Mean IoU
refcoco_bbox_rec_acc01(results) - Returns: Accuracy at 0.1 IoU threshold
refcoco_bbox_rec_acc03(results) - Returns: Accuracy at 0.3 IoU threshold
refcoco_bbox_rec_acc05(results) - Returns: Accuracy at 0.5 IoU threshold
refcoco_bbox_rec_acc07(results) - Returns: Accuracy at 0.7 IoU threshold
refcoco_bbox_rec_acc09(results) - Returns: Accuracy at 0.9 IoU threshold
refcoco_bbox_rec_center_acc(results) - Returns: Center point accuracy
All functions call refcoco_bbox_rec_aggregation_result(results, metric_name).
Coordinate Format
Input Format (Original):
- (top_x, top_y, width, height)
- Absolute pixel coordinates
Normalized Format (Used):
- (x1_norm, y1_norm, x2_norm, y2_norm)
- Values between 0 and 1
- Normalized by image dimensions
Expected Model Output:
- [x1, y1, x2, y2] format
- Within square brackets
- Floating point numbers
- Values should be 0-1 range
Dependencies
- logging - Logger setup
- re - Regex for parsing
- datasets.Dataset - Dataset manipulation
Evaluation Workflow
- Preprocessing:
- Normalize bboxes to 0-1 range
- Explode multi-answer rows
- Document Processing:
- Visual: Present unmodified image
- Text: Description with format instructions
- Generation: Model predicts bbox coordinates
- Result Processing:
- Parse bbox from response
- Package with ground truth
- Aggregation:
- Compute geometric metrics
- Average across dataset
Related
- Task_Utility_Functions - General task utility pattern
- RefCOCO_Plus_Utils - Related caption generation utilities
- Bounding_Box_Prediction - Bbox prediction task pattern
- IoU_Metrics - Intersection over Union details