Implementation:EvolvingLMMs Lab Lmms eval LongTimeScope Utils

From Leeroopedia

Location: /tmp/kapso_repo_sslb_59s/lmms_eval/tasks/longtimescope/utils.py

Principle: Task_Utility_Functions

Purpose

Task-specific utilities for the LongTimeScope benchmark, which evaluates long-video understanding across QA, OCR, and temporal reasoning tasks.

Constants

TASK_CATEGORIES = ["QA", "OCR", "temporal"]

Configuration

Reads the cache directory name from longtimescope.yaml
Takes the base cache path from the HF_HOME environment variable (default: ~/.cache/huggingface/)
Handles video files with several extensions (mp4, MP4, mkv)
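
The cache lookup described above can be sketched as follows. This is a minimal illustration, not the module's code: the helper name resolve_cache_dir and its env parameter are hypothetical, and the cache directory name is passed in directly rather than parsed from longtimescope.yaml.

```python
import os


def resolve_cache_dir(cache_name, env=None):
    """Join the Hugging Face base cache with a task-specific directory name.

    Hypothetical helper: `cache_name` stands in for the value that the
    real module reads from longtimescope.yaml.
    """
    env = os.environ if env is None else env
    # Fall back to the documented default when HF_HOME is not set.
    hf_home = env.get("HF_HOME", "~/.cache/huggingface/")
    base_cache_dir = os.path.expanduser(hf_home)
    return os.path.join(base_cache_dir, cache_name)
```

Passing the environment as a mapping keeps the sketch easy to test without touching the real process environment.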

Key Functions

convert_time_to_frame

def convert_time_to_frame(time_in_seconds, fps)

Converts a time in seconds to a frame number at the given FPS rate.
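
A minimal sketch of such a conversion, assuming the frame index is obtained by multiplying and truncating toward zero (the page does not state the rounding rule, so that part is an assumption):

```python
def convert_time_to_frame(time_in_seconds, fps):
    # Assumption: truncate to an integer frame index rather than round.
    return int(time_in_seconds * fps)
```

For example, 2.5 seconds at 30 FPS maps to frame 75 under this rule.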

timescope_doc_to_visual

def timescope_doc_to_visual(doc)

Locates the video file path:

Constructs the path from the cache directory and doc["video"]
Tries the extensions in order (mp4 -> MP4 -> mkv)
Exits with an error if no video file is found
Returns a list containing the single video path
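
The lookup steps above can be sketched as below. The function name find_video and its parameters are hypothetical, and the error message is illustrative; only the extension order and the exit-on-missing behaviour come from the description.

```python
import os
import sys


def find_video(cache_dir, video_name, extensions=("mp4", "MP4", "mkv")):
    """Return [path] for the first existing file among the candidate extensions."""
    for ext in extensions:
        candidate = os.path.join(cache_dir, f"{video_name}.{ext}")
        if os.path.exists(candidate):
            return [candidate]
    # No candidate exists: stop the process, as the description states.
    sys.exit(f"video {video_name!r} not found in {cache_dir!r}")
```

Returning a one-element list matches the description of the return value, so callers can treat the result as a list of visual inputs.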

extract_characters_regex

def extract_characters_regex(s)

Extracts the answer choice from the response text:

Strips common answer prefixes ("The best answer is", etc.)
Returns an empty string if the response is longer than 10 words and contains none of the letters A-F
Otherwise uses a regex to find the first character in [ABCDEF]
Returns the matched character, or an empty string if there is no match
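
A sketch of this extraction logic, under stated assumptions: the function name extract_choice is hypothetical, and the prefix list is illustrative, since the page names only "The best answer is" explicitly.

```python
import re

# Assumption: only the first prefix is confirmed by the page; the others
# are plausible examples of "common answer prefixes".
ANSWER_PREFIXES = ("The best answer is", "The correct answer is", "The answer is")


def extract_choice(response):
    s = response.strip()
    for prefix in ANSWER_PREFIXES:
        s = s.replace(prefix, "")
    # Long replies with no candidate letter are treated as unanswered.
    if len(s.split()) > 10 and not re.search(r"[ABCDEF]", s):
        return ""
    match = re.search(r"[ABCDEF]", s)
    return match[0] if match else ""
```

Because the search takes the first uppercase letter in A-F anywhere in the remaining text, a reply such as "Definitely B" yields "D" in this sketch, so prompts that ask for the option letter alone give the cleanest results.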

timescope_process_results

def timescope_process_results(doc, results)

Processes single result:

Extracts predicted answer using extract_characters_regex
Creates data dict with id, length, video, task_type, pred_answer, pred, answer
Returns dict with "timescope_perception_score" metric
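
The per-document step can be sketched as follows. Several details are assumptions: the doc keys are taken to match the listed field names, "pred" is taken to hold the raw reply and "pred_answer" the extracted letter, and first_choice_letter is a simplified stand-in for extract_characters_regex.

```python
import re


def first_choice_letter(text):
    # Simplified stand-in for extract_characters_regex: first uppercase A-F.
    match = re.search(r"[ABCDEF]", text)
    return match[0] if match else ""


def process_result(doc, results):
    raw = results[0]  # assumption: the model reply is the first element
    record = {
        "id": doc["id"],
        "length": doc["length"],
        "video": doc["video"],
        "task_type": doc["task_type"],
        "pred_answer": first_choice_letter(raw),  # assumed: extracted letter
        "pred": raw,                              # assumed: raw model reply
        "answer": doc["answer"],
    }
    # The metric name is the key; the record is aggregated later.
    return {"timescope_perception_score": record}
```

Scoring is deferred: this step only packages each prediction with the metadata needed to group it later.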

timescope_aggregate_results

def timescope_aggregate_results(results)

Aggregates results with detailed breakdown:

Groups by video length and task type
Computes accuracy for each length-task combination
Logs per-length-task accuracy
Logs per-length overall accuracy
Returns overall accuracy across all videos
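
The aggregation steps above can be sketched as below. This is an illustration under assumptions: the function name aggregate is hypothetical, the standard logging module stands in for whatever logger the module uses, and accuracy is reported as a percentage, which the page does not specify.

```python
import logging
from collections import defaultdict

logger = logging.getLogger("longtimescope_sketch")


def _pct(correct, answered):
    # Assumption: accuracy as a percentage, 0.0 when nothing was answered.
    return 100.0 * correct / answered if answered else 0.0


def aggregate(results):
    by_length_task = defaultdict(lambda: [0, 0])  # key -> [correct, answered]
    by_length = defaultdict(lambda: [0, 0])
    total = [0, 0]
    for r in results:
        # Case-insensitive comparison of predicted and reference letters.
        hit = int(r["pred_answer"].lower() == r["answer"].lower())
        for bucket in (by_length_task[(r["length"], r["task_type"])],
                       by_length[r["length"]], total):
            bucket[0] += hit
            bucket[1] += 1
    for (length, task), (c, n) in by_length_task.items():
        logger.info("length=%s task=%s accuracy=%.1f%%", length, task, _pct(c, n))
    for length, (c, n) in by_length.items():
        logger.info("length=%s overall accuracy=%.1f%%", length, _pct(c, n))
    return _pct(*total)
```

An empty prediction never matches a reference letter, so unanswered items count as incorrect in this sketch.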

Implementation Details

Video length tracking for granular analysis
Task type categorization (QA, OCR, temporal)
Multiple-choice format with options A-F
Case-insensitive answer comparison
Comprehensive logging of category-specific performance

Retrieved from "https://leeroopedia.com/index.php?title=Implementation:EvolvingLMMs_Lab_Lmms_eval_LongTimeScope_Utils&oldid=6296"