Implementation:EvolvingLMMs Lab Lmms eval LongTimeScope Utils

From Leeroopedia

Location: /tmp/kapso_repo_sslb_59s/lmms_eval/tasks/longtimescope/utils.py

Principle: Task_Utility_Functions

Purpose

Task-specific utilities for the LongTimeScope benchmark, which evaluates long-video understanding across QA, OCR, and temporal reasoning tasks.

Constants

  • TASK_CATEGORIES = ["QA", "OCR", "temporal"]

Configuration

  • Reads the cache directory from longtimescope.yaml
  • Takes the base cache path from the HF_HOME environment variable (default: ~/.cache/huggingface/)
  • Handles video files with multiple extensions (mp4, MP4, mkv)
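
As a rough illustration of the configuration above, the sketch below joins a task cache directory onto the HF_HOME base path. The helper name resolve_cache_dir, the cache_name argument, and the env parameter are hypothetical and not taken from utils.py. How the value is actually read from longtimescope.yaml is not shown in the source and is left out here.

```python
import os


def resolve_cache_dir(cache_name, env=None):
    """Join the Hugging Face base cache with a task-specific subdirectory.

    Hypothetical helper: `cache_name` stands in for the value read from
    longtimescope.yaml, and `env` allows a mapping to be injected for testing.
    """
    env = os.environ if env is None else env
    # Fall back to the documented default when HF_HOME is not set.
    base = os.path.expanduser(env.get("HF_HOME", "~/.cache/huggingface/"))
    return os.path.join(base, cache_name)
```

Passing the environment as a parameter is only a convenience for testing; reading os.environ directly would behave the same way.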

Key Functions

convert_time_to_frame

def convert_time_to_frame(time_in_seconds, fps)

Converts time in seconds to frame number given FPS rate.
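
A minimal sketch of the conversion, assuming the frame index is the product of time and FPS truncated to an integer. The rounding mode is an assumption, as the description above only states that a conversion takes place.

```python
def convert_time_to_frame(time_in_seconds, fps):
    """Map a timestamp in seconds to a frame index at the given frame rate."""
    # Assumption: truncate toward zero rather than round to nearest.
    return int(time_in_seconds * fps)
```

For example, under this assumption 2.5 seconds at 30 FPS maps to frame 75.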

timescope_doc_to_visual

def timescope_doc_to_visual(doc)

Locates the video file for a document:

  • Constructs path from cache directory and doc["video"]
  • Tries multiple extensions (mp4 -> MP4 -> mkv)
  • Exits with an error if no video file is found
  • Returns list with single video path
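
The lookup steps above can be sketched as follows. The helper name find_video is hypothetical. The sketch also assumes that doc["video"] holds a file stem without an extension and that the error message wording is free to choose; neither is confirmed by the source.

```python
import os
import sys


def find_video(cache_dir, video_name, extensions=("mp4", "MP4", "mkv")):
    """Return [path] for the first existing candidate, or exit if none exists.

    Hypothetical helper: `video_name` is assumed to be a stem with no extension.
    """
    for ext in extensions:
        candidate = os.path.join(cache_dir, f"{video_name}.{ext}")
        if os.path.exists(candidate):
            # The result is a single-element list, as described above.
            return [candidate]
    # No candidate matched: stop the run with a message on stderr.
    sys.exit(f"video '{video_name}' not found in {cache_dir}")
```

On case-insensitive file systems the mp4 and MP4 candidates resolve to the same file, so the second check only matters on case-sensitive ones.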

extract_characters_regex

def extract_characters_regex(s)

Extracts answer choice from response text:

  • Strips common answer prefixes ("The best answer is", etc.)
  • Returns an empty string if the response is longer than 10 words and contains none of the letters A-F
  • Uses regex to find first [ABCDEF] character
  • Returns matched character or empty string
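
A sketch of the extraction logic described above, under stated assumptions: the function name extract_choice is hypothetical, and only "The best answer is" comes from the source, so the other prefixes in the list are illustrative guesses.

```python
import re

# Only the first prefix appears in the source; the others are assumed examples.
ANSWER_PREFIXES = (
    "The best answer is",
    "The correct answer is",
    "The answer is",
)


def extract_choice(s):
    """Return the first A-F letter in a model response, or "" if none is found."""
    s = s.strip()
    for prefix in ANSWER_PREFIXES:
        s = s.replace(prefix, "")
    # Long responses without any option letter are treated as unanswered.
    if len(s.split()) > 10 and not re.search(r"[ABCDEF]", s):
        return ""
    match = re.search(r"[ABCDEF]", s)
    return match[0] if match else ""
```

One consequence of matching the first uppercase A-F anywhere in the text: in this sketch a response such as "Answer: D" yields "A", because the capital letter in "Answer" is matched first. Stripping known prefixes before the search reduces, but does not remove, this kind of false match.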

timescope_process_results

def timescope_process_results(doc, results)

Processes a single result:

  • Extracts predicted answer using extract_characters_regex
  • Creates data dict with id, length, video, task_type, pred_answer, pred, answer
  • Returns dict with "timescope_perception_score" metric
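
The shape of the per-sample record might look like the sketch below. The names process_result and first_choice are hypothetical, as are the doc field names. The split between pred_answer (extracted letter) and pred (raw response) is also an assumption, since the source lists both keys without saying what each holds.

```python
import re


def first_choice(text):
    """Stand-in extractor: first A-F letter in the text, or ""."""
    match = re.search(r"[ABCDEF]", text)
    return match[0] if match else ""


def process_result(doc, results, extract=first_choice):
    """Build the per-sample record keyed by the metric name."""
    raw = results[0]  # assumes one model response per document
    data = {
        "id": doc["id"],
        "length": doc["length"],
        "video": doc["video"],
        "task_type": doc["task_type"],
        "pred_answer": extract(raw),  # assumed: the extracted option letter
        "pred": raw,                  # assumed: the raw response text
        "answer": doc["answer"],
    }
    return {"timescope_perception_score": data}
```

Returning the full record under the metric key lets the aggregation step group by length and task type without having to look the document up again.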

timescope_aggregate_results

def timescope_aggregate_results(results)

Aggregates results with detailed breakdown:

  • Groups by video length and task type
  • Computes accuracy for each length-task combination
  • Logs per-length-task accuracy
  • Logs per-length overall accuracy
  • Returns overall accuracy across all videos
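
The grouping and accuracy computation above can be sketched as follows. The function name aggregate is hypothetical, and the sketch returns the per-group breakdown alongside the overall figure instead of logging it. Reporting accuracy as a fraction in [0, 1] is an assumption, as the source does not state the scale.

```python
from collections import defaultdict


def aggregate(results):
    """Return (overall_accuracy, per_group_accuracy) for a list of records.

    Each record is assumed to carry "length", "task_type", "pred_answer",
    and "answer" keys, as in the per-sample data dict.
    """
    buckets = defaultdict(lambda: [0, 0])  # (length, task_type) -> [correct, total]
    for r in results:
        # Case-insensitive comparison of predicted and reference letters.
        hit = r["pred_answer"].upper() == r["answer"].upper()
        bucket = buckets[(r["length"], r["task_type"])]
        bucket[0] += int(hit)
        bucket[1] += 1
    per_group = {key: c / t for key, (c, t) in buckets.items()}
    correct = sum(c for c, _ in buckets.values())
    total = sum(t for _, t in buckets.values())
    overall = correct / total if total else 0.0
    return overall, per_group
```

Per-length accuracy can be derived from the same buckets by summing the counts over task types for each length.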

Implementation Details

  • Video length tracking for granular analysis
  • Task type categorization (QA, OCR, temporal)
  • Multiple-choice format with options A-F
  • Case-insensitive answer comparison
  • Comprehensive logging of category-specific performance
