

Implementation:Huggingface Datatrove TrackJobs

From Leeroopedia
Knowledge Sources
Domains Pipeline Operations, Monitoring
Last Updated 2026-02-14 17:00 GMT

Overview

TrackJobs is a command-line tool that provides a real-time visual dashboard for tracking the progress of multiple pipeline jobs, featuring progress bars, delta indicators, and optional continuous monitoring with configurable refresh intervals.

Description

The track_jobs module provides a terminal dashboard for monitoring the progress of distributed pipeline jobs. Unlike the simpler jobs_status tool, which prints a one-shot text report, TrackJobs renders a panel with progress bars, percentage indicators, and delta tracking that shows what changed since the last refresh.

The tool supports glob patterns in the path argument (e.g., `/path/to/runs/*`) to match multiple job directories dynamically. It discovers valid job directories by checking for the presence of executor.json, reads the total task count from the configuration, and counts completed tasks from the completions subdirectory. Results are displayed in a panel showing both job-level progress (how many job directories are fully complete) and task-level progress (total completed tasks across all jobs).
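The discovery and counting steps described above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code: the helper names `expand_pattern` and `read_job_progress` are hypothetical, and the `world_size` key used for the total task count is an assumption about the layout of executor.json.

```python
import glob
import json
import os


def expand_pattern(path_pattern):
    """Return glob matches, or the path itself when it has no wildcard characters."""
    if any(ch in path_pattern for ch in "*?["):
        return sorted(glob.glob(path_pattern))
    return [path_pattern]


def read_job_progress(job_dir, total_key="world_size"):
    """Return (completed, total) for one job directory, or None if it is not a job.

    total_key is an assumed name for the task-count field in executor.json.
    """
    config_path = os.path.join(job_dir, "executor.json")
    if not os.path.isfile(config_path):
        return None  # no executor.json: not a valid job directory
    with open(config_path) as f:
        total = json.load(f).get(total_key)
    if total is None:
        return None  # configuration does not state a task count
    completions_dir = os.path.join(job_dir, "completions")
    # Each entry in the completions subdirectory is counted as one finished task.
    completed = len(os.listdir(completions_dir)) if os.path.isdir(completions_dir) else 0
    return completed, total
```

Calling `read_job_progress` on every path returned by `expand_pattern("/path/to/runs/*")` yields the per-job numbers from which both the job-level and task-level totals can be derived.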

Continuous monitoring mode, activated with the `-i` flag, uses Rich's Live display to refresh the dashboard automatically at a configurable interval. To avoid redundant filesystem operations, the tool caches the status of completed jobs as well as the directories it has already found to be invalid. Delta indicators, displayed in green, show how many jobs and tasks completed since the previous refresh. The progress bars adapt to the terminal width and use different characters to distinguish old progress from new progress.
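The caching and delta logic might look roughly like the sketch below. All function names here (`summarize`, `compute_deltas`, `cached_status`) are hypothetical and chosen for illustration; the real module's internals may be organized differently.

```python
def summarize(job_statuses):
    """Collapse {job_path: (completed, total)} into job-level and task-level totals."""
    return {
        "jobs_done": sum(1 for done, total in job_statuses.values() if done >= total),
        "jobs_total": len(job_statuses),
        "tasks_done": sum(done for done, _ in job_statuses.values()),
        "tasks_total": sum(total for _, total in job_statuses.values()),
    }


def compute_deltas(current, previous=None):
    """Return how many jobs and tasks finished since the previous refresh."""
    if previous is None:
        return {"jobs": 0, "tasks": 0}  # first refresh: nothing to compare against
    return {
        "jobs": current["jobs_done"] - previous["jobs_done"],
        "tasks": current["tasks_done"] - previous["tasks_done"],
    }


def cached_status(job_path, completed_cache, read_status):
    """Skip the filesystem for jobs already known to be complete."""
    if job_path in completed_cache:
        return completed_cache[job_path]
    status = read_status(job_path)  # (completed, total) or None
    if status is not None and status[0] >= status[1]:
        completed_cache[job_path] = status  # a finished job will not change again
    return status
```

Only fully completed jobs are cached in this sketch, since a job that is still running has to be re-read on every refresh to pick up new completions.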

The display includes a timestamp, visual progress bars using block characters, and emoji indicators. The progress bar visualization distinguishes between previously completed work (solid blocks), newly completed work (medium blocks), and the current position (a cursor character).
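One way to build such a three-part bar is sketched below. The function name `render_bar` and the specific characters are assumptions made for illustration; the source only states that solid blocks, medium blocks, and a cursor character are used.

```python
def render_bar(completed, previous_completed, total, width=40,
               old_char="█", new_char="▓", cursor_char="▌", empty_char="░"):
    """Render a text progress bar that separates old progress from new progress.

    The default characters are illustrative guesses, not the tool's actual glyphs.
    """
    if total <= 0 or width <= 0:
        return ""
    # Clamp inputs so the bar never overflows or runs backwards.
    completed = max(0, min(completed, total))
    previous_completed = max(0, min(previous_completed, completed))
    done_cells = width * completed // total
    old_cells = width * previous_completed // total
    new_cells = done_cells - old_cells
    bar = old_char * old_cells + new_char * new_cells
    if done_cells < width:
        # The cursor marks the current position; the rest of the bar is empty.
        bar += cursor_char + empty_char * (width - done_cells - 1)
    return bar
```

To adapt the bar to the terminal, the `width` argument could be derived from `shutil.get_terminal_size().columns`, minus whatever space the surrounding labels need.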

Usage

Use this tool for real-time monitoring of long-running distributed pipeline jobs. The continuous monitoring mode is especially useful for observing progress during active processing, while the single-shot mode provides a quick status check.

Code Reference

Source Location

Signature

def main():
    """Track job progress with optional continuous monitoring."""

def expand_path_pattern(path_pattern):
    """Expand glob pattern or return original path if no magic characters."""

def find_valid_directories(paths, invalid_dirs_cache):
    """Find directories that contain executor.json."""

def get_job_status(job_path, completed_jobs_cache):
    """Get completion status for a job directory."""

def create_display(job_statuses, console, previous_state=None):
    """Create the display panel with progress information."""

Import

from datatrove.tools.track_jobs import main

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| path | str (CLI argument) | Yes | Path to logging folder(s), may contain glob patterns like '*' |
| --interval / -i | int | No | Refresh interval in seconds for continuous monitoring mode (default: None, single shot) |
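A command-line interface matching the inputs listed above could be declared with `argparse` as follows. This is a sketch built from the table, not a copy of the module's parser; the function name `build_parser` and the help strings are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with one positional path and an optional refresh interval."""
    parser = argparse.ArgumentParser(
        description="Track job progress with optional continuous monitoring."
    )
    parser.add_argument(
        "path", type=str,
        help="Path to logging folder(s); may contain glob patterns like '*'",
    )
    parser.add_argument(
        "-i", "--interval", type=int, default=None,
        help="Refresh interval in seconds; omit for a single-shot report",
    )
    return parser
```

With this layout, `args.interval` is `None` in single-shot mode, so a plain `if args.interval is None` check is enough to choose between one render and a refresh loop.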

Outputs

| Name | Type | Description |
|------|------|-------------|
| Console panel | Rich Panel | Visual dashboard showing job and task progress with progress bars and delta indicators |

Usage Examples

Basic Usage

# Single status check for jobs matching a glob pattern
python -m datatrove.tools.track_jobs "/path/to/runs/*"

# Continuous monitoring with 30-second refresh
python -m datatrove.tools.track_jobs "/path/to/runs/*" -i 30
