Implementation:Huggingface Datatrove FailedLogs

Knowledge Sources	Huggingface_Datatrove
Domains	Pipeline Operations, Debugging
Last Updated	2026-02-14 17:00 GMT

Overview

FailedLogs is a command-line tool that identifies incomplete pipeline tasks and displays their log files to assist in diagnosing failures.

Description

The failed_logs module provides a diagnostic tool for investigating pipeline task failures. It reads the executor.json configuration from a logging directory to determine the total number of expected tasks (world_size), then scans the completions subdirectory to identify which task ranks did not complete successfully. For each incomplete task, it locates the corresponding log file using a regex pattern (`logs/task_XXXXX.log`) and displays the log content through a paginated console interface.

The tool uses the rich library for formatted console output, including status spinners during directory scanning and interactive pagination through log files. After displaying each log, it prompts the user with a Confirm dialog to decide whether to view the next incomplete task's log, allowing selective review of failures.

The tool extracts the task rank from log filenames using a compiled regex pattern (RANK_FROM_LOG_FILENAME_REGEX) and cross-references these against the set of incomplete ranks to display only the relevant logs.

Usage

Use this tool when a distributed pipeline run has incomplete tasks and you need to investigate why specific tasks failed. Point it at the logging directory of the run and it will identify and display the logs of all failed tasks.

Code Reference

Source Location

Repository: Huggingface_Datatrove
File: src/datatrove/tools/failed_logs.py
Lines: 1-75

Signature

def main():
    """
    Takes a `logging_dir` as input, gets total number of tasks from `executor.json`
    and then gets which ranks are incomplete by scanning `logging_dir/completions`.
    The log files for the incomplete tasks are then displayed.
    """

Import

from datatrove.tools.failed_logs import main

I/O Contract

Inputs

Name	Type	Required	Description
path	str (CLI argument)	No	Path to the logging folder containing executor.json (default: current directory)

Outputs

Name	Type	Description
Console output	Rich formatted text	Paginated display of log files for each incomplete task

Usage Examples

Basic Usage

# From the command line
python -m datatrove.tools.failed_logs /path/to/logging_dir

# Programmatic usage
from datatrove.tools.failed_logs import main
import sys

sys.argv = ["failed_logs", "/path/to/logging_dir"]
main()

Related Pages

Principle:Huggingface_Datatrove_Pipeline_Failure_Diagnosis

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment