Implementation:Huggingface Datatrove FailedLogs
| Knowledge Sources | |
|---|---|
| Domains | Pipeline Operations, Debugging |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
FailedLogs is a command-line tool that identifies incomplete pipeline tasks and displays their log files to assist in diagnosing failures.
Description
The failed_logs module provides a diagnostic tool for investigating pipeline task failures. It reads the executor.json configuration from a logging directory to determine the total number of expected tasks (world_size), then scans the completions subdirectory to identify which task ranks did not complete successfully. For each incomplete task, it locates the corresponding log file using a regex pattern (`logs/task_XXXXX.log`) and displays the log content through a paginated console interface.
The tool uses the rich library for formatted console output, including status spinners during directory scanning and interactive pagination through log files. After displaying each log, it prompts the user with a Confirm dialog to decide whether to view the next incomplete task's log, allowing selective review of failures.
The tool extracts the task rank from log filenames using a compiled regex pattern (RANK_FROM_LOG_FILENAME_REGEX) and cross-references these against the set of incomplete ranks to display only the relevant logs.
Usage
Use this tool when a distributed pipeline run has incomplete tasks and you need to investigate why specific tasks failed. Point it at the logging directory of the run and it will identify and display the logs of all failed tasks.
Code Reference
Source Location
- Repository: Huggingface_Datatrove
- File: src/datatrove/tools/failed_logs.py
- Lines: 1-75
Signature
def main():
"""
Takes a `logging_dir` as input, gets total number of tasks from `executor.json`
and then gets which ranks are incomplete by scanning `logging_dir/completions`.
The log files for the incomplete tasks are then displayed.
"""
Import
from datatrove.tools.failed_logs import main
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str (CLI argument) | No | Path to the logging folder containing executor.json (default: current directory) |
Outputs
| Name | Type | Description |
|---|---|---|
| Console output | Rich formatted text | Paginated display of log files for each incomplete task |
Usage Examples
Basic Usage
# From the command line
python -m datatrove.tools.failed_logs /path/to/logging_dir
# Programmatic usage
from datatrove.tools.failed_logs import main
import sys
sys.argv = ["failed_logs", "/path/to/logging_dir"]
main()