Heuristic:ThreeSR Awesome Inference Time Scaling Date Parsing Fallback Tip

Knowledge Sources	Awesome-Inference-Time-Scaling
Domains	Data_Processing, Chronological_Sorting
Last Updated	2026-02-14 00:00 GMT

Overview

Graceful degradation strategy for paper entries with missing or malformed dates: unparseable dates fall back to datetime.min, so those entries sort to the end of the newest-first list instead of crashing the script.

Description

The parse_date_from_block() function extracts dates from paper entry blocks using the regex pattern -\s*🗓️\s*\*\*Date:\*\*\s*([\d]{4}-[\d]{2}-[\d]{2}). The function returns None in two cases: when the regex does not match (e.g., the date field is missing or uses a non-standard format), and when the regex matches but datetime.strptime() rejects the extracted string because it is not a valid calendar date (e.g., 2025-13-45).
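
The two failure modes can be told apart with a small sketch like the one below. It is illustrative only: classify_date_block() is a hypothetical helper, not part of the script, and the emoji is written as an escape sequence on the assumption that the source pattern uses U+1F5D3 followed by the variation selector U+FE0F.

```python
import re
from datetime import datetime

# Assumed encoding of the calendar emoji used in the date line.
CALENDAR = "\U0001F5D3\uFE0F"

# Same structure as the pattern quoted above, built from the escaped emoji.
DATE_PATTERN = re.compile(
    r'-\s*' + CALENDAR + r'\s*\*\*Date:\*\*\s*([\d]{4}-[\d]{2}-[\d]{2})'
)


def classify_date_block(block):
    """Hypothetical helper: report why a block would or would not yield a date."""
    match = DATE_PATTERN.search(block)
    if match is None:
        return "no_match"       # date line missing or not YYYY-MM-DD shaped
    try:
        datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return "invalid_date"   # right shape, but not a real calendar date
    return "ok"


samples = [
    f"- {CALENDAR} **Date:** 2025-01-31",   # well-formed
    f"- {CALENDAR} **Date:** 2025/01/31",   # wrong separator
    f"- {CALENDAR} **Date:** 2025-13-45",   # matches the regex, fails strptime
    "- **Title:** Some paper",              # no date line at all
]
results = [classify_date_block(s) for s in samples]
print(results)  # ['ok', 'no_match', 'invalid_date', 'no_match']
```

In the script itself, both the no_match and invalid_date cases collapse into the same None return value.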

The calling function (write_to_readme_in_sorted_order()) handles None by substituting datetime.min. Because the list is sorted in descending date order and datetime.min is the smallest representable datetime, entries with unparseable dates land at the end of the sorted list.
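
The effect of the substitution on sort order can be sketched as follows. The entries are made-up placeholders, and the sort call is an assumption about how a newest-first ordering would typically be written, not a quote from the script.

```python
from datetime import datetime

# Hypothetical (date, entry) pairs; None stands for an unparseable date.
parsed = [
    (datetime(2025, 3, 1), "Paper A"),
    (None, "Paper with bad date"),
    (datetime(2024, 11, 20), "Paper B"),
]

# Replace None with datetime.min so every key is comparable.
# (Comparing None with a datetime would raise TypeError during sorting.)
merged_entries = [
    (dt if dt is not None else datetime.min, entry) for dt, entry in parsed
]

# Newest first: the datetime.min key is the smallest, so it ends up last.
merged_entries.sort(key=lambda pair: pair[0], reverse=True)
ordered_titles = [entry for _, entry in merged_entries]
print(ordered_titles)  # ['Paper A', 'Paper B', 'Paper with bad date']
```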

Usage

Use this heuristic when:

Manually adding paper entries where the publication date is unknown or not in YYYY-MM-DD format.
Debugging sort order issues where a paper appears at the bottom of the list unexpectedly.
Understanding the script's fault tolerance -- the script will not crash on malformed date fields.

The Insight (Rule of Thumb)

Action: Entries with missing or malformed dates are assigned datetime.min (year 1, January 1) as their sort key.
Value: These entries will always appear at the end of the chronologically sorted list (newest-first order).
Trade-off: No crash or data loss, but entries with bad dates may be "hidden" at the bottom of a long list. Nothing is reported when the regex fails to match; a message is printed only when strptime itself raises an exception.
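
One possible way to make the fallback visible is to log a warning and collect the affected entries, as in this sketch. It is not what the script does today: attach_sort_keys(), simple_parse(), and the logger name are hypothetical, and simple_parse() is a simplified stand-in for parse_date_from_block() that ignores the emoji prefix.

```python
import logging
import re
from datetime import datetime

logger = logging.getLogger("date_fallback")  # hypothetical logger name


def simple_parse(block):
    """Simplified stand-in for parse_date_from_block(); returns None on failure."""
    match = re.search(r'\*\*Date:\*\*\s*(\d{4}-\d{2}-\d{2})', block)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return None


def attach_sort_keys(entries, parse_date=simple_parse):
    """Pair each entry with a sort key and collect entries that fell back."""
    keyed, fallbacks = [], []
    for entry in entries:
        dt = parse_date(entry)
        if dt is None:
            # Same substitution as the original loop, plus a visible warning.
            lines = entry.strip().splitlines()
            first_line = lines[0] if lines else "<empty entry>"
            logger.warning("No usable date, using datetime.min: %s", first_line)
            fallbacks.append(entry)
            dt = datetime.min
        keyed.append((dt, entry))
    return keyed, fallbacks


entries = [
    "- **Title:** Paper A\n- **Date:** 2025-03-01",
    "- **Title:** Paper B\n- **Date:** unknown",
]
keyed, fallbacks = attach_sort_keys(entries)
print(len(fallbacks))  # 1
```

Returning the fallback list alongside the keyed entries lets a caller print a summary after the merge, so entries pushed to the bottom are easy to find and fix.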

Reasoning

Robustness over strictness: when curating a list of hundreds of papers, some entries may have incomplete metadata from the Semantic Scholar API (e.g., preprints without an official publication date). Rather than failing the entire merge operation for one bad entry, the script silently degrades by sorting the problematic entry to the end.

Date extraction logic (fetch_semantic_info.py:77-89):

def parse_date_from_block(block):
    """
    Extract the date from the markdown block of a paper entry.
    Expected date line format: - 🗓️ **Date:** YYYY-MM-DD
    """
    match = re.search(r'-\s*🗓️\s*\*\*Date:\*\*\s*([\d]{4}-[\d]{2}-[\d]{2})', block)
    if match:
        date_str = match.group(1)
        try:
            return datetime.strptime(date_str, '%Y-%m-%d')
        except Exception as e:
            print(f"Error parsing date format: {e}")
    return None

Fallback handling in the sort step (fetch_semantic_info.py:170-175):

for entry in all_entries:
    dt = parse_date_from_block(entry)
    # If the date cannot be parsed, set it to a very early date so that it appears at the end
    if dt is None:
        dt = datetime.min
    merged_entries.append((dt, entry))

The inline comment explicitly documents the design intent: "set it to a very early date so that it appears at the end".
