Heuristic: testtimescaling.github.io - Hardcoded IDs vs Registry
| Knowledge Sources | |
|---|---|
| Domains | Data_Management, Debugging |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Known design inconsistency: the Python citation script uses a hardcoded list of arXiv IDs instead of reading from the papers.json registry, requiring a triple-update process when adding new papers.
Description
The update_arxiv_citations.py script contains a hardcoded Python list of arXiv IDs (at approximately lines 22-25). Despite the existence of two papers.json registry files that contain the same paper identifiers in a structured format, the script does not read from either of them.
This means that adding a new paper to the citation tracking system requires updating three separate locations:
- `papers.json` (repository root)
- `.github/scripts/papers.json` (scripts directory)
- `.github/scripts/update_arxiv_citations.py` (hardcoded list in Python code)
If any of these three updates is missed, the system will be in an inconsistent state: the JSON registries may list a paper that is not being tracked, or the script may track a paper not listed in the registry.
Usage
Be aware of this pitfall when adding a new paper to the citation tracking system. All three files must be updated in a single commit. During code review, verify that the arXiv ID list in the Python script matches the entries in both papers.json files.
The Insight (Rule of Thumb)
- Action: When adding a new paper, update all three locations: both `papers.json` files AND the hardcoded list in `update_arxiv_citations.py`.
- Value: The `arxiv_ids` list in the Python script must contain the same IDs as the `arxiv_id` fields in the JSON registry.
- Trade-off: This triple-update is fragile. The recommended fix is to modify the Python script to read from `papers.json` instead of maintaining a hardcoded list, eliminating two of the three update points.
- Severity: If only the JSON files are updated but the script is not, citations for the new paper will not be fetched. The badge will show incorrect (lower) citation counts.
Reasoning
Hardcoding data that exists in a structured data file violates the DRY (Don't Repeat Yourself) principle. The JSON registry was designed to be the single source of truth for tracked papers, but the Python script was written to use a hardcoded list -- likely as a quick initial implementation that was never refactored.
A simple fix would be:
```python
import json

# Read arXiv IDs from the JSON registry instead of hardcoding them
with open("papers.json") as f:
    papers = json.load(f)
arxiv_ids = [p["arxiv_id"] for p in papers]
```
This would reduce the paper registration process from a triple-update to a single-update (just papers.json), and the duplicate .github/scripts/papers.json could also be eliminated.
Code Evidence
Hardcoded list in .github/scripts/update_arxiv_citations.py:L22-25:
```python
arxiv_ids = [
    "2503.24235",  # example
    # "2102.06828",  # if there are more, keep adding them here
]
```
The same ID exists in the JSON registry at papers.json:L4:
```json
"arxiv_id": "2503.24235"
```
The commented-out ID "2102.06828" in the Python script suggests the developer intended to add more papers but left the pattern as a hardcoded list rather than reading from the JSON file.