Heuristic: testtimescaling.github.io - Hardcoded IDs vs Registry
| Knowledge Sources | |
|---|---|
| Domains | Data_Management, Debugging |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Known design inconsistency: the Python citation script uses a hardcoded list of arXiv IDs instead of reading from the papers.json registry, requiring a triple-update process when adding new papers.
Description
The update_arxiv_citations.py script contains a hardcoded Python list of arXiv IDs (at approximately lines 22-25). Despite the existence of two papers.json registry files that contain the same paper identifiers in a structured format, the script does not read from either of them.
This means that adding a new paper to the citation tracking system requires updating three separate locations:
- `papers.json` (repository root)
- `.github/scripts/papers.json` (scripts directory)
- `.github/scripts/update_arxiv_citations.py` (hardcoded list in Python code)
If any of these three updates is missed, the system will be in an inconsistent state: the JSON registries may list a paper that is not being tracked, or the script may track a paper not listed in the registry.
Usage
Be aware of this pitfall when adding a new paper to the citation tracking system. All three files must be updated in a single commit. During code review, verify that the arXiv ID list in the Python script matches the entries in both papers.json files.
The Insight (Rule of Thumb)
- Action: When adding a new paper, update all three locations: both `papers.json` files AND the hardcoded list in `update_arxiv_citations.py`.
- Value: The `arxiv_ids` list in the Python script must contain the same IDs as the `arxiv_id` fields in the JSON registry.
- Trade-off: This triple-update is fragile. The recommended fix is to modify the Python script to read from `papers.json` instead of maintaining a hardcoded list, eliminating two of the three update points.
- Severity: If only the JSON files are updated but the script is not, citations for the new paper will not be fetched. The badge will show incorrect (lower) citation counts.
Reasoning
Hardcoding data that exists in a structured data file violates the DRY (Don't Repeat Yourself) principle. The JSON registry was designed to be the single source of truth for tracked papers, but the Python script was written to use a hardcoded list -- likely as a quick initial implementation that was never refactored.
A simple fix would be:
```python
import json

# Read arXiv IDs from the JSON registry instead of hardcoding them
with open("papers.json") as f:
    papers = json.load(f)
arxiv_ids = [p["arxiv_id"] for p in papers]
```
This would reduce the paper registration process from a triple-update to a single-update (just papers.json), and the duplicate .github/scripts/papers.json could also be eliminated.
Code Evidence
Hardcoded list in .github/scripts/update_arxiv_citations.py:L22-25:
```python
arxiv_ids = [
    "2503.24235",  # example
    # "2102.06828",  # if there are more, keep adding them here
]
```
The same ID exists in the JSON registry at papers.json:L4:
```json
"arxiv_id": "2503.24235"
```
The commented-out ID "2102.06828" in the Python script suggests the developer intended to add more papers but left the pattern as a hardcoded list rather than reading from the JSON file.