Heuristic:Testtimescaling Testtimescaling github io Dual JSON Sync
| Knowledge Sources | |
|---|---|
| Domains | Data_Management, Debugging |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Known maintenance pitfall: two identical papers.json files exist at different paths and must be manually kept in sync, creating a risk of data divergence.
Description
The repository contains two copies of the paper registry:
- papers.json at the repository root
- .github/scripts/papers.json alongside the automation scripts
Both files contain the same JSON array of paper objects (title and arXiv ID). There is no automated mechanism to synchronize them. When a contributor adds a new paper, they must remember to update both files with identical content.
This duplication exists as a historical artifact. The root copy may serve as the "public-facing" registry, while the scripts copy was likely placed alongside the Python script for co-location convenience. However, the Python script (update_arxiv_citations.py) does not actually read from either file -- it uses a hardcoded list (see the Hardcoded_IDs_vs_Registry heuristic).
Usage
Be aware of this pitfall when adding a new paper to the citation tracking system. Always update both JSON files in the same commit to avoid drift. Review both files during code review to catch any inconsistencies.
The Insight (Rule of Thumb)
- Action: When modifying papers.json, always update both copies in the same commit.
- Value: Both files must contain identical JSON content at all times.
- Trade-off: Manual synchronization is error-prone. A future improvement would be to eliminate one copy and have the other reference it, or to have the Python script read from the JSON file instead of a hardcoded list.
- Detection: Run diff papers.json .github/scripts/papers.json from the repository root. Any output means the two files have diverged.
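The detection step can also be scripted, for example as a check run before committing. The following is a minimal sketch, assuming it is run from the repository root; the names load_registry and registries_match are hypothetical and not part of the repository. Unlike diff, it compares the parsed JSON values, so whitespace and key-order differences are ignored while missing, extra, or reordered entries are reported.

```python
import json
from pathlib import Path

# Paths as given on this page, relative to the repository root.
ROOT_COPY = Path("papers.json")
SCRIPTS_COPY = Path(".github/scripts/papers.json")


def load_registry(path):
    """Parse one registry file into Python objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def registries_match(first=ROOT_COPY, second=SCRIPTS_COPY):
    """Return True when both files parse to equal JSON values."""
    return load_registry(first) == load_registry(second)


if __name__ == "__main__":
    if registries_match():
        print("papers.json copies are in sync")
    else:
        # A non-zero exit status lets a hook or CI job fail on drift.
        raise SystemExit("papers.json copies differ; update both in the same commit")
```

If byte-for-byte identity is required, as the Value rule above states, the diff command remains the stricter check.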
Reasoning
Data duplication without automated synchronization is a well-known source of bugs. When two files must be kept in sync by hand, drift becomes increasingly likely as the project grows. In this repository the risk is currently low, because only one paper is tracked, but each additional paper is another chance for one file to be updated while the other is forgotten.
The current state of both files is identical (verified during this analysis):
[
  {
    "title": "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models",
    "arxiv_id": "2503.24235"
  }
]
Code Evidence
Root copy at papers.json:L1-6:
[
  {
    "title": "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models",
    "arxiv_id": "2503.24235"
  }
]
Scripts copy at .github/scripts/papers.json:L1-6:
[
  {
    "title": "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models",
    "arxiv_id": "2503.24235"
  }
]
Note: Neither file is read by the Python script. See the Hardcoded_IDs_vs_Registry heuristic for details on this related issue.
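One way to close this gap, as the trade-off above suggests, is for the script to take its IDs from the registry instead of a hardcoded list. The following is a minimal sketch, assuming the registry keeps the array-of-objects shape shown above. load_arxiv_ids is a hypothetical helper, and how update_arxiv_citations.py would consume its result is not covered here.

```python
import json


def load_arxiv_ids(registry_path):
    """Return the arxiv_id of every entry in a papers.json-style file."""
    with open(registry_path, encoding="utf-8") as fh:
        papers = json.load(fh)
    # Each entry is expected to be an object with "title" and "arxiv_id" keys.
    return [paper["arxiv_id"] for paper in papers]
```

Called with the path of a single remaining papers.json, this would give the script one source of truth and make the second copy unnecessary.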