Heuristic:Testtimescaling Testtimescaling github io Dual JSON Sync

Knowledge Sources	Codebase inspection, papers.json (root), papers.json (scripts)
Domains	Data_Management, Debugging
Last Updated	2026-02-14 00:00 GMT

Overview

Known maintenance pitfall: two identical papers.json files exist at different paths and must be manually kept in sync, creating a risk of data divergence.

Description

The repository contains two copies of the paper registry:

papers.json at the repository root
.github/scripts/papers.json alongside the automation scripts

Both files contain the same JSON array of paper objects (title and arXiv ID). There is no automated mechanism to synchronize them. When a contributor adds a new paper, they must remember to update both files with identical content.

This duplication appears to be a historical artifact. The root copy may serve as the "public-facing" registry, while the scripts copy was likely placed alongside the Python script for co-location. However, the Python script (update_arxiv_citations.py) does not actually read from either file; it uses a hardcoded list (see the Hardcoded_IDs_vs_Registry heuristic).
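If the script were changed to read the registry instead of its hardcoded list, a minimal sketch could look like the following. The function names arxiv_ids_from_json and load_arxiv_ids, and the choice of the root copy as the single source, are hypothetical; the actual structure of update_arxiv_citations.py is not shown on this page.

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical choice: treat the root copy as the single source of truth.
REGISTRY_PATH = Path("papers.json")


def arxiv_ids_from_json(text: str) -> list[str]:
    """Extract arXiv IDs, in file order, from the JSON text of a papers.json registry."""
    papers = json.loads(text)
    # Each entry is an object with "title" and "arxiv_id" keys.
    return [paper["arxiv_id"] for paper in papers]


def load_arxiv_ids(path: Path = REGISTRY_PATH) -> list[str]:
    """Read a registry file and return its arXiv IDs (hypothetical replacement for a hardcoded list)."""
    return arxiv_ids_from_json(path.read_text(encoding="utf-8"))
```

With the registry contents shown below, load_arxiv_ids() would return a one-element list holding "2503.24235". Once the script reads the file, the second copy could be deleted rather than kept in sync.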

Usage

Be aware of this pitfall when adding a new paper to the citation tracking system. Always update both JSON files in the same commit to avoid drift. Review both files during code review to catch any inconsistencies.
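A small helper can enforce the same-commit rule by serializing the updated list once and writing that exact text to both paths. This is a hypothetical sketch, not part of the repository: add_paper and add_paper_to_both are illustrative names, the 2-space indentation is chosen to match the layout shown under Code Evidence, and the trailing newline is an assumption.

```python
import json
from pathlib import Path

# The two registry copies described on this page, relative to the repository root.
REGISTRY_PATHS = (Path("papers.json"), Path(".github/scripts/papers.json"))


def add_paper(papers: list, title: str, arxiv_id: str) -> list:
    """Return a new list with the paper appended; an already-listed arXiv ID is left alone."""
    if any(p["arxiv_id"] == arxiv_id for p in papers):
        return list(papers)
    return [*papers, {"title": title, "arxiv_id": arxiv_id}]


def add_paper_to_both(title: str, arxiv_id: str, root: Path = Path(".")) -> None:
    """Append a paper and write identical text to both copies (hypothetical helper)."""
    copies = [json.loads((root / rel).read_text(encoding="utf-8")) for rel in REGISTRY_PATHS]
    # Refuse to run if the copies have already drifted, so neither is silently overwritten.
    if copies[0] != copies[1]:
        raise ValueError("papers.json copies already differ; reconcile them first")
    updated = add_paper(copies[0], title, arxiv_id)
    # 2-space indentation mirrors the existing files; the trailing newline is an assumption.
    text = json.dumps(updated, indent=2, ensure_ascii=False) + "\n"
    for rel in REGISTRY_PATHS:
        (root / rel).write_text(text, encoding="utf-8")
```

Because both files receive the same string, they stay byte-identical after each addition, and the guard stops the helper from spreading an existing divergence.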

The Insight (Rule of Thumb)

Action: When modifying papers.json, always update both copies in the same commit.
Value: Both files must contain identical JSON content at all times.
Trade-off: Manual synchronization is error-prone. A future improvement would be to eliminate one copy and have the other reference it, or to have the Python script read from the JSON file instead of a hardcoded list.
Detection: run diff papers.json .github/scripts/papers.json from the repository root; no output (exit status 0) means the two files are identical.
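The same check could run automatically, for example as a CI step. The sketch below is hypothetical (copies_match and main are illustrative names). Unlike diff, it compares the parsed JSON, so whitespace and key-order differences inside objects are not reported as drift; a byte-level diff is the stricter option if the files must be textually identical.

```python
import json
import sys
from pathlib import Path

# The two registry copies, relative to the repository root.
ROOT_COPY = Path("papers.json")
SCRIPTS_COPY = Path(".github/scripts/papers.json")


def copies_match(a_text: str, b_text: str) -> bool:
    """True if both texts parse to the same JSON value (whitespace and key order ignored)."""
    return json.loads(a_text) == json.loads(b_text)


def main(root: Path = Path(".")) -> int:
    """Return 0 if the copies match and 1 if they differ (hypothetical CI check)."""
    a_text = (root / ROOT_COPY).read_text(encoding="utf-8")
    b_text = (root / SCRIPTS_COPY).read_text(encoding="utf-8")
    if copies_match(a_text, b_text):
        print("papers.json copies are in sync")
        return 0
    print(f"{ROOT_COPY} and {SCRIPTS_COPY} differ", file=sys.stderr)
    return 1
```

A CI job could call sys.exit(main()) so that any divergence fails the build instead of relying on reviewers to notice it.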

Reasoning

Data duplication without automated synchronization is a well-known source of bugs. When two files must be kept in sync by hand, drift becomes increasingly likely as the project grows. In this repository the risk is currently low (only one paper is tracked), but as more papers are added, so does the chance that one file is updated while the other is forgotten.

The current state of both files is identical (verified during this analysis):

[
  {
    "title": "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models",
    "arxiv_id": "2503.24235"
  }
]

Code Evidence

Root copy at papers.json:L1-6:

[
  {
    "title": "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models",
    "arxiv_id": "2503.24235"
  }
]

Scripts copy at .github/scripts/papers.json:L1-6:

[
  {
    "title": "What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models",
    "arxiv_id": "2503.24235"
  }
]

Note: Neither file is read by the Python script. See the Hardcoded_IDs_vs_Registry heuristic for details on this related issue.

Related Pages

Implementation:Testtimescaling_Testtimescaling_github_io_Json_Paper_Registration
