Principle:Testtimescaling Testtimescaling github io Citation Registration

Knowledge Sources	JSON data management, CI/CD pipeline design, Semantic Scholar API
Domains	Data_Management, Automation
Last Updated	2026-02-14

Overview

Registering academic papers in a structured JSON registry enables automated citation tracking pipelines to fetch and aggregate citation counts over time.

Description

Citation Registration is the process of adding a paper's metadata to the repository's citation tracking system. This system consists of a JSON registry file that stores paper identifiers and a GitHub Actions workflow that periodically queries the Semantic Scholar API to retrieve current citation counts.
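The lookup that the workflow performs for each registered paper can be sketched as follows. This is a minimal illustration, not the repository's actual script: it assumes the public Semantic Scholar Graph API paper endpoint with an `ARXIV:` identifier prefix and the `citationCount` field, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: Semantic Scholar Graph API, single-paper lookup by arXiv ID.
API_URL = (
    "https://api.semanticscholar.org/graph/v1/paper/"
    "ARXIV:{arxiv_id}?fields=citationCount"
)


def build_url(arxiv_id: str) -> str:
    """Return the lookup URL for one arXiv ID."""
    return API_URL.format(arxiv_id=arxiv_id)


def parse_citation_count(payload: str) -> int:
    """Extract the citation count from a JSON response body (0 if absent or null)."""
    data = json.loads(payload)
    return int(data.get("citationCount") or 0)


def fetch_citation_count(arxiv_id: str, timeout: float = 10.0) -> int:
    """Query the API once and return the current citation count."""
    with urllib.request.urlopen(build_url(arxiv_id), timeout=timeout) as resp:
        return parse_citation_count(resp.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without contacting the API.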

The registration step bridges the gap between manual paper curation (Steps 1-3) and automated data maintenance. Once a paper is registered, its citation count is automatically updated without further human intervention.

However, the repository's current architecture has a significant complication: three locations must all be updated before a new paper is fully registered in the citation tracking system:

Root papers.json: The primary registry file at the repository root containing an array of paper objects with title and arxiv_id fields.
Workflow papers.json: A duplicate copy at .github/scripts/papers.json that exists alongside the automation scripts.
Python script hardcoded IDs: The automation script at .github/scripts/update_arxiv_citations.py contains a hardcoded list of arXiv IDs (approximately lines 22-25) that it iterates over to fetch citations. This script does not read from either papers.json file.

This triple-update requirement is a known technical debt issue. The fundamental design principle is that all paper identifiers must be synchronized across all three locations; failure to update any one of them results in incomplete citation tracking.
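For reference, the registry shape described above (an array of objects with title and arxiv_id fields) can be checked with a short sketch like the one below. The example entry uses a placeholder title and ID, and validate_registry is a hypothetical helper, not part of the repository.

```python
import json

# Placeholder entry showing the assumed shape of either papers.json copy.
EXAMPLE_REGISTRY = '[{"title": "Example Paper Title", "arxiv_id": "0000.00000"}]'


def validate_registry(text: str) -> list[dict]:
    """Parse registry JSON and confirm every entry has a title and an arxiv_id."""
    papers = json.loads(text)
    if not isinstance(papers, list):
        raise ValueError("registry must be a JSON array")
    for index, entry in enumerate(papers):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not an object")
        for field in ("title", "arxiv_id"):
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"entry {index} has no usable {field!r}")
    return papers
```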

Usage

Use this principle after adding the paper to the comparison table (Step 3). Citation registration is Step 4 of the Adding_a_New_Paper workflow. The contributor needs only the paper title and arXiv ID, both of which were determined in Steps 1 and 2.

Theoretical Basis

Citation registration follows principles from data pipeline design and configuration management:

Single source of truth (aspirational): In an ideal architecture, there would be one authoritative registry of papers, and all consumers (scripts, workflows, badges) would read from that single source. The current architecture deviates from this ideal by maintaining multiple copies of the paper list, creating a synchronization burden. Understanding this gap is important for contributors to avoid partial updates.
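As a sketch of what the single-source-of-truth ideal could look like, a script might derive its ID list from the root papers.json instead of hardcoding it. The current update_arxiv_citations.py does not do this; load_arxiv_ids is a hypothetical name used only for illustration.

```python
import json
from pathlib import Path


def load_arxiv_ids(registry_path: str) -> list[str]:
    """Read arXiv IDs from a registry file, dropping repeats but keeping order."""
    papers = json.loads(Path(registry_path).read_text(encoding="utf-8"))
    ids = (entry["arxiv_id"] for entry in papers)
    # dict.fromkeys preserves first-seen order while removing duplicates.
    return list(dict.fromkeys(ids))
```

With a loader like this, adding a paper to the registry would be the only edit a contributor has to make.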

Registry pattern: The JSON file acts as a registry -- a central catalog of entities (papers) with their identifiers. The registry pattern is common in systems that need to enumerate and iterate over a known set of items. Each registry entry contains the minimal information needed for identification: a human-readable title and a machine-usable arXiv ID.

Idempotent updates: Adding a paper that is already in the registry should be a no-op (or produce an identical result). The JSON structure (array of objects) makes duplicate detection straightforward by checking the arxiv_id field.
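The idempotence property can be illustrated with a small sketch that treats arxiv_id as the key. The helper name register_paper and the placeholder values are assumptions for illustration; the repository itself relies on manual edits.

```python
def register_paper(papers: list[dict], title: str, arxiv_id: str) -> bool:
    """Append a paper unless its arxiv_id is already present.

    Returns True if the registry changed, False if the call was a no-op.
    """
    if any(entry.get("arxiv_id") == arxiv_id for entry in papers):
        return False
    papers.append({"title": title, "arxiv_id": arxiv_id})
    return True
```

Calling the function twice with the same ID leaves the registry with a single entry, so repeating a registration is harmless.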

Pipeline decoupling: By separating registration (human action) from citation fetching (automated action), the system decouples the rate of paper addition from the rate of citation updates. Papers can be added at any time, and the next scheduled workflow run will pick them up, provided all three locations have been updated. This is a standard pattern in scheduled batch pipeline architectures.

Consistency requirement: The most critical aspect of this registration is maintaining consistency across all three update locations. An inconsistency (e.g., a paper in papers.json but not in the Python script) will result in the paper being silently excluded from citation tracking. No automated validation currently exists to detect such inconsistencies.
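A check of the kind that is currently missing could be sketched as below. It compares the IDs found in the two papers.json copies with those found in the Python script. The sketch assumes the script's hardcoded IDs appear as quoted new-style arXiv IDs (for example "0000.00000"), which is a guess about its layout; all function names are hypothetical.

```python
import json
import re

# Assumption: IDs in the script are quoted new-style arXiv IDs, optionally versioned.
ARXIV_ID = re.compile(r"""["'](\d{4}\.\d{4,5})(?:v\d+)?["']""")


def ids_from_registry(text: str) -> set[str]:
    """Collect arxiv_id values from the contents of a papers.json file."""
    return {entry["arxiv_id"] for entry in json.loads(text)}


def ids_from_script(source: str) -> set[str]:
    """Collect quoted arXiv IDs appearing anywhere in Python source text."""
    return set(ARXIV_ID.findall(source))


def find_mismatches(root_json: str, workflow_json: str, script_source: str) -> dict[str, set[str]]:
    """Map each location to the IDs it lacks; an empty dict means all agree."""
    locations = {
        "papers.json": ids_from_registry(root_json),
        ".github/scripts/papers.json": ids_from_registry(workflow_json),
        ".github/scripts/update_arxiv_citations.py": ids_from_script(script_source),
    }
    all_ids = set().union(*locations.values())
    return {
        name: all_ids - ids
        for name, ids in locations.items()
        if all_ids - ids
    }
```

Run in CI against the three files, a non-empty result would flag a partial update before a paper is silently dropped from tracking.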
