Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Heuristic:ThreeSR Awesome Inference Time Scaling Empty Venue Default Tip

From Leeroopedia
Knowledge Sources
Domains Data_Formatting, Academic_Curation
Last Updated 2026-02-14 00:00 GMT

Overview

Default value strategy that substitutes "arXiv.org" when the Semantic Scholar API returns an empty string for a paper's venue field, ensuring every entry has a meaningful publisher label.

Description

The Semantic Scholar API frequently returns an empty string ("") for the venue field, especially for preprints that have not yet been published at a conference or journal. The format_paper_info() function detects this case and substitutes the string "arXiv.org" as the publisher, since most papers in this repository's domain (inference-time scaling) are initially posted as arXiv preprints.

This is a domain-specific assumption: the repository focuses on machine learning research where arXiv is the dominant preprint server. For repositories covering other domains, a different default (or no default) might be more appropriate.

Usage

Use this heuristic when:

  • Formatting paper entries where the venue field is empty or missing.
  • Understanding why "arXiv.org" appears as the publisher for many entries in the README.
  • Adapting the script for a different domain where arXiv may not be the appropriate default.

The Insight (Rule of Thumb)

  • Action: Check if the venue field is an empty string; if so, replace it with "arXiv.org".
  • Value: Ensures the 📑 **Publisher:** line in every paper entry has a non-empty value, maintaining visual consistency in the curated list.
  • Trade-off: Assumes all papers without a venue are arXiv preprints, which is generally true for ML research but may not hold for papers from other databases or domains.

Reasoning

In the machine learning research community, most new papers are first published as arXiv preprints before (or instead of) appearing at a conference. The Semantic Scholar API reflects this by returning an empty venue for preprints. Rather than displaying "Unknown Publisher" or an empty field, the script author chose "arXiv.org" as the most likely and most useful default.

Code evidence (fetch_semantic_info.py:57-59):

publisher = paper.get("venue", "Unknown Publisher")
if publisher == '':
    publisher = "arXiv.org"

Note the two-step approach:

  1. paper.get("venue", "Unknown Publisher") handles the case where the venue key is entirely absent (defaults to "Unknown Publisher").
  2. if publisher == : handles the more common case where the key exists but is an empty string (overrides to "arXiv.org").

This means the output is:

  • Key absent: "Unknown Publisher"
  • Key present, empty string: "arXiv.org"
  • Key present, non-empty: the actual venue name (e.g., "NeurIPS", "ICML")

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment