Implementation:Mbzuai oryx Awesome LLM Post training Get Paper Count

Knowledge Sources	Awesome-LLM-Post-training, Semantic Scholar API Docs
Domains	Bibliometrics, Trend_Analysis
Last Updated	2026-02-08 07:30 GMT

Overview

Concrete tool for querying yearly publication counts from the Semantic Scholar API for research trend analysis.

Description

The get_paper_count function queries the Semantic Scholar /paper/search endpoint with a keyword and a year filter. It requests a single result (limit=1) to minimize data transfer, since only the total count in the response is needed. On an HTTP 429 rate-limit response it sleeps for 10 seconds and retries, up to 10 times, so one call can block for roughly 100 seconds in the worst case. On any other error, or once the retries are exhausted, it returns 0. A custom User-Agent header identifies the request as academic research.
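The behavior described above can be sketched roughly as follows. This is an illustrative reconstruction, not the code from scripts/future_research_data.py. The name get_paper_count_sketch, the full endpoint URL, the User-Agent string, the 30-second timeout, and the injectable http_get and sleep parameters are all assumptions made so the retry path can be exercised without network access. The sketch also treats "up to 10 retries" as 10 attempts in total.

```python
import time

# Assumed full URL for the /paper/search endpoint (Semantic Scholar Graph API).
API_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
MAX_ATTEMPTS = 10          # assumption: "up to 10 retries" read as 10 attempts
RETRY_SLEEP_SECONDS = 10   # pause after each HTTP 429 response


def get_paper_count_sketch(query, year, http_get=None, sleep=time.sleep):
    """Return the total number of matching papers, or 0 on any failure.

    http_get and sleep are hypothetical injection points for testing;
    by default the sketch uses requests.get and time.sleep.
    """
    if http_get is None:
        import requests  # imported lazily so the sketch loads without it
        http_get = requests.get

    params = {"query": query, "year": year, "limit": 1}
    # Placeholder value; the real script's User-Agent string is not shown here.
    headers = {"User-Agent": "academic-research-script (hypothetical)"}

    for _ in range(MAX_ATTEMPTS):
        try:
            response = http_get(API_URL, params=params, headers=headers, timeout=30)
        except Exception:
            return 0  # network-level failure
        if response.status_code == 200:
            # Only the total is needed; the single returned paper is ignored.
            return int(response.json().get("total", 0))
        if response.status_code == 429:
            sleep(RETRY_SLEEP_SECONDS)  # rate limited: wait, then retry
            continue
        return 0  # any other HTTP status counts as an error
    return 0  # retries exhausted
```

Because failures and genuinely empty result sets both yield 0, a caller that needs to tell them apart would have to log or raise inside the function instead.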

Usage

Call this function once per keyword-year combination inside the trend analysis's nested loop: the outer loop iterates over keywords (read from a CSV file) and the inner loop over years (2020-2025). The 1-second delay between calls is applied by the caller, not by the function, to be polite to the API.
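As a rough sketch of that calling pattern, the helpers below read keywords from a CSV file and run the nested loop with the external delay. The names load_keywords and collect_counts, the "keyword" column name, and the count_fn and sleep parameters are hypothetical and chosen for illustration; the actual CSV layout used by the script is not documented on this page.

```python
import csv
import time


def load_keywords(csv_file, column="keyword"):
    """Read non-empty keywords from one column of an open CSV file.

    The column name "keyword" is an assumption about the CSV layout.
    """
    reader = csv.DictReader(csv_file)
    keywords = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            keywords.append(value)
    return keywords


def collect_counts(keywords, years, count_fn, delay=1.0, sleep=time.sleep):
    """Call count_fn(keyword, year) for every combination.

    Returns {keyword: {year: count}}. The delay is applied here, by the
    caller, after every request rather than inside count_fn.
    """
    table = {}
    for keyword in keywords:          # outer loop: keywords
        table[keyword] = {}
        for year in years:            # inner loop: years
            table[keyword][year] = count_fn(keyword, year)
            sleep(delay)              # polite pause between API calls
    return table
```

With an open file handle f, a call such as collect_counts(load_keywords(f), range(2020, 2026), get_paper_count) would reproduce the loop described above.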

Code Reference

Source Location

Repository: Awesome-LLM-Post-training
File: scripts/future_research_data.py
Lines: 8-24

Signature

def get_paper_count(query: str, year: int) -> int:
    """
    Get number of papers for a given query and year from Semantic Scholar.

    Args:
        query: Research keyword to search for.
        year: Publication year filter.

    Returns:
        int: Total number of papers matching the query for that year.
        Returns 0 on error or retry exhaustion.
    """

Import

# Function defined in scripts/future_research_data.py
# Dependencies:
import requests
import time

I/O Contract

Inputs

Name	Type	Required	Description
query	str	Yes	Research keyword to search for
year	int	Yes	Publication year filter (e.g., 2023)

Outputs

Name	Type	Description
return value	int	Total number of papers matching the query for the specified year. Returns 0 on error or retry exhaustion.

Usage Examples

Single Query

# Get paper count for a specific keyword and year
count = get_paper_count("reinforcement learning from human feedback", 2023)
print(f"RLHF papers in 2023: {count}")

Full Trend Analysis Loop

import time

keywords = ["RLHF", "Direct Preference Optimization", "MCTS for LLM"]
years = list(range(2020, 2026))

for keyword in keywords:
    counts = []
    for year in years:
        count = get_paper_count(keyword, year)
        counts.append(count)
        time.sleep(1)  # Polite delay between requests
    print(f"{keyword}: {dict(zip(years, counts))}")
