
Implementation:Mbzuai oryx Awesome LLM Post training Fetch Paper Details

From Leeroopedia


Knowledge Sources
Domains Data_Collection, Graph_Traversal, Bibliometrics
Last Updated 2026-02-08 07:30 GMT

Overview

Concrete tool for recursively fetching paper metadata and expanding the citation graph via the Semantic Scholar API.

Description

The fetch_paper_details function retrieves complete metadata for a single paper from the Semantic Scholar /paper/{id} endpoint, then recursively follows its references and citations up to depth 2. It uses a global processed_papers dictionary for deduplication and a global paper_count counter to enforce the collection cap. Rate-limit responses (HTTP 429) trigger automatic retry with configurable backoff.
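The 429 handling described above can be sketched as a small retry loop. This is an illustrative sketch, not the script's actual code: `do_get` is a hypothetical stand-in for the underlying `requests.get` call (returning a status code and parsed JSON), and `RATE_LIMIT_WAIT` mirrors the page's `rate_limit_wait` setting.

```python
import time
from typing import Callable, Optional, Tuple

RATE_LIMIT_WAIT = 1  # seconds to sleep on HTTP 429 (rate_limit_wait; illustrative value)

def get_json_with_backoff(url: str,
                          do_get: Callable[[str], Tuple[int, Optional[dict]]],
                          max_retries: int = 3,
                          sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Retry `do_get(url)` while it returns HTTP 429, sleeping between attempts.

    `do_get` returns a (status_code, parsed_json) pair; with `requests`
    it would wrap `requests.get(url)` and `resp.json()`.
    """
    for _ in range(max_retries):
        status, body = do_get(url)
        if status == 429:          # rate limited: wait, then retry
            sleep(RATE_LIMIT_WAIT)
            continue
        return body if status == 200 else None
    return None                    # retries exhausted
```

Injecting `do_get` and `sleep` keeps the backoff logic testable without network access or real delays.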

The function builds a nested data structure where each paper's References and Citations fields contain fully-fetched detail dictionaries of linked papers, enabling rich citation graph analysis.

Usage

Call this function for each seed paper returned by search_papers. It will automatically expand the corpus by recursively crawling references and citations. Ensure global configuration variables (max_papers, max_ref_citations, rate_limit_wait, processed_papers, paper_count) are initialized before calling.
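A minimal initialization of the required globals might look like this; the variable names come from this page, but the numeric values are illustrative and the actual script's defaults may differ.

```python
# Global configuration read by fetch_paper_details (values are illustrative).
max_papers = 500          # cap on total papers collected
max_ref_citations = 10    # max references/citations followed per paper
rate_limit_wait = 30      # seconds to sleep on HTTP 429
processed_papers = {}     # deduplication map of already-fetched papers
paper_count = 0           # running count of collected papers
```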

Code Reference

Source Location

Signature

def fetch_paper_details(paper_id: str, depth: int = 1) -> Optional[dict]:
    """
    Fetches paper details with a limit on recursion depth.

    Args:
        paper_id: Semantic Scholar paper ID.
        depth: Current recursion depth (stops at depth > 2).

    Returns:
        Dict with keys: Title, Authors, Abstract, TL;DR,
        Publication Year, Venue (Conference/Journal), Link,
        References (list of recursively fetched dicts),
        Citations (list of recursively fetched dicts).
        Returns None if paper_count >= max_papers, duplicate, or API error.

    Side Effects:
        Increments global paper_count.
        Adds entry to global processed_papers dict.
    """

Import

# Function defined in scripts/deep_collection_sementic.py
# Dependencies:
import requests
import time
from tqdm import tqdm

I/O Contract

Inputs

Name Type Required Description
paper_id str Yes Semantic Scholar paper ID to fetch
depth int No Current recursion depth, default 1. Recursion stops when depth > 2

Global State Read:

Name Type Description
processed_papers dict Deduplication map of already-fetched papers
paper_count int Running count of collected papers
max_papers int Cap on total papers
max_ref_citations int Max references/citations to follow per paper
rate_limit_wait int Seconds to sleep on HTTP 429

Outputs

Name Type Description
return value Optional[dict] Paper metadata dict with nested References and Citations lists, or None

Output Dict Structure:

Key Type Description
Title str Paper title
Authors str Comma-separated author names
Abstract str Paper abstract
TL;DR str Auto-generated summary from Semantic Scholar
Publication Year int or str Year of publication, or "Unknown Year"
Venue (Conference/Journal) str Publication venue
Link str URL to the paper
References list[dict] Recursively fetched reference paper details
Citations list[dict] Recursively fetched citing paper details
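Because References and Citations nest full detail dictionaries, the collected graph can be walked recursively. A small helper (hypothetical, not part of the script) that counts every paper in a result tree:

```python
def count_papers(details):
    """Count this paper plus every paper nested under References/Citations."""
    if not details:            # handles None returns from fetch_paper_details
        return 0
    total = 1
    for child in details.get("References", []) + details.get("Citations", []):
        total += count_papers(child)
    return total
```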

Usage Examples

Basic Single Paper Fetch

# Fetch details for a known paper ID
paper_id = "649def34f8be52c8b66281af98ae884c09aef38b"
details = fetch_paper_details(paper_id, depth=1)

if details:
    print(f"Title: {details['Title']}")
    print(f"Year: {details['Publication Year']}")
    print(f"References: {len(details['References'])}")
    print(f"Citations: {len(details['Citations'])}")

Integration with Seed Search

# Full pipeline: seed search → recursive fetch
query = "Survey on Large Language and Reinforcement Learning"
papers = search_papers(query, limit=1)

data = []
if papers:
    for paper in papers:
        paper_id = paper.get("paperId")
        if paper_id:
            details = fetch_paper_details(paper_id)
            if details:
                data.append(details)

print(f"Total papers collected: {paper_count}")
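After collection, the `data` list is typically persisted for downstream analysis. A sketch, with an assumed output filename (the actual script may write elsewhere):

```python
import json

def save_corpus(data, path="collected_papers.json"):
    """Write the collected paper dicts to a JSON file (filename assumed)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
```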

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
