
Principle:Mbzuai oryx Awesome LLM Post training Recursive Paper Fetching

From Leeroopedia


Knowledge Sources
Domains Data_Collection, Graph_Traversal, Bibliometrics
Last Updated 2026-02-08 07:30 GMT

Overview

A depth-limited recursive graph traversal strategy that expands an academic paper corpus by following reference and citation links from known papers.

Description

Recursive Paper Fetching implements a bounded, depth-limited recursive expansion of the academic citation graph. Starting from seed papers, it fetches each paper's full metadata along with its reference and citation lists, then recursively fetches those linked papers up to a configurable depth limit. The algorithm incorporates three critical safeguards: depth limiting (to prevent infinite recursion), deduplication (to avoid re-fetching papers already in the corpus), and a global paper count cap (to bound total collection size).

This technique addresses the fundamental challenge of building comprehensive domain-specific corpora: a keyword search alone misses papers that are relevant but use different terminology. By traversing the citation graph, the pipeline discovers papers that are structurally connected to the seed set regardless of keyword overlap.

Usage

Use this principle when building a research corpus that needs to capture the full citation neighborhood around a set of seed papers. It is appropriate when:

  • Simple keyword search is insufficient for complete domain coverage
  • The citation graph structure is informative for understanding the field
  • Collection size must be bounded despite the exponential growth of citation links
  • Deduplication across recursive branches is necessary
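The bounded traversal described above can be sketched on a toy in-memory citation graph. Everything here is illustrative: the paper IDs, the `TOY_GRAPH` structure, and the constants are invented for demonstration, not taken from any real API.

```python
# Minimal sketch of depth-limited recursive fetching over a toy
# in-memory citation graph. All paper IDs and links are invented.
TOY_GRAPH = {
    "seed": {"refs": ["a", "b"], "cites": ["c"]},
    "a":    {"refs": ["b"],      "cites": []},
    "b":    {"refs": ["d"],      "cites": []},
    "c":    {"refs": [],         "cites": ["seed"]},  # cycle back to seed
    "d":    {"refs": [],         "cites": []},
}

MAX_DEPTH = 1      # seed is depth 0; follow links one hop outward
MAX_PAPERS = 10    # global cap on corpus size

def collect(seed_id):
    visited = set()

    def fetch(paper_id, depth):
        # The three safeguards: depth limit, dedup, global count cap.
        if depth > MAX_DEPTH or paper_id in visited or len(visited) >= MAX_PAPERS:
            return
        if paper_id not in TOY_GRAPH:  # unknown ID: skip
            return
        visited.add(paper_id)          # mark before expanding, so cycles terminate
        links = TOY_GRAPH[paper_id]
        for linked_id in links["refs"] + links["cites"]:
            fetch(linked_id, depth + 1)

    fetch(seed_id, 0)
    return visited

corpus = collect("seed")
# With MAX_DEPTH=1, the corpus is the seed plus its one-hop neighbors
# {"seed", "a", "b", "c"}; "d" is two hops away and excluded, and the
# c -> seed cycle is absorbed by the visited set.
```

Note that the deduplication set also serves as cycle protection: citation graphs contain loops (paper A cites B, B's citation list includes A), so marking a paper visited before expanding its links is what guarantees termination.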

Theoretical Basis

The algorithm performs a depth-limited graph traversal over the academic citation graph:

$$
\text{fetch}(p, d) =
\begin{cases}
\text{metadata}(p) \cup \bigcup_{r \in \text{refs}(p)} \text{fetch}(r, d+1) \cup \bigcup_{c \in \text{cites}(p)} \text{fetch}(c, d+1) & \text{if } d \le D_{\max} \text{ and } p \notin \text{visited} \\
\varnothing & \text{otherwise}
\end{cases}
$$

Where:

  • p is a paper ID
  • d is the current recursion depth
  • D_max is the maximum allowed depth
  • visited is a global deduplication set

Pseudo-code Logic:

# Abstract recursive fetching algorithm (NOT a real implementation)
visited = set()   # global deduplication set
count = 0         # global paper counter

def fetch(paper_id, depth):
    global count
    # Three safeguards: depth limit, dedup, global count cap
    if depth > MAX_DEPTH or paper_id in visited or count >= MAX_PAPERS:
        return None
    visited.add(paper_id)   # mark before expanding, so citation cycles terminate
    count += 1
    metadata = api.get_paper(paper_id)
    for ref_id in metadata.references[:MAX_PER_PAPER]:
        fetch(ref_id, depth + 1)
    for cite_id in metadata.citations[:MAX_PER_PAPER]:
        fetch(cite_id, depth + 1)
    return metadata

The exponential branching factor is controlled by three bounds: depth limit, per-paper reference/citation cap, and total paper count.
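The interaction of these bounds can be quantified with a short back-of-the-envelope calculation. Assuming (for illustration) a per-paper cap B applied to references and to citations, each fetched paper spawns at most 2B children, so a single seed yields at most a geometric series of papers before the global cap intervenes. The symbols B, D, and N below are illustrative names for MAX_PER_PAPER, MAX_DEPTH, and MAX_PAPERS.

```python
# Worst-case corpus size from one seed under the three bounds:
#   B = per-paper cap on refs and on cites (MAX_PER_PAPER)
#   D = depth limit (MAX_DEPTH)
#   N = global paper count cap (MAX_PAPERS)
def worst_case_corpus(B, D, N):
    # Each paper expands to at most 2*B children, so at most
    # sum_{d=0..D} (2*B)**d papers are reachable before the cap.
    unbounded = sum((2 * B) ** d for d in range(D + 1))
    return min(N, unbounded)

# Even modest settings explode without the global cap:
# B=20, D=2 gives 1 + 40 + 1600 = 1641 reachable papers per seed,
# so a global cap of N=1000 is what actually bounds the collection.
print(worst_case_corpus(20, 2, 1000))  # → 1000
```

This is why all three bounds are needed together: the depth limit alone leaves the corpus exponential in D, the per-paper cap alone leaves it exponential in depth, and only the global count cap gives a hard size guarantee.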

Related Pages

Implemented By

Uses Heuristic
