
Heuristic:Mbzuai oryx Awesome LLM Post training Depth Limit Recursion At 2

From Leeroopedia





Knowledge Sources
Domains: Data_Collection, Optimization
Last Updated: 2026-02-08 08:00 GMT

Overview

Limit citation graph traversal to a recursion depth of 2 to keep the paper corpus at a manageable size while maintaining relevance.

Description

The deep paper collection script recursively follows references and citations from seed papers. Without a depth limit, the traversal would grow exponentially with each hop and could eventually encompass millions of papers. The script therefore limits recursion to depth 2: seed papers are fetched at depth 1 (the default), and reference and citation IDs are gathered only while `depth <= 2`, so papers beyond that level do not trigger further fetching. The result is a focused corpus of directly and indirectly related papers.

Usage

Apply this heuristic when traversing citation graphs or any recursive graph structure. The depth limit of 2 strikes a balance between corpus breadth and API/time costs. Increase to 3 for broader coverage (at exponentially higher cost) or decrease to 1 for a minimal seed-only corpus.
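The cost of raising the limit can be estimated with a rough upper bound. The sketch below is illustrative only: `frontier_sizes` is a hypothetical helper, it counts hops from the seed rather than the script's `depth` values, and it ignores both overlap between papers and any caps. The branching factors 40 and 150 are assumptions derived from the reference and citation ranges quoted under The Insight.

```python
def frontier_sizes(neighbors_per_paper, max_hops):
    """Upper bound on newly reached papers at each hop (hop 0 is the seed)."""
    return [neighbors_per_paper ** hop for hop in range(max_hops + 1)]


# Assumed branching factors: low and high ends of references + citations per paper.
for branching in (40, 150):
    sizes = frontier_sizes(branching, 3)
    print(f"{branching} neighbors/paper -> per-hop upper bounds {sizes}")
```

Each extra hop multiplies the newest frontier by the branching factor, which is why one more level of depth costs far more than the previous one. Real corpora grow more slowly than this bound because many papers share references.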

The Insight (Rule of Thumb)

  • Action: Add a `depth` parameter to recursive fetching functions and stop recursing when `depth > 2`.
  • Value: Depth 2 captures seed papers plus their immediate citations and references. With a typical paper having 30-50 references and 10-100 citations, depth 2 yields hundreds to thousands of papers from a single seed.
  • Trade-off: Depth 2 misses papers that are 3+ hops away from the seed. However, the most relevant papers in a research area are typically within 2 hops of a survey paper. Going to depth 3 would expand the corpus by 10-100x.
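The action above can be sketched on an in-memory graph. This is a minimal illustration, not the script's implementation: `TOY_GRAPH`, `collect`, `max_depth`, and `seen` are hypothetical names, and a dictionary lookup stands in for the real API call.

```python
# Hypothetical toy citation graph: paper ID -> IDs of its references and citations.
TOY_GRAPH = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["seed", "c"],
    "c": ["d"],
    "d": [],
}


def collect(paper_id, depth=1, max_depth=2, seen=None):
    """Return {paper_id: depth} for papers within max_depth levels of the seed."""
    if seen is None:
        seen = {}
    # Stop recursing past the limit, and skip papers already collected.
    if depth > max_depth or paper_id in seen:
        return seen
    seen[paper_id] = depth
    for neighbor in TOY_GRAPH.get(paper_id, []):
        collect(neighbor, depth + 1, max_depth, seen)
    return seen


corpus = collect("seed")
print(sorted(corpus))
```

With the default limit, the seed and its direct neighbors `a` and `b` are collected, while `c` and `d` are not. Because this sketch is depth-first, a paper may be recorded at a deeper level than its shortest path from the seed; a breadth-first variant avoids that.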

Reasoning

Citation counts are heavy-tailed, roughly following a power-law distribution: a few papers are highly cited, and most papers have few citations. At depth 1 (seed only), the corpus is too narrow. At depth 2, the corpus captures the "neighborhood" of the seed papers, including seminal works and recent follow-ups. At depth 3+, the corpus rapidly includes tangentially related papers from adjacent research fields, diluting relevance.

Combined with the `max_papers=1000` global limit and the `max_ref_citations=200` per-paper cap, the depth 2 limit creates a three-layer defense against uncontrolled growth.
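How the three limits interact can be sketched with a breadth-first traversal over an in-memory graph. This is a sketch under stated assumptions, not the script's code: `collect_capped` and the `graph` argument are hypothetical, and the per-paper cap is applied to one combined neighbor list, whereas the excerpt below slices references and citations separately.

```python
from collections import deque


def collect_capped(graph, seeds, max_depth=2, max_papers=1000, max_ref_citations=200):
    """Breadth-first collection bounded by corpus size, depth, and per-paper fan-out."""
    corpus = {}
    queue = deque((seed, 1) for seed in seeds)
    while queue and len(corpus) < max_papers:  # layer 1: global corpus limit
        paper_id, depth = queue.popleft()
        if paper_id in corpus:
            continue
        corpus[paper_id] = depth
        if depth >= max_depth:  # layer 2: depth limit, record but do not expand
            continue
        # Layer 3: per-paper cap (simplified to a single combined neighbor list).
        for neighbor in graph.get(paper_id, [])[:max_ref_citations]:
            if neighbor not in corpus:
                queue.append((neighbor, depth + 1))
    return corpus


# Hypothetical graph: one seed with ten neighbors, one of which links further out.
star = {"seed": [f"p{i}" for i in range(10)], "p0": ["far"]}
print(len(collect_capped(star, ["seed"])))
print(len(collect_capped(star, ["seed"], max_ref_citations=3)))
print(len(collect_capped(star, ["seed"], max_papers=5)))
```

Each limit bounds a different dimension: the depth check bounds how far the traversal reaches, the slice bounds how wide any single paper can fan out, and the size check bounds the total regardless of graph shape.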

Code evidence from `scripts/deep_collection_sementic.py:75`:

# Limit depth to avoid long chains
if depth <= 2:  # Allow fetching references & citations up to depth 2
    ref_ids = [ref["paperId"] for ref in paper.get("references", []) if "paperId" in ref][:max_ref_citations]
    cite_ids = [cite["paperId"] for cite in paper.get("citations", []) if "paperId" in cite][:max_ref_citations]

Code evidence from `scripts/deep_collection_sementic.py:37`:

def fetch_paper_details(paper_id, depth=1):
    """Fetches paper details with a limit on recursion depth"""
