
Implementation:Mbzuai oryx Awesome LLM Post training Search Papers

From Leeroopedia


Knowledge Sources
Domains: Data_Collection, Information_Retrieval
Last Updated: 2026-02-08 07:30 GMT

Overview

A concrete tool that queries the Semantic Scholar Graph API to retrieve seed papers for corpus building.

Description

The search_papers function sends a keyword query to the Semantic Scholar /paper/search endpoint and returns the list of paper metadata dictionaries found in the response's data field. When the API answers with HTTP 429 (rate limit exceeded), the function waits for a configurable interval and retries, up to 3 times; if it still has no result after that, it returns None. Each request asks for the following metadata fields: title, authors, abstract, URL, TL;DR, year, venue, references, and citations.
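The request-and-retry behaviour described above can be sketched as follows. This is a minimal sketch, not the code in scripts/deep_collection_sementic.py: it swaps in the standard-library urllib for requests so it runs without extra packages, and the parameters max_retries, wait_seconds and fetch, the helper _urllib_get, the 5-second default wait and the 30-second timeout are all hypothetical. The fetch parameter exists only so a fake HTTP call can be substituted in tests. Reading "retrying up to 3 times" as at most three attempts in total is also an assumption.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, List, Optional, Tuple

# Endpoint and field list follow the description on this page (assumed exact strings).
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,authors,abstract,url,tldr,year,venue,references,citations"

# Hypothetical fetcher type: takes a URL, returns (HTTP status code, body text).
Fetcher = Callable[[str], Tuple[int, str]]


def _urllib_get(url: str) -> Tuple[int, str]:
    """Hypothetical default fetcher built on the standard library."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:  # 4xx/5xx responses, including 429
        return err.code, ""
    except OSError:  # network errors: DNS failure, refused connection, timeouts
        return 0, ""


def search_papers(
    query: str,
    limit: int = 5,
    max_retries: int = 3,       # hypothetical: total attempts allowed
    wait_seconds: float = 5.0,  # hypothetical: pause after each 429
    fetch: Fetcher = _urllib_get,
) -> Optional[List[dict]]:
    """Return the 'data' list from /paper/search, or None on failure."""
    params = urllib.parse.urlencode({"query": query, "limit": limit, "fields": FIELDS})
    url = f"{SEARCH_URL}?{params}"
    for attempt in range(max_retries):
        status, body = fetch(url)
        if status == 200:
            return json.loads(body).get("data", [])
        if status == 429:
            # Rate limited: wait, then retry (skip the sleep after the last attempt).
            if attempt < max_retries - 1:
                time.sleep(wait_seconds)
            continue
        return None  # any other status is treated as non-retryable
    return None  # still rate limited after max_retries attempts
```

Only 429 triggers a retry here; other errors fail fast, which keeps a broken query from burning the rate-limit budget.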

Usage

Import and call this function as the first step of the deep paper collection pipeline. It produces the seed set that feeds into recursive reference/citation crawling via fetch_paper_details.
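How the seed set might feed the recursive crawl can be sketched as a breadth-first traversal. This is an illustrative assumption, not the script's actual algorithm: crawl_from_seeds, the fetch_details callable standing in for fetch_paper_details, and the max_depth cutoff are hypothetical. The code also assumes each detail dict carries references and citations as lists of objects with a paperId key, which is the usual shape of these nested fields in the Semantic Scholar Graph API.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Optional

# Hypothetical: fetch_details(paper_id) -> paper dict or None,
# standing in for fetch_paper_details from the collection script.
DetailFetcher = Callable[[str], Optional[dict]]


def crawl_from_seeds(
    seed_ids: Iterable[str],
    fetch_details: DetailFetcher,
    max_depth: int = 1,
) -> Dict[str, dict]:
    """Breadth-first walk over references and citations, starting at the seeds."""
    collected: Dict[str, dict] = {}
    seen = set()
    queue = deque((pid, 0) for pid in seed_ids if pid)
    while queue:
        paper_id, depth = queue.popleft()
        if paper_id in seen:
            continue  # already visited via another path
        seen.add(paper_id)
        details = fetch_details(paper_id)
        if not details:
            continue  # lookup failed; skip this node
        collected[paper_id] = details
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for key in ("references", "citations"):
            for neighbor in details.get(key) or []:
                neighbor_id = neighbor.get("paperId")  # may be null for unresolved entries
                if neighbor_id and neighbor_id not in seen:
                    queue.append((neighbor_id, depth + 1))
    return collected
```

If fetch_paper_details takes a single paper ID, as in the usage examples on this page, a call such as crawl_from_seeds([p["paperId"] for p in papers if p.get("paperId")], fetch_paper_details) would plug it into the pipeline. With max_depth=1 the crawl keeps the seeds plus their direct references and citations; each extra level can multiply the number of API calls.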

Code Reference

Source Location

scripts/deep_collection_sementic.py

Signature

def search_papers(query: str, limit: int = 5) -> Optional[List[dict]]:
    """
    Search Semantic Scholar for papers matching query.

    Args:
        query: Search query string sent to Semantic Scholar API.
        limit: Maximum number of seed papers to retrieve (default 5).

    Returns:
        List of paper metadata dicts from Semantic Scholar 'data' field,
        or None on failure after retries.
    """

Import

# Function defined in scripts/deep_collection_sementic.py
# Dependencies:
import requests
import time
from typing import List, Optional  # used by the signature's type hints

I/O Contract

Inputs

Name | Type | Required | Description
query | str | Yes | Search query string sent to the Semantic Scholar API
limit | int | No | Maximum number of seed papers to retrieve (default 5)

Outputs

Name | Type | Description
return value | Optional[List[dict]] | List of paper metadata dicts, each containing paperId, title, authors, abstract, url, tldr, year, venue, references, citations. Returns None on failure.
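Code that consumes the returned list usually has to cope with absent or null fields (a paper without an abstract or TL;DR, for example). The helper below is a hypothetical sketch, not part of the script: summarize_paper is an illustrative name, and it assumes the Graph API's usual nested shapes, namely authors as a list of objects with a name key, tldr as an object with a text key (or null), and references and citations as lists.

```python
from typing import Any, Dict, List


def summarize_paper(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one search result into a row, tolerating missing or null fields."""
    authors: List[dict] = paper.get("authors") or []
    tldr = paper.get("tldr") or {}  # assumed shape: {"model": ..., "text": ...} or null
    return {
        "paperId": paper.get("paperId"),
        "title": paper.get("title"),
        "year": paper.get("year"),
        "venue": paper.get("venue"),
        "authors": [a.get("name") for a in authors if a.get("name")],
        "tldr": tldr.get("text"),
        "num_references": len(paper.get("references") or []),
        "num_citations": len(paper.get("citations") or []),
    }
```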

Usage Examples

Basic Seed Search

# Search for seed papers on LLM post-training.
# Assumes search_papers and fetch_paper_details are importable from
# scripts/deep_collection_sementic.py.
query = "Survey on Large Language and Reinforcement Learning"
papers = search_papers(query, limit=1)

if papers:
    print(f"Found {len(papers)} seed papers")
    for paper in papers:
        print(f"  - {paper.get('title')}")
        paper_id = paper.get("paperId")
        if paper_id:
            # Feed each seed into recursive reference/citation crawling
            details = fetch_paper_details(paper_id)
else:
    print("No papers found or API error")

Broader Seed Search

# Retrieve more seed papers for wider coverage
papers = search_papers("reinforcement learning from human feedback", limit=5)

if papers:
    data = []
    for paper in papers:
        paper_id = paper.get("paperId")
        if paper_id:
            details = fetch_paper_details(paper_id)
            if details:
                data.append(details)
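When several queries are used to widen coverage, their seed lists can overlap, and some calls may return None. A minimal, hypothetical way to combine them before crawling, assuming paperId is a suitable deduplication key (merge_seed_results is an illustrative name, not part of the script):

```python
from typing import Iterable, List, Optional


def merge_seed_results(results: Iterable[Optional[List[dict]]]) -> List[dict]:
    """Combine several search_papers() results, keeping the first copy of each paperId."""
    merged: List[dict] = []
    seen_ids = set()
    for papers in results:
        if not papers:  # None (API failure) or an empty result list
            continue
        for paper in papers:
            paper_id = paper.get("paperId")
            if not paper_id or paper_id in seen_ids:
                continue  # skip entries without an ID and duplicates
            seen_ids.add(paper_id)
            merged.append(paper)
    return merged
```

For example, merge_seed_results(search_papers(q, limit=5) for q in queries) would yield one ordered, duplicate-free seed list for a hypothetical list of query strings.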

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
