Environment:Mbzuai oryx Awesome LLM Post training Python Requests

From Leeroopedia


Knowledge Sources
Domains Data_Collection, Infrastructure
Last Updated 2026-02-08 08:00 GMT

Overview

Python 3.6+ environment that combines the third-party requests, tqdm, and pandas packages with the standard-library json and time modules to query the Semantic Scholar API and process paper metadata.

Description

This environment provides the core runtime for the deep paper collection and research trend analysis scripts. It includes the requests library for HTTP GET calls to the Semantic Scholar Graph API, time for rate-limit backoff delays, tqdm for progress bars during recursive paper fetching, json for checkpoint serialization, and pandas for data normalization and Excel export. The os module is used for directory creation and path management. In addition, future_research_data.py imports matplotlib.pyplot, so matplotlib must be installed to run that script.

Usage

Use this environment for any workflow that queries the Semantic Scholar API, including seed paper search (search_papers), recursive paper detail fetching (fetch_paper_details), publication count querying (get_paper_count), and JSON/Excel export operations. It is the mandatory prerequisite for running deep_collection_sementic.py and future_research_data.py.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Any (Linux, macOS, Windows) | No OS-specific dependencies |
| Hardware | Standard CPU | No GPU required; network access required for API calls |
| Network | Internet access | Must reach api.semanticscholar.org on HTTPS (port 443) |
| Disk | 1GB free | For JSON checkpoint files and Excel output |

Dependencies

System Packages

  • No system-level packages required beyond Python itself

Python Packages

  • `python` >= 3.6
  • `requests` (any recent version)
  • `pandas` (any recent version)
  • `tqdm` (any recent version)
  • `openpyxl` (required by pandas for Excel export)
  • `matplotlib` (imported by `future_research_data.py`)

Credentials

No API keys are required for the Semantic Scholar API at the basic rate tier. However, rate limits apply (see Common Errors).

Optional:

  • `S2_API_KEY`: Semantic Scholar API key for higher rate limits. Not used in the current scripts but recommended for production use.
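
Since the current scripts do not read `S2_API_KEY`, the following is only a minimal sketch of how a key could be attached to requests if one is available. The helper name `build_headers` and its `env` parameter are hypothetical and do not appear in the scripts; the `x-api-key` header name is the one the Semantic Scholar API expects for authenticated calls, and the default User-Agent is the one shown in `future_research_data.py`.

```python
import os


def build_headers(user_agent="AcademicResearch/1.0 (mailto:user@example.com)", env=None):
    """Return request headers, adding the API key only when S2_API_KEY is set.

    `env` is a hypothetical hook for testing; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    headers = {"User-Agent": user_agent}
    api_key = env.get("S2_API_KEY")
    if api_key:
        # Semantic Scholar reads the key from the x-api-key request header.
        headers["x-api-key"] = api_key
    return headers
```

The result can be passed as `requests.get(url, headers=build_headers())`. Without the environment variable the call falls back to unauthenticated access, which matches how the scripts behave today.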

Quick Install

# Install all required packages
pip install requests pandas tqdm openpyxl matplotlib
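
After installing, it can help to confirm that every package is importable before starting a long collection run. The check below is a small sketch, not part of the repository: the function name `missing_packages` and the `REQUIRED` tuple are hypothetical, and the package list is assembled from the imports and dependency notes on this page.

```python
import importlib.util

# Third-party packages named on this page; matplotlib is imported by
# future_research_data.py, openpyxl is needed by pandas for .xlsx export.
REQUIRED = ("requests", "pandas", "tqdm", "openpyxl", "matplotlib")


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_packages()` returns an empty list when the environment is complete; any names it returns can be passed straight to `pip install`.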

Code Evidence

Import statements from `scripts/deep_collection_sementic.py:1-6`:

import requests
import json
import time
import os
from tqdm import tqdm
import pandas as pd

Import statements from `scripts/future_research_data.py:1-6`:

import os
import json
import requests
import time
import pandas as pd
import matplotlib.pyplot as plt

HTTP request pattern from `scripts/deep_collection_sementic.py:21-27`:

url = f"https://api.semanticscholar.org/graph/v1/paper/search?query={query}&limit={limit}&fields=title,authors,abstract,url,tldr,year,venue,references,citations"

for _ in range(3):  # Retry up to 3 times if 429 error occurs
    response = requests.get(url)
    if response.status_code == 200:
        return response.json().get("data", [])
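
The excerpt above ends at the success branch. As a hedged illustration of how a complete retry loop of this kind might look, the sketch below waits a fixed 10 seconds after an HTTP 429 (the backoff described under Common Errors) and gives up on any other error status. The function name `search_with_retry`, its parameters, and the injectable `get` and `sleep` hooks are assumptions made for this example and are not taken from the scripts. It also passes the query through `params=` so that requests URL-encodes it, instead of interpolating it into the URL string.

```python
import time

S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def search_with_retry(query, limit=10, fields="title,year", retries=3,
                      backoff_s=10.0, get=None, sleep=time.sleep):
    """Search Semantic Scholar, retrying only when the API answers 429.

    `get` and `sleep` are hypothetical hooks so the loop can be exercised
    without network access; by default they are requests.get and time.sleep.
    """
    if get is None:
        import requests  # imported lazily so a stub can replace it in tests
        get = requests.get

    params = {"query": query, "limit": limit, "fields": fields}
    for attempt in range(retries):
        response = get(S2_SEARCH_URL, params=params, timeout=30)
        if response.status_code == 200:
            return response.json().get("data", [])
        if response.status_code != 429:
            break  # not a rate-limit problem; retrying is unlikely to help
        if attempt < retries - 1:
            sleep(backoff_s)  # wait before the next attempt
    return []
```

Returning an empty list on failure mirrors the `.get("data", [])` default in the original excerpt, so callers can iterate over the result without a separate error check.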

User-Agent header from `scripts/future_research_data.py:11`:

headers = {'User-Agent': 'AcademicResearch/1.0 (mailto:user@example.com)'}

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `ModuleNotFoundError: No module named 'requests'` | requests not installed | `pip install requests` |
| `ModuleNotFoundError: No module named 'tqdm'` | tqdm not installed | `pip install tqdm` |
| `ModuleNotFoundError: No module named 'openpyxl'` | openpyxl not installed (needed by pandas for .xlsx) | `pip install openpyxl` |
| HTTP 429 Rate limit exceeded | Too many API requests to Semantic Scholar | Scripts have built-in retry logic with 10s backoff; consider adding an API key for higher limits |
| `ConnectionError` / `requests.exceptions.ConnectionError` | No internet access or API endpoint unreachable | Check network connectivity to api.semanticscholar.org |

Compatibility Notes

  • All platforms: Works on Linux, macOS, and Windows without modification.
  • Python version: Requires Python 3.6+ for f-string support used throughout the scripts.
  • Semantic Scholar API: Free tier has rate limits (approximately 100 requests per 5 minutes). The scripts include retry logic but no API key authentication. For heavy usage, obtain an API key from Semantic Scholar.
  • Encoding: JSON files are written with `encoding="utf-8"` for international character support in paper titles and abstracts.
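
The encoding note can be illustrated with a short checkpoint round trip. This is a sketch under assumptions: the helper names `save_checkpoint` and `load_checkpoint` are hypothetical, and `ensure_ascii=False` is an assumed setting that keeps non-ASCII characters readable in the file rather than escaping them as `\uXXXX`; the page only confirms that files are opened with `encoding="utf-8"`.

```python
import json
import os


def save_checkpoint(papers, path):
    """Write `papers` (a JSON-serializable object) to `path` as UTF-8."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False is an assumption: it stores characters such as
        # "é" literally instead of as escape sequences.
        json.dump(papers, f, ensure_ascii=False, indent=2)


def load_checkpoint(path):
    """Read a checkpoint previously written by save_checkpoint."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Specifying the encoding on both the write and the read side avoids relying on the platform default, which differs between Windows and most Linux or macOS setups.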
