Workflow:Mbzuai oryx Awesome LLM Post training Awesome List Curation
| Knowledge Sources | |
|---|---|
| Domains | Academic_Research, Knowledge_Management, Technical_Writing |
| Last Updated | 2025-02-28 14:00 GMT |
Overview
End-to-end process for curating and organizing collected academic papers into a structured, categorized awesome-list README for the LLM post-training research community.
Description
This workflow transforms a raw corpus of collected papers (produced by the Deep Paper Collection workflow) into a well-organized, community-facing README document. The process involves reviewing collected paper metadata, assigning papers to topical categories defined by the companion survey paper's taxonomy, formatting each entry with consistent metadata (title, date, link, venue badge), and maintaining the README structure as the canonical resource list.
Goal: A curated, categorized Markdown README with 200+ papers organized by research topic, each with standardized metadata links and venue badges.
Scope: From the raw JSON paper corpus and survey paper taxonomy to a published, community-maintained awesome-list.
Strategy: Uses the survey paper's taxonomy (fine-tuning, RL, test-time scaling) as the organizational framework, maps collected papers to categories based on their abstracts and TL;DR fields, and applies consistent Markdown formatting with badge indicators for venue and date.
Usage
Run this workflow when the paper collection corpus has been updated (via the Deep Paper Collection workflow) and the curated README needs to reflect the new papers. Run it again when the survey paper's taxonomy has been revised and existing papers need re-categorization. This workflow is the bridge between automated data collection and the human-readable resource list that the community consumes.
Execution Steps
Step 1: Review Collected Paper Corpus
Load and examine the JSON dataset produced by the Deep Paper Collection workflow. Assess the total number of papers, identify newly added entries since the last curation pass, and flag papers with missing or incomplete metadata (no abstract, no venue, unknown year).
Key considerations:
- Compare against the existing README to identify papers not yet curated
- Papers with missing abstracts may need manual lookup for categorization
- Duplicate entries across crawl runs should be identified and resolved
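The review pass above can be sketched in Python. This is a minimal illustration, not the repository's actual tooling: the field names (`title`, `abstract`, `venue`, `year`) and the helper names `normalize_title` and `review_corpus` are assumptions, since the JSON schema is not specified here. Matching titles against the README by normalized substring is a rough heuristic and can miss retitled papers.

```python
import re

# Assumed field names; the real corpus schema may differ.
REQUIRED_FIELDS = ("title", "abstract", "venue", "year")


def normalize_title(title):
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def review_corpus(papers, readme_text):
    """Return (new, incomplete, duplicates) for one curation pass."""
    readme_norm = normalize_title(readme_text)
    seen = set()
    new, incomplete, duplicates = [], [], []
    for paper in papers:
        key = normalize_title(paper.get("title") or "")
        # A title already seen in this corpus is a duplicate across crawl runs.
        if key and key in seen:
            duplicates.append(paper)
            continue
        seen.add(key)
        # Record which metadata fields are empty or absent.
        missing = [f for f in REQUIRED_FIELDS if not paper.get(f)]
        if missing:
            incomplete.append((paper, missing))
        # A title not found in the README text has not been curated yet.
        if key and key not in readme_norm:
            new.append(paper)
    return new, incomplete, duplicates
```

Papers returned in `incomplete` are the candidates for manual lookup, and `duplicates` lists the repeated records to resolve before categorization.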
Step 2: Define Category Taxonomy
Establish the topical categories based on the companion survey paper's structure. The taxonomy for this repository includes: Surveys, LLMs-in-RL, Reward Learning, Policy Optimization, MCTS/Tree Search, Explainability, Multimodal Agents, Benchmarks/Datasets, Reasoning and Safety, and RL/LLM Fine-Tuning Repositories.
Key considerations:
- Categories should align with the survey paper sections for consistency
- Some papers may fit multiple categories; choose the primary one
- New categories may be needed as the research landscape evolves
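One way to keep the taxonomy consistent across curation passes is to hold it as a single ordered list and check assignments against it. The sketch below copies the category names from the list above (reading "Reasoning and Safety" as one category); the `validate_assignments` helper and the shape of its input are hypothetical.

```python
# Category names as listed in this step; order is assumed to match the README sections.
TAXONOMY = [
    "Surveys",
    "LLMs-in-RL",
    "Reward Learning",
    "Policy Optimization",
    "MCTS/Tree Search",
    "Explainability",
    "Multimodal Agents",
    "Benchmarks/Datasets",
    "Reasoning and Safety",
    "RL/LLM Fine-Tuning Repositories",
]


def validate_assignments(assignments):
    """Return the {title: category} pairs whose category is not in TAXONOMY."""
    known = set(TAXONOMY)
    return {title: cat for title, cat in assignments.items() if cat not in known}
```

A non-empty result signals either a typo in a category name or a candidate for a new category.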
Step 3: Categorize Papers
For each uncurated paper in the corpus, read its abstract, TL;DR, and title to determine the most appropriate category. Assign each paper to exactly one primary section of the README taxonomy.
Key considerations:
- Use TL;DR summaries as the quickest signal for categorization
- When ambiguous, prefer the category that reflects the paper's primary contribution
- Flag papers that do not fit any existing category for potential taxonomy expansion
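Categorization is a judgment call, but a simple keyword heuristic can propose a first-pass category for a curator to confirm. The sketch below is only that: the keyword lists are invented for illustration, they cover a subset of the categories, and the field names `tldr`, `title`, and `abstract` are assumed. A result of `None` corresponds to flagging the paper for manual review or taxonomy expansion.

```python
import re

# Hypothetical keyword hints; not the curators' actual criteria.
CATEGORY_KEYWORDS = {
    "Surveys": ["survey", "review of"],
    "Reward Learning": ["reward model", "reward learning", "preference"],
    "Policy Optimization": ["policy optimization", "ppo", "dpo", "grpo"],
    "MCTS/Tree Search": ["mcts", "tree search", "monte carlo tree"],
    "Benchmarks/Datasets": ["benchmark", "dataset"],
}


def count_keyword(text, keyword):
    """Count whole-word occurrences so 'ppo' does not match inside 'support'."""
    return len(re.findall(r"\b" + re.escape(keyword) + r"\b", text))


def suggest_category(paper):
    """Suggest one primary category, or None if no keyword matches."""
    text = " ".join(
        str(paper.get(field) or "") for field in ("tldr", "title", "abstract")
    ).lower()
    scores = {
        category: sum(count_keyword(text, kw) for kw in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    # On a tie, max() keeps the category listed first in CATEGORY_KEYWORDS.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Because each paper gets exactly one suggestion, the output maps directly onto the one-primary-section rule; ambiguous cases still need a human decision.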
Step 4: Format Paper Entries
Convert each categorized paper's metadata into the standardized README entry format. This includes the paper title as a link, publication date, and a venue/date badge. Entries within each category are ordered by publication date (newest first).
Key considerations:
- Use consistent badge formats for arXiv, conference proceedings, and journals
- Ensure all links point to the correct paper URL (arXiv, OpenReview, ACL Anthology)
- Badge text follows one of two formats: venue-year for papers with a publication venue, or arXiv-YYYY.MM for arXiv preprints
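The formatting rules above could be implemented roughly as follows. The exact entry layout used in the README is not specified here, so the line format, the badge color, and the field names (`title`, `url`, `date`, `venue`) are assumptions; dates are assumed to be ISO strings such as `2025-02-14`. The badge URL uses the shields.io static badge path (`/badge/<label>-<message>-<color>`), where literal dashes and underscores in the text are doubled.

```python
from urllib.parse import quote


def badge_parts(paper):
    """Return (label, message): venue and year, or 'arXiv' and YYYY.MM."""
    year, month = paper["date"].split("-")[:2]
    venue = paper.get("venue")
    if venue and venue.lower() != "arxiv":
        return venue, year
    return "arXiv", f"{year}.{month}"


def shields_escape(text):
    """Escape text for a shields.io static badge path segment."""
    return quote(text.replace("-", "--").replace("_", "__"), safe="")


def format_entry(paper):
    """Format one paper as a Markdown list item with a linked title and a badge."""
    label, message = badge_parts(paper)
    badge_url = (
        "https://img.shields.io/badge/"
        f"{shields_escape(label)}-{shields_escape(message)}-blue"
    )
    return (
        f"- [{paper['title']}]({paper['url']}), {paper['date']} "
        f"![{label}-{message}]({badge_url})"
    )


def format_section(papers):
    """Format a category's papers, newest first (ISO dates sort as strings)."""
    ordered = sorted(papers, key=lambda p: p["date"], reverse=True)
    return [format_entry(p) for p in ordered]
```

Generating entries from one function keeps badges and link layout uniform, which is the main point of this step; the specific layout should be adjusted to match the existing README entries.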
Step 5: Update and Publish README
Insert the newly formatted entries into the appropriate sections of the README. Verify that the table of contents reflects all current sections, that section headers are consistent, and that no entries are duplicated across sections. Commit the updated README to the repository.
Key considerations:
- Maintain the existing table of contents structure at the top
- Preserve the repository's header section (badges, description, citation)
- Review for formatting consistency before committing
- Community contributions via pull requests should follow the same formatting
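Before committing, the duplicate and table-of-contents checks can be automated. The sketch below assumes a README layout that is not confirmed by this page: sections start with `## ` headings, paper entries are list items of the form `- [Title](https://...)`, and table-of-contents lines are list items linking to `#anchor` targets under a heading named "Table of Contents". The function name `check_readme` is hypothetical. It requires Python 3.8 or later.

```python
import re
from collections import defaultdict

HEADING_RE = re.compile(r"^## (.+)$")
ENTRY_RE = re.compile(r"^- \[([^\]]+)\]\((https?://[^)]+)\)")
TOC_RE = re.compile(r"^- \[([^\]]+)\]\(#[^)]*\)")


def check_readme(readme):
    """Return (duplicates, missing_from_toc) for a README string."""
    sections_by_url = defaultdict(set)
    headings, toc_names = [], set()
    current = None
    for raw in readme.splitlines():
        line = raw.strip()
        if m := HEADING_RE.match(line):
            current = m.group(1).strip()
            headings.append(current)
        elif m := TOC_RE.match(line):
            toc_names.add(m.group(1))
        elif (m := ENTRY_RE.match(line)) and current:
            # Track every section in which a given paper URL appears.
            sections_by_url[m.group(2)].add(current)
    duplicates = {
        url: sorted(secs) for url, secs in sections_by_url.items() if len(secs) > 1
    }
    missing_from_toc = [
        h for h in headings if h != "Table of Contents" and h not in toc_names
    ]
    return duplicates, missing_from_toc
```

Both return values should be empty for a README that is ready to commit. The same check could be run on pull requests so community contributions are held to the same structure.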