Workflow: MBZUAI Oryx Awesome-LLM-Post-training Awesome List Curation

Knowledge Sources	Awesome-LLM-Post-training, LLM Post-Training Survey
Domains	Academic_Research, Knowledge_Management, Technical_Writing
Last Updated	2025-02-28 14:00 GMT

Overview

End-to-end process for curating and organizing collected academic papers into a structured, categorized awesome-list README for the LLM post-training research community.

Description

This workflow transforms a raw corpus of collected papers (produced by the Deep Paper Collection workflow) into a well-organized, community-facing README document. The process involves reviewing collected paper metadata, assigning papers to topical categories defined by the companion survey paper's taxonomy, formatting each entry with consistent metadata (title, date, link, venue badge), and maintaining the README structure as the canonical resource list.

Goal: A curated, categorized Markdown README with 200+ papers organized by research topic, each with standardized metadata links and venue badges.

Scope: From the raw JSON paper corpus and survey paper taxonomy to a published, community-maintained awesome-list.

Strategy: Uses the survey paper's taxonomy (fine-tuning, RL, test-time scaling) as the organizational framework, maps collected papers to categories based on their abstracts and TL;DR fields, and applies consistent Markdown formatting with badge indicators for venue and date.

Usage

Execute this workflow when the paper collection corpus has been updated (via the Deep Paper Collection workflow) and the curated README needs to reflect new papers. Also execute when the survey paper's taxonomy has been revised and existing papers need re-categorization. This workflow is the bridge between automated data collection and the human-readable resource list that the community consumes.

Execution Steps

Step 1: Review Collected Paper Corpus

Load and examine the JSON dataset produced by the Deep Paper Collection workflow. Assess the total number of papers, identify newly added entries since the last curation pass, and flag papers with missing or incomplete metadata (no abstract, no venue, unknown year).

Key considerations:

Compare against the existing README to identify papers not yet curated
Papers with missing abstracts may need manual lookup for categorization
Duplicate entries across crawl runs should be identified and resolved
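The review pass can be partly automated. The sketch below assumes a hypothetical corpus schema (a list of dicts with "title", "abstract", "tldr", "venue", and "year" keys) and that curated papers appear in the README as Markdown links of the form [Title](url). It normalizes titles so that crawl-run duplicates collide, skips papers already linked in the README, and flags new papers with missing metadata.

```python
import re

# Hypothetical corpus schema: each paper is a dict with "title", "abstract",
# "tldr", "venue", and "year" keys. Adapt the names to the real crawler output.
REQUIRED_FIELDS = ("abstract", "venue", "year")


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def curated_titles(readme_text: str) -> set:
    """Normalized link texts of Markdown links such as [Title](url) already in the README."""
    return {normalize_title(text) for text in re.findall(r"\[([^\]]+)\]\(", readme_text)}


def review_corpus(papers: list, readme_text: str) -> dict:
    """Split the corpus into new, duplicate, and incomplete papers for this curation pass."""
    already_listed = curated_titles(readme_text)
    seen = set()
    new, duplicates, incomplete = [], [], []
    for paper in papers:
        key = normalize_title(paper.get("title") or "")
        if key in seen:
            duplicates.append(paper)  # same title collected by an earlier crawl run
            continue
        seen.add(key)
        if key in already_listed:
            continue  # already curated in the README
        new.append(paper)
        if any(not paper.get(field) for field in REQUIRED_FIELDS):
            incomplete.append(paper)  # needs a manual metadata lookup
    return {"total": len(papers), "new": new, "duplicates": duplicates, "incomplete": incomplete}
```

In practice the inputs would come from json.load on the collected corpus file and the text of README.md; both file names depend on the repository layout. Title matching is a heuristic, so flagged duplicates should still be confirmed by hand.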

Step 2: Define Category Taxonomy

Establish the topical categories based on the companion survey paper's structure. The taxonomy for this repository includes: Surveys, LLMs-in-RL, Reward Learning, Policy Optimization, MCTS/Tree Search, Explainability, Multimodal Agents, Benchmarks/Datasets, Reasoning and Safety, and RL/LLM Fine-Tuning Repositories.

Key considerations:

Categories should align with the survey paper sections for consistency
Some papers may fit multiple categories; choose the primary one
New categories may be needed as the research landscape evolves
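Keeping the taxonomy in one ordered list makes adding a category a one-line change and lets the README table of contents be regenerated from it. This is a minimal sketch, assuming each category is rendered as a "## " heading; the anchor function only approximates GitHub's heading slugs (lowercase, punctuation other than "-" and "_" dropped, spaces turned into hyphens) and ignores the numeric suffixes GitHub adds to repeated headings.

```python
import re

# Section order for the README, using the category names listed in this step.
TAXONOMY = [
    "Surveys", "LLMs-in-RL", "Reward Learning", "Policy Optimization",
    "MCTS/Tree Search", "Explainability", "Multimodal Agents",
    "Benchmarks/Datasets", "Reasoning and Safety", "RL/LLM Fine-Tuning Repositories",
]


def github_anchor(heading: str) -> str:
    """Approximate a GitHub heading anchor: lowercase, drop punctuation except '-'/'_', spaces -> '-'."""
    slug = re.sub(r"[^\w\- ]", "", heading.strip().lower())
    return slug.replace(" ", "-")


def table_of_contents(categories: list) -> str:
    """One Markdown bullet per category, linking to its section anchor."""
    return "\n".join(f"- [{name}](#{github_anchor(name)})" for name in categories)
```

For example, "MCTS/Tree Search" maps to the anchor mctstree-search, so a category rename only needs to be made in the list before regenerating the contents block.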

Step 3: Categorize Papers

For each uncurated paper in the corpus, read its abstract, TL;DR, and title to determine the most appropriate category. Assign each paper to exactly one primary section of the README taxonomy.

Key considerations:

Use TL;DR summaries as the quickest signal for categorization
When ambiguous, prefer the category that reflects the paper's primary contribution
Flag papers that do not fit any existing category for potential taxonomy expansion
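A keyword pre-sort can speed up this step, with a curator confirming every suggestion. The sketch below is a swapped-in heuristic, not the repository's method: the keyword lists are illustrative assumptions covering only some categories, and the field names ("tldr", "title", "abstract") follow the hypothetical schema used earlier. Hits in the TL;DR are weighted highest, matching the guidance above, and a paper with no hits returns None so it can be flagged for taxonomy review.

```python
import re
from typing import Optional

# Illustrative keyword hints per README category. These are assumptions to
# tune by hand, not definitions taken from the survey paper.
KEYWORDS = {
    "Surveys": ("survey",),
    "Reward Learning": ("reward model", "preference", "rlhf"),
    "Policy Optimization": ("ppo", "dpo", "grpo", "policy gradient"),
    "MCTS/Tree Search": ("monte carlo tree search", "mcts", "tree search"),
    "Multimodal Agents": ("multimodal", "vision-language"),
    "Benchmarks/Datasets": ("benchmark", "dataset"),
    "Reasoning and Safety": ("reasoning", "safety"),
}

# The TL;DR is the strongest signal, then the title, then the abstract.
FIELD_WEIGHTS = {"tldr": 3, "title": 2, "abstract": 1}


def count_hits(text: str, words) -> int:
    """Count keyword occurrences that start at a word boundary (so 'ppo' skips 'support')."""
    return sum(len(re.findall(r"\b" + re.escape(w), text)) for w in words)


def suggest_category(paper: dict) -> Optional[str]:
    """Return the highest-scoring category, or None to flag the paper for taxonomy review."""
    scores = {}
    for category, words in KEYWORDS.items():
        scores[category] = sum(
            weight * count_hits((paper.get(field) or "").lower(), words)
            for field, weight in FIELD_WEIGHTS.items()
        )
    best = max(scores, key=scores.get)  # ties resolve to the earlier category
    return best if scores[best] > 0 else None
```

Ties go to whichever category comes first in the dictionary, which is arbitrary; a tied or low score is a good cue for the "primary contribution" judgment described above.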

Step 4: Format Paper Entries

Convert each categorized paper's metadata into the standardized README entry format: the paper title as a link, the publication date, and a venue/date badge. Entries within each category are ordered by publication date (newest first).

Key considerations:

Use consistent badge formats for arXiv, conference proceedings, and journals
Ensure all links point to the correct paper URL (arXiv, OpenReview, ACL Anthology)
Date badges follow the format venue-year for published papers (e.g., NeurIPS-2024) or arXiv-YYYY.MM for preprints
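Entry formatting is mechanical enough to script. This sketch assumes the badges are shields.io static badges (https://img.shields.io/badge/LABEL-MESSAGE-COLOR, where a literal "-" is written "--", "_" is written "__", and a space becomes "_"), carries the publication date inside the badge, and uses placeholder colors; the paper dict keys ("title", "url", "venue", "published") are hypothetical.

```python
from datetime import date


def badge_escape(text: str) -> str:
    """Escape text for a shields.io static badge path: '-' -> '--', '_' -> '__', ' ' -> '_'."""
    return text.replace("-", "--").replace("_", "__").replace(" ", "_")


def date_badge(venue: str, published: date) -> str:
    """arXiv preprints get arXiv-YYYY.MM; other venues get venue-year (colors are placeholders)."""
    if venue.lower() == "arxiv":
        label, message, color = "arXiv", f"{published.year}.{published.month:02d}", "red"
    else:
        label, message, color = venue, str(published.year), "blue"
    return f"![](https://img.shields.io/badge/{badge_escape(label)}-{badge_escape(message)}-{color})"


def format_entry(paper: dict) -> str:
    """One README bullet: linked title followed by the venue/date badge."""
    return f"- [{paper['title']}]({paper['url']}) {date_badge(paper['venue'], paper['published'])}"


def format_section(papers: list) -> str:
    """Format one category's papers, newest first."""
    ordered = sorted(papers, key=lambda p: p["published"], reverse=True)
    return "\n".join(format_entry(p) for p in ordered)
```

An arXiv paper dated 2025-02-03 produces a badge reading arXiv 2025.02, while a venue such as "ACL Findings" is escaped to ACL_Findings in the badge path. The exact label, color, and bullet style should be matched to the entries already in the README.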

Step 5: Update and Publish README

Insert the newly formatted entries into the appropriate sections of the README. Verify that the table of contents reflects all current sections, that section headers are consistent, and that no entries are duplicated across sections. Commit the updated README to the repository.

Key considerations:

Maintain the existing table of contents structure at the top
Preserve the repository's header section (badges, description, citation)
Review for formatting consistency before committing
Community contributions via pull requests should follow the same formatting
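The insertion and duplicate checks can be sketched as two small functions. The sketch assumes each category is a "## " heading followed by "- [Title](url)" bullets, and that the newly formatted entries are newer than the ones already listed (otherwise the section should be re-sorted). Links that point at "#anchors", such as table-of-contents bullets, are ignored by the duplicate check.

```python
import re
from collections import defaultdict

HEADING = re.compile(r"^## (.+)$")
# A paper bullet: "- [Title](url)"; links to "#anchors" (the table of contents) are skipped.
ENTRY = re.compile(r"^\s*[-*] \[([^\]]+)\]\((?!#)")


def insert_entries(readme: str, section: str, entries: list) -> str:
    """Insert entry lines at the top of '## <section>', assuming they are newer than existing ones."""
    lines = readme.splitlines()
    pos = lines.index(f"## {section}") + 1  # ValueError if the section heading is missing
    while pos < len(lines) and not lines[pos].strip():
        pos += 1  # skip blank lines between the heading and the first entry
    lines[pos:pos] = entries
    return "\n".join(lines) + "\n"


def duplicate_titles(readme: str) -> dict:
    """Map each entry title that appears more than once to the sections it appears in."""
    sections = defaultdict(list)
    current = None
    for line in readme.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1).strip()
            continue
        entry = ENTRY.match(line)
        if entry and current is not None:
            sections[entry.group(1)].append(current)
    return {title: secs for title, secs in sections.items() if len(secs) > 1}
```

Running duplicate_titles before each commit gives a quick check for the "no entries duplicated across sections" rule; an empty result means every linked title appears only once.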
