Implementation:Mbzuai oryx Awesome LLM Post training Abstract Review Categorization
| Knowledge Sources | |
|---|---|
| Domains | Curation, Classification |
| Last Updated | 2026-02-08 07:30 GMT |
Overview
Concrete pattern for manually categorizing papers by reviewing abstracts and TL;DR summaries from the collected corpus.
Description
This Pattern Doc describes the manual editorial process of reviewing paper metadata from assets/2000+papers.json and assigning each selected paper to one or more taxonomy categories defined in README.md. The curator reads each paper's abstract and TL;DR summary, evaluates it against the selection criteria, and places it in the appropriate README section (lines 51-291).
There is no programmatic API for this step; it is entirely a human editorial process requiring domain expertise in LLM post-training research.
Usage
Perform this categorization after loading the paper corpus (via json.load) and defining the taxonomy (the README.md section structure). Apply the selection criteria consistently across all papers.
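The loading step can be sketched as follows. This is a minimal sketch, not code from the repository: the helper name `load_corpus` is hypothetical, and it assumes the JSON file holds either a top-level list of paper records or a dict whose values are the records.

```python
import json
from pathlib import Path


def load_corpus(path):
    """Load paper records from a JSON file (hypothetical helper).

    Assumption: the file is either a top-level list of paper dicts,
    or a dict whose values are the paper dicts.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        return list(data.values())
    raise ValueError("expected a JSON list or object of paper records")
```

A call such as `papers = load_corpus("assets/2000+papers.json")` would then give the curator a list to page through during review.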
Code Reference
Source Location
- Repository: Awesome-LLM-Post-training
- Source data: assets/2000+papers.json
- Target sections: README.md:L51-291
Interface Specification
# Manual categorization process interface (NOT executable code)
# Input: paper metadata dict
paper = {
    "Title": str,                       # Paper title
    "Authors": str,                     # Comma-separated authors
    "Abstract": str,                    # Full abstract text
    "TL;DR": str,                       # Auto-generated summary
    "Publication Year": int,
    "Venue (Conference/Journal)": str,
    "Link": str                         # URL to paper
}
# Selection criteria applied by curator:
# 1. Relevance: Is this paper about LLM post-training?
# 2. Quality: Is it published in a recognized venue?
# 3. Recency: Is it from 2022-2025?
# 4. Impact: Is it cited or influential?
# Output: category assignment
assigned_categories = ["Reward Learning", "Human Feedback"] # one or more
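Of the four criteria, only recency is mechanically checkable from the metadata; the others need the curator's judgment. As a hedged illustration, a pre-screen on the "Publication Year" field might look like the sketch below. The function name `within_recency_window` and the sample records are hypothetical, and treating a missing or malformed year as out of window is an assumption, not documented behavior.

```python
def within_recency_window(paper, first_year=2022, last_year=2025):
    """Return True if the paper's publication year is inside the window.

    Assumption: records with a missing or non-numeric year are treated
    as outside the window rather than raising an error.
    """
    year = paper.get("Publication Year")
    try:
        return first_year <= int(year) <= last_year
    except (TypeError, ValueError):
        return False


# Made-up records for illustration only.
sample = [
    {"Title": "Paper A", "Publication Year": 2023},
    {"Title": "Paper B", "Publication Year": 2019},
    {"Title": "Paper C", "Publication Year": "2024"},
    {"Title": "Paper D"},
]
recent = [p["Title"] for p in sample if within_recency_window(p)]
print(recent)  # ['Paper A', 'Paper C']
```

Papers that pass this pre-screen still go through the manual relevance, quality, and impact checks.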
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Paper corpus | dict | Yes | 2000+ papers from assets/2000+papers.json with full metadata |
| Taxonomy | Markdown structure | Yes | Section headers from README.md defining category boundaries |
| Domain expertise | Editorial | Yes | Knowledge of LLM post-training research landscape |
Outputs
| Name | Type | Description |
|---|---|---|
| Category assignments | Mapping | Each selected paper mapped to one or more taxonomy categories |
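One way to hold the output mapping while curating is a plain dict from paper title to a list of category names, which can then be inverted to see what would land in each README section. The sketch below assumes that representation; the helper `group_by_category` and the placeholder titles are hypothetical.

```python
from collections import defaultdict


def group_by_category(assignments):
    """Invert {title: [categories]} into {category: [titles]}."""
    grouped = defaultdict(list)
    for title, categories in assignments.items():
        for category in categories:
            grouped[category].append(title)
    return dict(grouped)


# Placeholder titles; category names are taken from this document.
assignments = {
    "Paper A": ["Reward Learning", "Human Feedback"],
    "Paper B": ["Policy Optimization"],
    "Paper C": ["Reward Learning"],
}
by_category = group_by_category(assignments)
print(by_category["Reward Learning"])  # ['Paper A', 'Paper C']
```

Keeping the mapping keyed by paper makes the "one or more categories" rule easy to record, while the inverted view matches how the README is organized.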
Usage Examples
Categorization Decision Examples
Example 1: "Training Language Models with Language Feedback at Scale"
Abstract mentions: human feedback, reward model, language feedback
Decision: Assign to "Reward Learning" > "Human Feedback"
Example 2: "Direct Preference Optimization: Your Language Model is Secretly a Reward Model"
Abstract mentions: preference optimization, bypass reward model
Decision: Assign to "Policy Optimization"
Example 3: "MCTS-enhanced LLM Reasoning via Tree Search"
Abstract mentions: Monte Carlo Tree Search, reasoning
Decision: Assign to "LLMs for Reasoning & Decision-Making" > "Planning"
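The "Abstract mentions" notes in the examples above are written by hand. As an optional reviewing aid, and not a replacement for the curator's decision, a small helper could surface which terms of interest occur in an abstract. The function `find_mentions`, the term list, and the abstract snippet below are all hypothetical.

```python
def find_mentions(text, terms):
    """Return the terms found in text (case-insensitive), in input order."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


# Made-up snippet for illustration; not a real abstract.
snippet = "We use Monte Carlo Tree Search to improve multi-step reasoning."
terms = ["reward model", "Monte Carlo Tree Search", "reasoning"]
print(find_mentions(snippet, terms))  # ['Monte Carlo Tree Search', 'reasoning']
```

Plain substring matching is deliberately simple: it only hints at where to look, and the category assignment stays a human judgment.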