Principle:Mbzuai oryx Awesome LLM Post training Paper Categorization
| Knowledge Sources | |
|---|---|
| Domains | Curation, Classification |
| Last Updated | 2026-02-08 07:30 GMT |
Overview
An editorial classification process that assigns collected papers to taxonomy categories based on abstract and summary review.
Description
Paper Categorization is the core intellectual task in awesome-list curation. For each paper in the collected corpus, a human curator reads the abstract and TL;DR summary to determine which taxonomy category the paper belongs to. The curator applies selection criteria including topical relevance, venue quality, recency (focusing on 2022-2025 publications), and citation impact.
Papers that span multiple topics may be assigned to more than one category. Telling closely related categories apart (e.g., "Reward Learning" versus "Policy Optimization") requires domain expertise.
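A minimal sketch of multi-category assignment, assuming the curator's decisions are recorded as a list of category names per paper. The function name `group_by_category`, the input format, and the placeholder titles are illustrative assumptions, not part of the repository:

```python
from collections import defaultdict


def group_by_category(decisions):
    """Map each category to the titles a curator assigned to it.

    `decisions` maps a paper title to the list of categories chosen by a
    human curator (hypothetical input format for illustration).
    """
    sections = defaultdict(list)
    for title, categories in decisions.items():
        for category in categories:
            # A paper with several categories is listed under each of them.
            sections[category].append(title)
    return dict(sections)


# Placeholder decisions; these are not real papers.
decisions = {
    "Paper A": ["Reward Learning"],
    "Paper B": ["Reward Learning", "Policy Optimization"],
}
sections = group_by_category(decisions)
```

Here "Paper B" ends up in both sections, which mirrors the rule that a paper spanning multiple topics may appear under more than one category. The judgment itself stays with the curator; the code only records it.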
Usage
Use this principle after the taxonomy has been defined and the paper corpus has been loaded. It requires:
- A defined taxonomy with clear category boundaries
- A loaded paper corpus with metadata (title, abstract, TL;DR, venue, year)
- Domain expertise to make accurate categorization judgments
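Before categorization starts, the corpus prerequisite can be checked mechanically. The sketch below assumes each paper is a dictionary; the keys `Abstract`, `TL;DR`, and `Venue` follow the pseudo-code on this page, while `Title` and `Year`, the helper name `missing_fields`, and the sample records are assumptions made for illustration:

```python
# Field names are assumed; only "Abstract", "TL;DR", and "Venue" appear
# in the pseudo-code on this page.
REQUIRED_FIELDS = ("Title", "Abstract", "TL;DR", "Venue", "Year")


def missing_fields(paper):
    """Return the required fields that are absent or empty in one record."""
    return [field for field in REQUIRED_FIELDS if not paper.get(field)]


# Placeholder records; these are not real papers.
corpus = [
    {"Title": "Paper A", "Abstract": "text", "TL;DR": "text",
     "Venue": "ICLR", "Year": 2024},
    {"Title": "Paper B", "Abstract": "", "TL;DR": "text", "Venue": "arXiv"},
]

# Collect only the records that still need metadata before review.
incomplete = {
    paper["Title"]: missing_fields(paper)
    for paper in corpus
    if missing_fields(paper)
}
```

Records listed in `incomplete` would need their metadata filled in before a curator can review them.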
Theoretical Basis
The categorization process applies a classification rubric to each paper:
Pseudo-code Logic:
# Abstract categorization process (NOT real implementation)
for paper in corpus:
    # Apply selection criteria
    if not meets_quality_threshold(paper):
        continue  # Skip low-quality papers
    if not is_recent(paper, min_year=2022):
        continue  # Focus on recent work
    # Classify based on content
    categories = classify_by_content(
        abstract=paper["Abstract"],
        summary=paper["TL;DR"],
        venue=paper["Venue"]
    )
    for category in categories:
        assign_to_section(paper, category)
Selection criteria:
- Relevance: Paper topic falls within LLM post-training scope
- Venue quality: Published in recognized venues (NeurIPS, ACL, ICLR, ICML) or available as an arXiv preprint
- Recency: Focus on 2022-2025 publications
- Impact: Consideration of citation counts and field significance
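The mechanical part of these criteria could be expressed as a single predicate. This is a sketch under stated assumptions rather than the curators' actual procedure: relevance is passed in as the curator's own yes/no judgment, the `Citations` key and the `min_citations` threshold are hypothetical, and the function name `passes_selection` is invented for illustration.

```python
# Venues and year range are taken from the criteria list above.
RECOGNIZED_VENUES = {"NeurIPS", "ACL", "ICLR", "ICML", "arXiv"}
MIN_YEAR, MAX_YEAR = 2022, 2025


def passes_selection(paper, relevant, min_citations=0):
    """Apply the venue, recency, and impact checks to one record.

    `relevant` is the human curator's judgment on topical relevance.
    `min_citations` and the "Citations" key are hypothetical; the source
    names citation impact as a consideration without giving a threshold.
    """
    return bool(
        relevant
        and paper["Venue"] in RECOGNIZED_VENUES
        and MIN_YEAR <= paper["Year"] <= MAX_YEAR
        and paper.get("Citations", 0) >= min_citations
    )


# Placeholder records; these are not real papers.
recent_paper = {"Venue": "ICLR", "Year": 2024, "Citations": 12}
older_paper = {"Venue": "ICML", "Year": 2019, "Citations": 300}

keep_recent = passes_selection(recent_paper, relevant=True)
keep_older = passes_selection(older_paper, relevant=True)
keep_offtopic = passes_selection(recent_paper, relevant=False)
```

Keeping relevance as an explicit input reflects that topical fit is an editorial call, while venue and year are simple metadata lookups.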