Principle:Mbzuai oryx Awesome LLM Post training Paper Categorization

From Leeroopedia


Knowledge Sources
Domains: Curation, Classification
Last Updated: 2026-02-08 07:30 GMT

Overview

An editorial classification process in which a curator assigns each collected paper to one or more taxonomy categories after reviewing its abstract and TL;DR summary.

Description

Paper Categorization is the core intellectual task in awesome-list curation. For each paper in the collected corpus, a human curator reads the abstract and TL;DR summary to determine which taxonomy category or categories the paper belongs to. The curator also applies selection criteria: topical relevance, venue quality, recency (with a focus on 2022-2025 publications), and citation impact.

Every selected paper is assigned to at least one category, and a paper that spans multiple topics may be assigned to several. The process requires domain expertise to separate closely related categories (e.g., "Reward Learning" versus "Policy Optimization" papers).

Usage

Use this principle after the taxonomy has been defined and the paper corpus has been loaded. It requires:

  • A defined taxonomy with clear category boundaries
  • A loaded paper corpus with metadata (title, abstract, TL;DR, venue, year)
  • Domain expertise to make accurate categorization judgments
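
The second prerequisite can be checked mechanically before any editorial review starts. The following is a minimal sketch, assuming each paper is a dictionary; the keys "Abstract", "TL;DR", and "Venue" follow the pseudo-code on this page, while "Title", "Year", and both function names are assumptions made for illustration.

```python
from typing import Iterable, List, Mapping, Tuple

# "Abstract", "TL;DR", "Venue" match the keys used in this page's pseudo-code;
# "Title" and "Year" are assumed spellings for the remaining metadata fields.
REQUIRED_FIELDS = ("Title", "Abstract", "TL;DR", "Venue", "Year")


def missing_fields(paper: Mapping[str, object]) -> List[str]:
    """Return the required metadata fields that are absent or blank."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = paper.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


def split_corpus(corpus: Iterable[Mapping[str, object]]) -> Tuple[list, list]:
    """Separate papers ready for review from those needing metadata fixes."""
    ready, incomplete = [], []
    for paper in corpus:
        (incomplete if missing_fields(paper) else ready).append(paper)
    return ready, incomplete
```

Papers in the `incomplete` list would go back for metadata repair rather than being silently dropped, since a missing abstract says nothing about the paper's quality.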

Theoretical Basis

The categorization process applies a classification rubric to each paper:

Pseudo-code Logic:

# Conceptual outline of the categorization process (NOT a real implementation;
# in practice a human curator makes each of these judgments)
for paper in corpus:
    # Apply selection criteria
    if not meets_quality_threshold(paper):
        continue  # Skip low-quality papers
    if not is_recent(paper, min_year=2022):
        continue  # Focus on recent work

    # Classify based on content
    categories = classify_by_content(
        abstract=paper["Abstract"],
        summary=paper["TL;DR"],
        venue=paper["Venue"]
    )

    for category in categories:
        assign_to_section(paper, category)
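
The outline above leaves its helper functions undefined. As a hedged illustration, the same control flow can be made runnable by passing the judgments in as callables. Everything named here (`categorize`, `keyword_classify`, the `KEYWORDS` table) is hypothetical, and the keyword lookup is only a crude stand-in for the curator's reading of the abstract and TL;DR.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Paper = Dict[str, object]


def categorize(
    corpus: Iterable[Paper],
    passes_selection: Callable[[Paper], bool],
    classify: Callable[[Paper], List[str]],
) -> Dict[str, List[Paper]]:
    """Group selected papers by category; a paper may land in several sections."""
    sections: Dict[str, List[Paper]] = defaultdict(list)
    for paper in corpus:
        if not passes_selection(paper):
            continue  # paper does not meet the selection criteria
        for category in classify(paper):
            sections[category].append(paper)
    return dict(sections)


# Hypothetical stand-in for curator judgment. The category names come from
# this page; the keywords are illustrative, not the curators' actual rubric.
KEYWORDS = {
    "Reward Learning": ("reward model", "preference"),
    "Policy Optimization": ("policy gradient", "policy optimization"),
}


def keyword_classify(paper: Paper) -> List[str]:
    """Return every category whose keywords appear in the abstract or TL;DR."""
    text = f"{paper.get('Abstract', '')} {paper.get('TL;DR', '')}".lower()
    return [cat for cat, words in KEYWORDS.items() if any(w in text for w in words)]
```

A call such as `categorize(corpus, lambda p: p["Year"] >= 2022, keyword_classify)` returns a mapping from category name to the papers filed under it. Keeping the two judgments as parameters makes it easy to replace the keyword lookup with recorded curator decisions.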

Selection criteria:

  • Relevance: Paper topic falls within LLM post-training scope
  • Venue quality: Published at a recognized venue (NeurIPS, ACL, ICLR, ICML) or available as an arXiv preprint
  • Recency: Focus on 2022-2025 publications
  • Impact: Consideration of citation counts and field significance
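
Three of these criteria (venue, recency, impact) can be pre-screened automatically, while relevance remains an editorial call. The sketch below assumes dictionary records with "Venue" and "Year" keys; the "Citations" key, the `min_citations` threshold, and the class and function names are hypothetical, since this page does not specify a citation cutoff.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Mapping


@dataclass(frozen=True)
class SelectionCriteria:
    # Venues and year range are taken from the criteria listed above.
    venues: FrozenSet[str] = frozenset({"NeurIPS", "ACL", "ICLR", "ICML", "arXiv"})
    min_year: int = 2022
    max_year: int = 2025
    # Hypothetical threshold: the page names citation impact but gives no number.
    min_citations: int = 0


def failed_criteria(
    paper: Mapping[str, object],
    criteria: SelectionCriteria = SelectionCriteria(),
) -> List[str]:
    """Return the names of the automatable criteria that a paper fails."""
    failed = []
    if paper.get("Venue") not in criteria.venues:
        failed.append("venue")
    year = paper.get("Year")
    if not isinstance(year, int) or not (criteria.min_year <= year <= criteria.max_year):
        failed.append("recency")
    if paper.get("Citations", 0) < criteria.min_citations:
        failed.append("impact")
    return failed
```

The function returns a list of flags instead of a hard accept/reject, because the criteria are described as a focus and a consideration. A curator can still include an influential older paper after seeing why it was flagged.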

Related Pages

Implemented By
