Implementation: testtimescaling.github.io Literature Review Screening

From Leeroopedia


Type: Pattern Doc (manual human process)
Source: N/A (human activity)
Domains: Research_Methodology, Academic_Survey
Last Updated: 2026-02-14

Overview

A step-by-step screening process that a contributor follows to evaluate a candidate paper for inclusion in the test-time scaling survey.

Description

This pattern documents the interface and decision process that a human contributor follows when screening a candidate paper. It is not a library API or automated tool; it is a structured manual procedure that ensures consistent evaluation across different contributors and papers.

The screening workflow proceeds through six sequential steps:

  1. Find candidate paper: Identify the paper through arXiv browsing, Semantic Scholar alerts, conference proceedings, or community suggestions.
  2. Read abstract and methodology: Review the paper's abstract, introduction, and methodology sections to understand its core contribution.
  3. Check test-time scaling relevance: Determine whether the paper addresses computation scaling at inference time in Large Language Models. Papers about training-time scaling, non-LLM models, or unrelated topics are excluded.
  4. Check taxonomy fit: Verify that the paper can be classified within the What/How/Where/How Well taxonomy. The paper should fit at least the What (scaling strategy) and How (method category) dimensions.
  5. Make inclusion decision: Based on steps 3 and 4, decide to include or exclude the paper.
  6. Extract arXiv ID: If the paper is included, record the arXiv ID in the standard format (XXXX.XXXXX). This identifier is used in all downstream steps.

Usage

Apply this screening process to every candidate paper before proceeding with taxonomy classification or any other downstream steps. The process is designed to be completed in 5-15 minutes per paper for an experienced reviewer familiar with the test-time scaling literature.

Code Reference

Source Location

This is a human-driven process with no source code. The process is defined by the screening criteria documented here and in the survey's taxonomy (see README.md:L47-73 in the repository).

Interface Specification

The screening interface follows a decision-tree pattern:

SCREENING INTERFACE
====================

Input:
  - candidate_paper: {
      arxiv_url: string (e.g., "https://arxiv.org/abs/2503.24235"),
      title: string,
      abstract: string
    }

Process:
  1. READ abstract and methodology
  2. EVALUATE test-time scaling relevance:
     - Does paper address LLMs? → YES/NO
     - Does paper involve inference-time computation? → YES/NO
     - If either NO → EXCLUDE
  3. EVALUATE taxonomy fit:
     - Can paper be classified under "What to Scale"? → YES/NO
     - Can paper be classified under "How to Scale"? → YES/NO
     - If either NO → EXCLUDE
  4. DECISION: INCLUDE or EXCLUDE

Output:
  - decision: "include" | "exclude"
  - arxiv_id: string (format "XXXX.XXXXX") | null
  - rationale: string (brief justification)
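The decision tree above is a manual procedure, but its mechanical core can be sketched in code. The sketch below is purely illustrative: the names (`Candidate`, `ScreeningResult`, `screen`) and the boolean parameters are assumptions, not part of any real tool, and the YES/NO answers still come from the human reviewer's reading of the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    title: str
    abstract: str
    arxiv_url: Optional[str] = None  # e.g. "https://arxiv.org/abs/2408.03314"

@dataclass
class ScreeningResult:
    decision: str              # "include" | "exclude"
    arxiv_id: Optional[str]    # format "XXXX.XXXXX", only when included
    rationale: str             # brief justification, for auditability

def screen(c: Candidate, addresses_llms: bool, test_time_compute: bool,
           fits_what: bool, fits_how: bool) -> ScreeningResult:
    """Mechanical core of the manual decision tree (names are illustrative)."""
    # Relevance check: exclude unless the paper is both about LLMs
    # and about inference-time computation.
    if not (addresses_llms and test_time_compute):
        return ScreeningResult("exclude", None,
                               "Not test-time compute scaling in LLMs")
    # Taxonomy fit: the paper must fit at least the What and How dimensions.
    if not (fits_what and fits_how):
        return ScreeningResult("exclude", None,
                               "Cannot be placed in the What/How taxonomy")
    # Include: extract the arXiv ID from the URL when one is available.
    arxiv_id = c.arxiv_url.rsplit("/", 1)[-1] if c.arxiv_url else None
    return ScreeningResult("include", arxiv_id, "Relevant and classifiable")
```

Note the deliberate asymmetry with the prose: a paper is excluded as soon as any single relevance or taxonomy question is answered NO, matching the worked examples later in this page.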

Import

No imports required. This is a manual process performed by a human contributor.

I/O Contract

Inputs

Parameter Type Required Description
candidate_paper Paper reference Yes A reference to the paper being evaluated, typically an arXiv URL, title, and abstract
arxiv_url String No Direct arXiv link if available (e.g., https://arxiv.org/abs/XXXX.XXXXX)
title String Yes The paper's title for identification
abstract String Yes The paper's abstract for initial relevance screening

Outputs

Output Type Description
decision String Either "include" or "exclude"
arxiv_id String or null The arXiv identifier in format XXXX.XXXXX, only if decision is "include"
rationale String Brief justification for the decision (for auditability)
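Since the arXiv ID is consumed by all downstream steps, it is worth validating its shape when recording it. A minimal check of the "XXXX.XXXXX" format from the output contract might look like the following; note that older arXiv identifiers carry only four digits after the dot, so both lengths are accepted here.

```python
import re

# "XXXX.XXXXX" per the output contract; \d{4,5} also admits the
# four-digit suffixes used by older arXiv identifiers.
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}")

def valid_arxiv_id(s: str) -> bool:
    """True if s is a bare arXiv ID (not a full URL)."""
    return ARXIV_ID.fullmatch(s) is not None
```

A full URL such as `https://arxiv.org/abs/2408.03314` fails this check by design; the contract calls for the bare identifier.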

Usage Examples

Example 1: Paper that passes screening

Candidate:
  Title: "Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters"
  URL: https://arxiv.org/abs/2408.03314
  Abstract: "...we study the scaling of inference-time computation in LLMs..."

Step 1: Found via arXiv cs.CL new submissions
Step 2: Abstract discusses inference-time compute scaling, proposes compute-optimal strategies
Step 3: Relevance check:
  - Addresses LLMs? YES
  - Involves test-time computation? YES
Step 4: Taxonomy fit:
  - What to Scale: Sequential (iterative refinement approach)
  - How to Scale: VER (verification-based), SEA (search-based)
Step 5: Decision → INCLUDE
Step 6: arXiv ID → 2408.03314

Example 2: Paper that fails screening

Candidate:
  Title: "Efficient Training of Large Language Models on Distributed Systems"
  URL: https://arxiv.org/abs/YYYY.YYYYY
  Abstract: "...we present a method for reducing training time of LLMs using distributed computing..."

Step 1: Found via Semantic Scholar recommendation
Step 2: Abstract discusses training-time efficiency, not inference-time
Step 3: Relevance check:
  - Addresses LLMs? YES
  - Involves test-time computation? NO (training-time only)
Step 4: N/A (already excluded)
Step 5: Decision → EXCLUDE
Step 6: N/A
Rationale: Paper addresses training-time scaling, not test-time scaling.
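For auditability, each screening decision can be stored as a record matching the output contract. The two walkthroughs above would serialize to something like the following (plain Python dicts, purely illustrative):

```python
# Example 1 (passed screening), as an output-contract record.
included = {
    "decision": "include",
    "arxiv_id": "2408.03314",
    "rationale": "Inference-time compute scaling in LLMs; fits What "
                 "(Sequential) and How (VER, SEA).",
}

# Example 2 (failed screening): arxiv_id is null when excluded.
excluded = {
    "decision": "exclude",
    "arxiv_id": None,
    "rationale": "Paper addresses training-time scaling, "
                 "not test-time scaling.",
}
```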

Related Pages

Page Connections

  - Principle
  - Implementation
  - Heuristic
  - Environment