Principle:Ucbepic Docetl LLM Powered Document Ranking

From Leeroopedia


Knowledge Sources
Domains: LLM_Data_Processing, Information_Retrieval
Last Updated: 2026-02-08 00:00 GMT

Overview

Prompt-guided document ranking uses a multi-phase approach combining initial ordering (via embeddings or LLM Likert ratings) with sliding-window LLM refinement to order documents by criteria specified in natural language prompts.

Theoretical Basis

Ordering a collection of documents by subjective or complex criteria -- such as "most relevant to climate policy" or "most actionable for a product manager" -- cannot be solved by sorting on numeric fields. It requires semantic judgment of the kind an LLM can provide. However, having an LLM perform all O(n^2) pairwise comparisons is prohibitively expensive for large collections. DocETL's rank operation draws on ideas from the human-powered sort literature to achieve high-quality rankings with a bounded LLM call budget.
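To make the cost gap concrete, the following minimal sketch compares the number of pairwise comparisons with the number of window positions in a single sliding-window pass. The function names and the `window` and `step` parameters are illustrative assumptions, not part of DocETL's API.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered document pairs: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def windows_per_pass(n: int, window: int, step: int) -> int:
    """Number of window positions needed to cover n items once.

    Hypothetical helper: assumes windows of size `window` that advance
    by `step` items, with the last window clamped to the list boundary.
    """
    if n <= window:
        return 1
    # Ceiling division of the uncovered remainder, plus the first window.
    return -(-(n - window) // step) + 1


if __name__ == "__main__":
    n = 1000
    print(pairwise_comparisons(n))      # all-pairs comparison count
    print(windows_per_pass(n, 20, 10))  # one LLM call per window position
```

For 1,000 documents, all-pairs comparison needs 499,500 judgments, while one pass of 20-item windows advancing by 10 needs 99 calls.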

The operation proceeds in two phases. The initial ordering phase produces a coarse ranking using one of three methods: (1) embedding similarity to the ranking criteria, which is fast and cheap but imprecise; (2) Likert-scale LLM ratings where each document is rated 1-7 against the criteria in parallel batches, providing more nuanced initial ordering; or (3) calibrated embedding sort that uses a small LLM-ranked sample to calibrate embedding-based ordering. The refinement phase then applies a sliding window approach: windows of configurable size move across the ranking, and within each window the LLM selects the top-K items ("picky windows"). Selected items are promoted to the front of the window, progressively refining the ranking. The total number of LLM calls in the refinement phase is bounded by a configurable budget parameter.
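The two phases can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not DocETL's implementation: `rate` and `pick_top` are hypothetical stand-ins for LLM calls, documents are assumed to have unique identifiers, and sliding from the bottom of the ranking toward the top is a choice made here so that promoted items stay inside the next overlapping window.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def initial_order_by_embedding(doc_vecs, criteria_vec) -> List[int]:
    """Phase 1, option 1: indices sorted by similarity to the criteria."""
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine(doc_vecs[i], criteria_vec),
                  reverse=True)


def initial_order_by_likert(docs, rate: Callable, max_workers: int = 4) -> List[int]:
    """Phase 1, option 2: indices sorted by a 1-7 rating from `rate`.

    `rate` is a hypothetical stand-in for one LLM rating call per document;
    calls run in parallel and results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        ratings = list(pool.map(rate, docs))
    return sorted(range(len(docs)), key=lambda i: ratings[i], reverse=True)


def refine_with_picky_windows(order: Sequence, pick_top: Callable,
                              window_size: int, step: int, top_k: int,
                              call_budget: int) -> Tuple[list, int]:
    """Phase 2: one bottom-to-top pass of picky windows.

    `pick_top(window, k)` stands in for an LLM call that returns the k
    best items of the window, best first. Stops when the front of the
    ranking is reached or the call budget is spent.
    """
    order = list(order)
    calls = 0
    start = max(len(order) - window_size, 0)
    while calls < call_budget:
        window = order[start:start + window_size]
        chosen = list(pick_top(window, top_k))
        calls += 1
        # Promote the chosen items to the front of the window.
        rest = [d for d in window if d not in chosen]
        order[start:start + window_size] = chosen + rest
        if start == 0:
            break
        start = max(start - step, 0)
    return order, calls


if __name__ == "__main__":
    # Toy oracle: a larger number means a better document.
    oracle = lambda window, k: sorted(window, reverse=True)[:k]
    print(refine_with_picky_windows([1, 5, 2, 4, 3, 6], oracle,
                                    window_size=4, step=2, top_k=2,
                                    call_budget=10))
```

Setting `step` to `window_size - top_k` keeps the promoted items inside the next window, which lets a strong document climb from the bottom of the ranking to the top in a single pass.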

This two-phase design achieves a favorable trade-off: the initial ordering places most documents approximately correctly at low cost, while the sliding window refinement uses expensive LLM calls only where they have the most impact -- disambiguating items that are close in quality. The approach is particularly effective when only the top-K items matter, as the refinement can terminate early once the top positions are stable.
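Early termination on a stable top-K can be sketched as repeated passes that stop once the first K positions no longer change. This is a hedged illustration: the stopping rule, the `pick_top` stand-in for the LLM call, and the choice of `step = window_size - top_k` are assumptions made here, not confirmed details of DocETL.

```python
from typing import Callable, Sequence, Tuple


def refine_until_stable(order: Sequence, pick_top: Callable,
                        window_size: int, top_k: int,
                        call_budget: int) -> Tuple[list, int]:
    """Repeat bottom-to-top picky-window passes until the top-K is stable.

    `pick_top(window, k)` is a hypothetical stand-in for one LLM call that
    returns the k best items in the window, best first. Items are assumed
    to be unique. Returns the refined order and the number of calls used.
    """
    order = list(order)
    step = max(window_size - top_k, 1)  # keep promoted items in the overlap
    calls = 0
    while calls < call_budget:
        before = order[:top_k]
        start = max(len(order) - window_size, 0)
        while calls < call_budget:
            window = order[start:start + window_size]
            chosen = list(pick_top(window, top_k))
            calls += 1
            rest = [d for d in window if d not in chosen]
            order[start:start + window_size] = chosen + rest
            if start == 0:
                break
            start = max(start - step, 0)
        # Stop as soon as a full pass leaves the top-K unchanged.
        if order[:top_k] == before:
            break
    return order, calls


if __name__ == "__main__":
    # Toy oracle: a larger number means a better document.
    oracle = lambda window, k: sorted(window, reverse=True)[:k]
    ranked, used = refine_until_stable([1, 5, 2, 4, 3, 6], oracle,
                                       window_size=4, top_k=2,
                                       call_budget=10)
    print(ranked[:2], used)
```

In this toy run the top two positions settle after the second pass, so only 4 of the 10 budgeted calls are spent; an input that already has the right top-K stops after a single confirming pass.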

Key Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Initial ordering | Three strategies: embedding similarity, Likert LLM ratings, or calibrated embedding | Provides a cost-quality spectrum; embedding is cheapest, Likert is most accurate, calibrated embedding balances both |
| Refinement approach | Sliding picky windows with bounded LLM call budget | Concentrates expensive LLM calls where they have the most impact; budget parameter gives direct cost control |
| Direction support | Configurable ascending or descending ordering | Supports both "best first" and "worst first" use cases with the same underlying algorithm |
