Implementation:Ucbepic Docetl RankOperation Execute


Knowledge Sources
Domains: Data_Processing, Document_Ranking
Last Updated: 2026-02-08 00:00 GMT

Overview

Concrete tool for ranking documents by quality or relevance using multiple LLM-based and embedding-based evaluation strategies, provided by DocETL.

Description

The RankOperation class extends BaseOperation to rank documents according to a user-defined criterion and direction (ascending or descending). It works in two phases. An initial ordering phase uses one of three methods: embedding similarity, Likert-scale LLM ratings, or calibrated embeddings. A refinement phase then applies "picky" sliding windows, in which the LLM selects the top items from each of a series of overlapping windows moved across the ordering. Each document receives a _rank field indicating its position in the final ordering.
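The refinement phase can be pictured with a small sketch. This is not DocETL's code: the helpers `window_starts` and `picky_refine` are hypothetical, the `score` function is a stand-in for the LLM's choice of top items, and walking the windows from the bottom of the list to the top is an assumption made for illustration.

```python
from typing import Callable, Sequence


def window_starts(n_docs: int, batch_size: int, overlap_fraction: float) -> list[int]:
    """Start indices of windows of `batch_size` docs, walking bottom to top."""
    # The step between windows shrinks as overlap grows; always move by at least 1.
    step = max(1, int(batch_size * (1 - overlap_fraction)))
    last_start = max(0, n_docs - batch_size)
    starts = list(range(last_start, -1, -step))
    if starts[-1] != 0:
        starts.append(0)  # make sure the top of the list is covered
    return starts


def picky_refine(
    docs: Sequence,
    score: Callable,  # hypothetical stand-in for the LLM's judgment
    batch_size: int = 4,
    num_top: int = 2,
    overlap_fraction: float = 0.5,
) -> list:
    """Promote the `num_top` best items of each window to the window's front."""
    ordered = list(docs)
    for start in window_starts(len(ordered), batch_size, overlap_fraction):
        window = ordered[start:start + batch_size]
        # Indices of the picked items, best first.
        picked_idx = sorted(
            range(len(window)), key=lambda i: score(window[i]), reverse=True
        )[:num_top]
        chosen = set(picked_idx)
        picked = [window[i] for i in picked_idx]
        rest = [doc for i, doc in enumerate(window) if i not in chosen]
        ordered[start:start + batch_size] = picked + rest
    return ordered


refined = picky_refine([3, 1, 4, 1, 5, 9, 2, 6], score=lambda x: x)
```

Because consecutive windows overlap, an item picked near the bottom can be carried upward window by window. Only the top of the list is guaranteed to be refined, which fits the idea of spending a limited call budget on the best `k` items.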

Usage

Use this operation when you need to order documents by subjective quality, relevance, or any criterion that requires semantic understanding. Typical scenarios include ranking search results by relevance, prioritizing support tickets by urgency, ordering research papers by novelty, or selecting the best candidates from a pool based on complex criteria.

Code Reference

Source Location

Signature

class RankOperation(BaseOperation):
    class schema(BaseOperation.schema):
        type: str = "order"
        prompt: str
        input_keys: list[str] = Field(default_factory=list)
        direction: Literal["asc", "desc"]
        model: str | None = None
        embedding_model: str | None = None
        batch_size: int = Field(10, gt=0)
        initial_ordering_method: Literal["embedding", "likert", "calibrated_embedding"] = "embedding"
        k: int | None = Field(None, gt=0)
        rerank_call_budget: int = Field(100, gt=0)
        num_top_items_per_window: int = Field(3, gt=0)
        overlap_fraction: float = Field(0.5, ge=0, le=1)
        timeout: int | None = Field(None, gt=0)
        num_calibration_docs: int = Field(10, gt=0)
        verbose: bool = False
        litellm_completion_kwargs: dict[str, Any] = Field(default_factory=dict)

    def _batch_rank_documents(self, batch, criteria, direction, model, ...) -> tuple[list[int], float]: ...
    def _execute_comparison_qurk(self, input_data, sample=False) -> tuple[list[dict], float]: ...
    def _execute_rating_embedding_qurk(self, input_data) -> tuple[list[dict], float]: ...
    def _execute_sliding_window_qurk(self, input_data, ...) -> tuple[list[dict], float]: ...
    def _execute_likert_rating_qurk(self, input_data) -> tuple[list[dict], float]: ...
    def _execute_picky_window(self, window_docs, num_top_items) -> list[int]: ...
    def _execute_calibrated_embedding_sort(self, input_data) -> tuple[list[dict], float]: ...
    def execute(self, input_data: list[dict]) -> tuple[list[dict], float]: ...

Import

from docetl.operations.rank import RankOperation

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| input_data | List[Dict] | Yes | Documents to rank |
| prompt | str | Yes | Ranking criteria description used by the LLM |
| direction | str | Yes | Ranking direction: "asc" (ascending) or "desc" (descending) |
| input_keys | List[str] | No | Keys to extract from documents for ranking (defaults to all keys) |
| initial_ordering_method | str | No | Method for initial ordering: "embedding", "likert", or "calibrated_embedding" (default "embedding") |
| k | int | No | Number of top elements to focus on (default: all documents) |
| rerank_call_budget | int | No | Number of LLM calls for sliding window refinement (default 100) |
| batch_size | int | No | Size of each comparison window (default 10) |
| num_top_items_per_window | int | No | Number of items the LLM picks per window (default 3) |
| overlap_fraction | float | No | Overlap fraction between windows (default 0.5) |
| model | str | No | LLM model for comparisons (defaults to pipeline default) |
| embedding_model | str | No | Model for embedding-based initial ordering |
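As a rough aid for catching configuration mistakes before a pipeline run, the constraints listed above can be checked with plain Python. The function `check_rank_config` is a hypothetical helper, not part of DocETL, which performs its own validation through the schema class; this sketch only mirrors the documented constraints.

```python
from typing import Any

ALLOWED_DIRECTIONS = {"asc", "desc"}
ALLOWED_METHODS = {"embedding", "likert", "calibrated_embedding"}
POSITIVE_INT_KEYS = ("batch_size", "k", "rerank_call_budget", "num_top_items_per_window")


def check_rank_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    if not isinstance(config.get("prompt"), str):
        problems.append("prompt is required and must be a string")
    if config.get("direction") not in ALLOWED_DIRECTIONS:
        problems.append("direction must be 'asc' or 'desc'")
    method = config.get("initial_ordering_method", "embedding")
    if method not in ALLOWED_METHODS:
        problems.append(f"unknown initial_ordering_method: {method!r}")
    for key in POSITIVE_INT_KEYS:
        value = config.get(key)
        # Omitted optional keys are fine; supplied ones must be positive integers.
        if value is not None and (not isinstance(value, int) or value <= 0):
            problems.append(f"{key} must be a positive integer")
    overlap = config.get("overlap_fraction", 0.5)
    if not isinstance(overlap, (int, float)) or not 0 <= overlap <= 1:
        problems.append("overlap_fraction must be between 0 and 1")
    return problems


issues = check_rank_config(
    {"prompt": "Rank by urgency", "direction": "up", "k": 0, "overlap_fraction": 1.5}
)
```

Here `issues` holds one message each for the invalid direction, the non-positive `k`, and the out-of-range `overlap_fraction`.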

Outputs

| Name | Type | Description |
|---|---|---|
| output | Tuple[List[Dict], float] | Ranked documents (each with a _rank field) and total cost |
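A short sketch of how a caller might consume this output. The sample data and the helper `top_ranked` are made up for illustration, and the sketch assumes that a lower `_rank` means a better position (1 = highest ranked), as the comment in the usage example on this page states.

```python
def top_ranked(ranked_docs: list[dict], n: int) -> list[dict]:
    """Return the `n` best documents, ordered by their `_rank` field."""
    return sorted(ranked_docs, key=lambda doc: doc["_rank"])[:n]


# Made-up data shaped like the documented (documents, cost) tuple.
sample_output = (
    [
        {"title": "B", "_rank": 2},
        {"title": "A", "_rank": 1},
        {"title": "C", "_rank": 3},
    ],
    0.0123,  # illustrative cost value, not a real measurement
)

ranked_docs, total_cost = sample_output
best_two = top_ranked(ranked_docs, 2)
```

Sorting by `_rank` explicitly keeps the caller independent of whatever order the returned list happens to be in.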

Usage Examples

# YAML pipeline configuration for ranking
operations:
  - name: rank_papers
    type: order
    prompt: "Rank by novelty and significance of the research contribution"
    input_keys:
      - title
      - abstract
    direction: desc
    initial_ordering_method: likert
    k: 50
    rerank_call_budget: 20
    batch_size: 10
    model: "gpt-4o-mini"

# Python API usage
from docetl.operations.rank import RankOperation

config = {
    "name": "rank_tickets",
    "type": "order",
    "prompt": "Rank by urgency and customer impact",
    "input_keys": ["subject", "description"],
    "direction": "desc",
    "initial_ordering_method": "embedding",
    "k": 20,
    "rerank_call_budget": 50,
}
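# runner, default_model, max_threads, and input_data are assumed to be
# supplied by the surrounding pipeline; they are not defined in this snippet.
rank_op = RankOperation(runner, config, default_model, max_threads)
ranked_results, cost = rank_op.execute(input_data)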
# Each result now has a "_rank" field (1 = highest ranked)
