Implementation:Ucbepic Docetl RankOperation Execute

Knowledge Sources	Ucbepic_Docetl DocETL Docs
Domains	Data_Processing, Document_Ranking
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool for ranking documents by quality or relevance using multiple LLM-based and embedding-based evaluation strategies, provided by DocETL.

Description

The RankOperation class extends BaseOperation to rank documents according to a user-defined criterion (the prompt) and a direction (ascending or descending). It works in two phases. First, an initial ordering is produced using embedding similarity, Likert-scale LLM ratings, or calibrated embeddings, as selected by initial_ordering_method. Second, a refinement phase passes overlapping "picky" sliding windows over that ordering, and for each window the LLM selects the top items, within the limit set by rerank_call_budget. Each document receives a _rank field indicating its position in the final ordering.
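The embedding-based initial ordering can be pictured as scoring each document by how close its embedding lies to an embedding of the ranking prompt. The sketch below is a minimal, dependency-free illustration under that assumption. The vectors are toy values standing in for real embedding-model output, and `cosine` and `embedding_initial_order` are hypothetical helpers, not DocETL functions; the real `_execute_rating_embedding_qurk` may embed and score documents differently.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def embedding_initial_order(docs: list[dict], doc_vecs: list[list[float]],
                            criteria_vec: list[float], direction: str = "desc") -> list[dict]:
    """Order docs by similarity of their (assumed precomputed) embeddings to the criteria embedding.

    Assumption: higher similarity means a better match, so "desc" puts the
    most similar document first and "asc" puts it last.
    """
    scored = [(cosine(v, criteria_vec), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda t: t[0], reverse=(direction == "desc"))
    return [docs[i] for _, i in scored]
```

For example, with vectors `[1, 0]`, `[0, 1]`, `[1, 1]` and a criteria vector of `[1, 0]`, the "desc" ordering is the first, third, then second document.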

Usage

Use this operation when you need to order documents by subjective quality, relevance, or any criterion that requires semantic understanding. Typical scenarios include ranking search results by relevance, prioritizing support tickets by urgency, ordering research papers by novelty, or selecting the best candidates from a pool based on complex criteria.

Code Reference

Source Location

Repository: Ucbepic_Docetl
File: docetl/operations/rank.py
Lines: 1-1084

Signature

class RankOperation(BaseOperation):
    class schema(BaseOperation.schema):
        type: str = "order"
        prompt: str
        input_keys: list[str] = Field(default_factory=list)
        direction: Literal["asc", "desc"]
        model: str | None = None
        embedding_model: str | None = None
        batch_size: int = Field(10, gt=0)
        initial_ordering_method: Literal["embedding", "likert", "calibrated_embedding"] = "embedding"
        k: int | None = Field(None, gt=0)
        rerank_call_budget: int = Field(100, gt=0)
        num_top_items_per_window: int = Field(3, gt=0)
        overlap_fraction: float = Field(0.5, ge=0, le=1)
        timeout: int | None = Field(None, gt=0)
        num_calibration_docs: int = Field(10, gt=0)
        verbose: bool = False
        litellm_completion_kwargs: dict[str, Any] = Field(default_factory=dict)

    def _batch_rank_documents(self, batch, criteria, direction, model, ...) -> tuple[list[int], float]: ...
    def _execute_comparison_qurk(self, input_data, sample=False) -> tuple[list[dict], float]: ...
    def _execute_rating_embedding_qurk(self, input_data) -> tuple[list[dict], float]: ...
    def _execute_sliding_window_qurk(self, input_data, ...) -> tuple[list[dict], float]: ...
    def _execute_likert_rating_qurk(self, input_data) -> tuple[list[dict], float]: ...
    def _execute_picky_window(self, window_docs, num_top_items) -> list[int]: ...
    def _execute_calibrated_embedding_sort(self, input_data) -> tuple[list[dict], float]: ...
    def execute(self, input_data: list[dict]) -> tuple[list[dict], float]: ...
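One way to read the "picky" sliding-window refinement is sketched below. It assumes windows of `batch_size` documents start at the bottom of the current ordering and move toward the top by `batch_size * (1 - overlap_fraction)` positions, that each picked item is moved to the front of its window, and that each window costs one call against the budget. The real `_execute_picky_window` and its caller may schedule windows differently. `picky_refine` and `oracle_pick` are hypothetical; `pick` stands in for the LLM call, and the stub reads a hidden `score` field instead.

```python
from typing import Callable

# Hypothetical stand-in for the LLM call: given a window of documents and a count n,
# return the window-relative indices of the n best documents, best first.
Picker = Callable[[list[dict], int], list[int]]


def picky_refine(docs: list[dict], pick: Picker, window_size: int = 10,
                 num_top: int = 3, overlap_fraction: float = 0.5,
                 call_budget: int = 100) -> list[dict]:
    """Refine an initial ordering with overlapping windows that move bottom-up (assumed schedule)."""
    order = list(docs)
    # Windows overlap by overlap_fraction, so each step advances by the non-overlapping part.
    step = max(1, int(window_size * (1 - overlap_fraction)))
    start = max(0, len(order) - window_size)
    calls = 0
    while calls < call_budget:
        window = order[start:start + window_size]
        picked = pick(window, num_top)
        picked_set = set(picked)
        # Picked items go to the front of the window; the others keep their relative order.
        chosen = [window[i] for i in picked]
        rest = [d for i, d in enumerate(window) if i not in picked_set]
        order[start:start + window_size] = chosen + rest
        calls += 1
        if start == 0:
            break  # The window has reached the top of the ordering.
        start = max(0, start - step)
    return order


def oracle_pick(window: list[dict], n: int) -> list[int]:
    """Stub picker for testing: chooses by a hidden 'score' field instead of asking an LLM."""
    return sorted(range(len(window)), key=lambda i: window[i]["score"], reverse=True)[:n]
```

With six documents in worst-first order, `window_size=3` and `num_top=1`, the best document climbs one window at a time and ends in first place; with `call_budget=1` only the bottom window is refined, which shows how the budget caps the amount of reordering.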

Import

from docetl.operations.rank import RankOperation

I/O Contract

Inputs

Name	Type	Required	Description
input_data	List[Dict]	Yes	Documents to rank
prompt	str	Yes	Ranking criteria description used by the LLM
direction	str	Yes	Ranking direction: "asc" (ascending) or "desc" (descending)
input_keys	List[str]	No	Keys to extract from documents for ranking (defaults to all keys)
initial_ordering_method	str	No	Method for initial ordering: "embedding", "likert", or "calibrated_embedding" (default "embedding")
k	int	No	Number of top elements to focus on (default: all documents)
rerank_call_budget	int	No	Number of LLM calls for sliding window refinement (default 100)
batch_size	int	No	Size of each comparison window (default 10)
num_top_items_per_window	int	No	Number of items the LLM picks per window (default 3)
overlap_fraction	float	No	Overlap fraction between windows (default 0.5)
model	str	No	LLM model for comparisons (defaults to pipeline default)
embedding_model	str	No	Model for embedding-based initial ordering
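When initial_ordering_method is "likert", each document receives an integer rating and the list is sorted by it. The sketch below illustrates that idea under stated assumptions: the 1..5 scale, the `rate` callback (a stand-in for the LLM rating call), and `likert_initial_order` itself are hypothetical. DocETL's `_execute_likert_rating_qurk` may batch its calls, use a different scale, or break ties differently.

```python
from typing import Callable


def likert_initial_order(docs: list[dict], rate: Callable[[dict], int],
                         direction: str = "desc", scale: int = 5) -> list[dict]:
    """Sort documents by an integer rating in 1..scale returned by rate(doc).

    rate stands in for an LLM Likert rating call (assumption). Python's sort is
    stable, also with reverse=True, so equal ratings keep their input order.
    """
    ratings = []
    for doc in docs:
        r = rate(doc)
        if not 1 <= r <= scale:
            raise ValueError(f"rating {r} outside 1..{scale}")
        ratings.append(r)
    order = sorted(range(len(docs)), key=lambda i: ratings[i], reverse=(direction == "desc"))
    return [docs[i] for i in order]
```

Coarse ratings produce many ties, which is one reason a later refinement pass over the top of the list can be useful.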

Outputs

Name	Type	Description
output	Tuple[List[Dict], float]	Ranked documents (each with a _rank field) and total cost
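The output contract pairs the ranked documents with a total cost. A minimal sketch of assembling that tuple is shown below; `attach_ranks` is a hypothetical helper, the 1-based numbering follows the "1 = highest ranked" note in the Python usage example, and copying each dict (rather than mutating the input) is an assumption made here for clarity.

```python
import math


def attach_ranks(ordered_docs: list[dict], call_costs: list[float]) -> tuple[list[dict], float]:
    """Return (documents with a 1-based _rank field, total cost), mirroring the output contract."""
    # Copies each dict so the caller's input documents are not mutated (an assumption here).
    ranked = [{**doc, "_rank": pos} for pos, doc in enumerate(ordered_docs, start=1)]
    # fsum avoids accumulating float rounding error across many small per-call costs.
    return ranked, math.fsum(call_costs)
```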

Usage Examples

# YAML pipeline configuration for ranking
operations:
  - name: rank_papers
    type: order
    prompt: "Rank by novelty and significance of the research contribution"
    input_keys:
      - title
      - abstract
    direction: desc
    initial_ordering_method: likert
    k: 50
    rerank_call_budget: 20
    batch_size: 10
    model: "gpt-4o-mini"
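In DocETL the Pydantic schema in the Signature enforces these constraints when the operation is built. The hypothetical pre-check below mirrors the same rules (required keys, the `direction` and `initial_ordering_method` literals, `gt=0` integers, and `0 <= overlap_fraction <= 1`) so that a config such as the YAML above, once loaded into a dict, can be checked without constructing the operation. `check_rank_config` is an illustration, not part of DocETL.

```python
from typing import Any

ALLOWED_METHODS = {"embedding", "likert", "calibrated_embedding"}


def _pos_int(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool) and v > 0


def check_rank_config(cfg: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the config passes these checks."""
    problems = []
    for key in ("prompt", "direction"):
        if key not in cfg:
            problems.append(f"missing required key: {key}")
    if "direction" in cfg and cfg["direction"] not in ("asc", "desc"):
        problems.append("direction must be 'asc' or 'desc'")
    method = cfg.get("initial_ordering_method", "embedding")
    if method not in ALLOWED_METHODS:
        problems.append(f"unknown initial_ordering_method: {method}")
    for key in ("batch_size", "rerank_call_budget", "num_top_items_per_window", "num_calibration_docs"):
        if key in cfg and not _pos_int(cfg[key]):
            problems.append(f"{key} must be a positive integer")
    for key in ("k", "timeout"):  # optional: None is allowed
        if cfg.get(key) is not None and not _pos_int(cfg[key]):
            problems.append(f"{key} must be a positive integer or null")
    frac = cfg.get("overlap_fraction", 0.5)
    if not (isinstance(frac, (int, float)) and 0 <= frac <= 1):
        problems.append("overlap_fraction must be between 0 and 1")
    return problems
```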

# Python API usage
from docetl.operations.rank import RankOperation

config = {
    "name": "rank_tickets",
    "type": "order",
    "prompt": "Rank by urgency and customer impact",
    "input_keys": ["subject", "description"],
    "direction": "desc",
    "initial_ordering_method": "embedding",
    "k": 20,
    "rerank_call_budget": 50,
}
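# runner, default_model, and max_threads are assumed to come from an existing DocETL
# pipeline; input_data is a list of dicts with "subject" and "description" keys
rank_op = RankOperation(runner, config, default_model, max_threads)
ranked_results, cost = rank_op.execute(input_data)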
# Each result now has a "_rank" field (1 = highest ranked)

Related Pages

Principle:Ucbepic_Docetl_LLM_Powered_Document_Ranking
