Implementation:Eventual Inc Daft DataFrame Sort
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Analysis |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the Daft library, for globally sorting DataFrame rows by column values.
Description
The sort method on a Daft DataFrame performs a global sort of all rows based on one or more columns or expressions. Each sort column can have an independent descending flag and null placement configuration. When nulls_first is not specified, it defaults to matching the desc parameter (nulls first when descending, nulls last when ascending). Since this is a global sort across all partitions, it requires an expensive repartition operation and can be slow on large datasets.
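The defaulting rule described above can be sketched in plain Python. This is a minimal illustration, not Daft's internal code: the helper name `resolve_sort_flags` is hypothetical, and broadcasting a single bool to every sort column is an assumption made for the sketch.

```python
def resolve_sort_flags(num_columns, desc=False, nulls_first=None):
    """Hypothetical helper (not part of Daft): return one
    (descending, nulls_first) pair per sort column."""
    # Assumption: a single bool applies to every sort column.
    if isinstance(desc, bool):
        desc_flags = [desc] * num_columns
    else:
        desc_flags = list(desc)

    if nulls_first is None:
        # Rule from the description: nulls_first follows desc
        # (nulls first when descending, nulls last when ascending).
        nulls_flags = list(desc_flags)
    elif isinstance(nulls_first, bool):
        nulls_flags = [nulls_first] * num_columns
    else:
        nulls_flags = list(nulls_first)

    if len(desc_flags) != num_columns or len(nulls_flags) != num_columns:
        raise ValueError("flag lists must match the number of sort columns")
    return list(zip(desc_flags, nulls_flags))


# Two sort columns: first descending, second ascending, nulls_first unset.
flags = resolve_sort_flags(2, desc=[True, False])
print(flags)  # [(True, True), (False, False)]
```

Passing `nulls_first` explicitly overrides the default for the affected columns, so `resolve_sort_flags(2, desc=True, nulls_first=False)` yields descending order with nulls last in both columns.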
Usage
Use this method on a DataFrame when you need fully ordered results for display, top-N queries, reporting, or operations that require sorted input.
Code Reference
Source Location
- Repository: Daft
- File: daft/dataframe/dataframe.py
- Lines: L2548-2642
Signature
def sort(
self,
by: ColumnInputType | list[ColumnInputType],
desc: bool | list[bool] = False,
nulls_first: bool | list[bool] | None = None,
) -> "DataFrame"
Import
# Method on DataFrame, no separate import needed
sorted_df = df.sort("col", desc=True)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| by | ColumnInputType \| list[ColumnInputType] | Yes | Column(s) to sort by; can be a string column name, an expression, or a list of either |
| desc | bool \| list[bool] | No | Sort descending; can be a single bool or a list matching the number of sort columns; defaults to False |
| nulls_first | bool \| list[bool] \| None | No | Place nulls first; defaults to None, which follows desc (nulls first when descending, nulls last when ascending, so nulls behave as the greatest value) |
Outputs
| Name | Type | Description |
|---|---|---|
| return | DataFrame | A new DataFrame with rows sorted according to the specified columns and directions |
Usage Examples
Basic Usage
import daft
df = daft.from_pydict({"x": [3, 2, 1], "y": [6, 4, 5]})
# Sort by expression (x + y) ascending
sorted_df = df.sort(df["x"] + df["y"])
sorted_df.show()
# Multi-column sort with different directions: x descending, y ascending
df = daft.from_pydict({"x": [1, 2, 1, 2], "y": [9, 8, 7, 6]})
sorted_df = df.sort(["x", "y"], desc=[True, False])
sorted_df.show()
# Sort with explicit null positioning: nulls placed first in both columns
df = daft.from_pydict({"x": [1, 2, None], "y": [9, 8, None]})
sorted_df = df.sort(["x", "y"], desc=[True, False], nulls_first=[True, True])
sorted_df.show()