Implementation:Eventual Inc Daft DataFrame Join
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Analysis |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the Daft library, for joining two DataFrames on shared key columns.
Description
The join method on a Daft DataFrame performs a column-wise join with another DataFrame, similar to a SQL JOIN. It supports seven join types: inner, left, right, outer, anti, semi, and cross. Join keys are specified either with on (when both sides use the same key column names) or with separate left_on and right_on parameters. The join strategy can be set explicitly to hash, sort_merge, or broadcast, or left as None so that the optimizer chooses one. When column names collide, the conflicting right-side columns are renamed with a prefix (default "right.") or a suffix. Two strategy restrictions apply: sort-merge joins support only inner joins, and broadcast joins do not support outer joins.
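The seven join types differ only in which rows they keep. The following is a minimal pure-Python sketch of those semantics on lists of dicts, assuming a single shared key column and no null keys. The helper name `join_rows` is hypothetical and the nested-loop approach is for illustration only; it is not how Daft executes joins.

```python
from itertools import product

JOIN_TYPES = ("inner", "left", "right", "outer", "anti", "semi", "cross")


def join_rows(left, right, key, how="inner"):
    """Return (left_row, right_row) pairs; None marks a side with no match.

    Reference semantics only (hypothetical helper, not part of Daft).
    For "semi" and "anti" only the left row is meaningful, so the
    right slot is always None.
    """
    if how not in JOIN_TYPES:
        raise ValueError(f"unsupported join type: {how!r}")
    if how == "cross":
        # Every left row paired with every right row; the key is ignored.
        return list(product(left, right))

    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}

    if how == "semi":
        # Left rows that have at least one match on the right.
        return [(lrow, None) for lrow in left if lrow[key] in right_keys]
    if how == "anti":
        # Left rows that have no match on the right.
        return [(lrow, None) for lrow in left if lrow[key] not in right_keys]

    # Matching pairs are common to inner, left, right, and outer joins.
    pairs = [(lrow, rrow) for lrow in left for rrow in right if lrow[key] == rrow[key]]
    if how in ("left", "outer"):
        pairs += [(lrow, None) for lrow in left if lrow[key] not in right_keys]
    if how in ("right", "outer"):
        pairs += [(None, rrow) for rrow in right if rrow[key] not in left_keys]
    return pairs


left = [{"a": "w", "b": 1}, {"a": "x", "b": 2}, {"a": "y", "b": 3}]
right = [{"a": "x", "b": 20}, {"a": "y", "b": 30}, {"a": "z", "b": 40}]
for how in JOIN_TYPES:
    print(how, len(join_rows(left, right, "a", how)))
```

With this sample data, only the keys "x" and "y" appear on both sides, so the inner and semi results are built from those two keys, while the anti result keeps only the left row with key "w".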
Usage
Use this method on a DataFrame when you need to combine data from two DataFrames based on matching key columns or when performing a cross join.
Code Reference
Source Location
- Repository: Daft
- File: daft/dataframe/dataframe.py
- Lines: L2853-2945
Signature
def join(
    self,
    other: "DataFrame",
    on: list[ColumnInputType] | ColumnInputType | None = None,
    left_on: list[ColumnInputType] | ColumnInputType | None = None,
    right_on: list[ColumnInputType] | ColumnInputType | None = None,
    how: Literal["inner", "left", "right", "outer", "anti", "semi", "cross"] = "inner",
    strategy: Literal["hash", "sort_merge", "broadcast"] | None = None,
    prefix: str | None = None,
    suffix: str | None = None,
) -> "DataFrame"
Import
# Method on DataFrame, no separate import needed
joined_df = df1.join(df2, on="key")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| other | DataFrame | Yes | The right DataFrame to join with |
| on | ColumnInputType, list[ColumnInputType], or None | No | Key(s) to join on when both sides share the same column names |
| left_on | ColumnInputType, list[ColumnInputType], or None | No | Key(s) to join on from the left DataFrame |
| right_on | ColumnInputType, list[ColumnInputType], or None | No | Key(s) to join on from the right DataFrame |
| how | Literal["inner","left","right","outer","anti","semi","cross"] | No | Join type; defaults to "inner" |
| strategy | Literal["hash","sort_merge","broadcast"] or None | No | Join algorithm; defaults to None (automatic) |
| prefix | str or None | No | Prefix for conflicting right column names; defaults to "right." |
| suffix | str or None | No | Suffix for conflicting right column names |
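The constraints on these inputs can be summarized as a small pre-flight check. This is a sketch that encodes only the rules stated on this page (on versus left_on/right_on, sort-merge limited to inner joins, broadcast not supporting outer joins). The function `check_join_args` is hypothetical; Daft's own validation and error messages may differ.

```python
def check_join_args(on=None, left_on=None, right_on=None, how="inner", strategy=None):
    """Return a list of problems with a planned join call (empty if none).

    Hypothetical helper, not part of Daft: it mirrors the documented
    rules only and does not inspect any DataFrame.
    """
    problems = []

    # Keys come either from `on` or from the left_on/right_on pair.
    if on is not None and (left_on is not None or right_on is not None):
        problems.append("use either `on` or `left_on`/`right_on`, not both")
    if (left_on is None) != (right_on is None):
        problems.append("`left_on` and `right_on` must be given together")

    # Strategy restrictions described in the Description section.
    if strategy == "sort_merge" and how != "inner":
        problems.append("sort_merge supports only inner joins")
    if strategy == "broadcast" and how == "outer":
        problems.append("broadcast does not support outer joins")

    return problems


print(check_join_args(on="a", how="left", strategy="sort_merge"))
print(check_join_args(left_on="a", right_on="a", strategy="broadcast"))
```

Leaving strategy as None sidesteps both strategy restrictions, since the choice is then left to the optimizer.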
Outputs
| Name | Type | Description |
|---|---|---|
| return | DataFrame | The joined DataFrame containing columns from both inputs |
Usage Examples
Basic Usage
import daft
df1 = daft.from_pydict({"a": ["w", "x", "y"], "b": [1, 2, 3]})
df2 = daft.from_pydict({"a": ["x", "y", "z"], "b": [20, 30, 40]})
# Inner join with separate left_on and right_on
joined_df = df1.join(df2, left_on=df1["a"], right_on=df2["a"])
joined_df.show()
# Left join using shared key column
joined_df = df1.join(df2, on="a", how="left")
joined_df.show()
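In the example above both DataFrames have a non-key column named b, so the joined result has to rename the right-side one. The sketch below illustrates the naming rule from the Description section in plain Python. It assumes that a shared key column appears once in the output and that the "right." prefix applies only when neither prefix nor suffix is given. Both points are assumptions, and `joined_column_names` is a hypothetical helper rather than a Daft function.

```python
def joined_column_names(left_cols, right_cols, keys, prefix=None, suffix=None):
    """Predict output column names for a join on shared key names.

    Hypothetical helper illustrating the documented collision rule;
    Daft's actual behavior should be confirmed with `joined_df.column_names`.
    """
    if prefix is None and suffix is None:
        prefix = "right."  # documented default prefix
    prefix = prefix or ""
    suffix = suffix or ""

    names = list(left_cols)
    for name in right_cols:
        if name in keys:
            continue  # assumption: the shared key column appears once
        if name in left_cols:
            names.append(f"{prefix}{name}{suffix}")  # collision: rename
        else:
            names.append(name)
    return names


print(joined_column_names(["a", "b"], ["a", "b"], keys=["a"]))
print(joined_column_names(["a", "b"], ["a", "b"], keys=["a"], suffix="_right"))
```

Under these assumptions the first call yields the columns a, b, and right.b, and the second yields a, b, and b_right.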