

Implementation:Eventual Inc Daft DataFrame Join

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, Data_Analysis
Last Updated: 2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the Daft library, for joining two DataFrames on shared key columns.

Description

The join method on a Daft DataFrame performs a column-wise join with another DataFrame, similar to a SQL JOIN. It supports seven join types: inner, left, right, outer, anti, semi, and cross. Join keys are given either with on, when both sides use the same key names, or with separate left_on and right_on parameters. A cross join takes no keys. The join strategy can be set explicitly to hash, sort_merge, or broadcast, or left as None to let Daft choose automatically. When column names collide, the conflicting right-side columns are renamed with a prefix (default "right.") or a suffix. Sort-merge joins support only inner joins, and broadcast joins do not support outer joins.
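The following plain-Python sketch makes the join types concrete by showing which rows each how value keeps. It is not Daft code and not Daft's implementation. The helper name join_rows and the (key, value) row representation are illustrative assumptions. The sample keys and values match df1/df2 in the usage examples on this page. The cross case ignores keys entirely. A real engine need not produce rows in this order.

```python
# Illustrative nested-loop join over (key, value) pairs; NOT Daft's implementation.
JOIN_TYPES = {"inner", "left", "right", "outer", "anti", "semi", "cross"}


def join_rows(left, right, how="inner"):
    """Return the rows a join of type `how` would keep (hypothetical helper)."""
    if how not in JOIN_TYPES:
        raise ValueError(f"unknown join type: {how}")
    if how == "cross":
        # Cartesian product: every left row paired with every right row, keys ignored.
        return [(lk, lv, rk, rv) for lk, lv in left for rk, rv in right]
    right_keys = {rk for rk, _ in right}
    if how == "semi":
        # Left rows with at least one match; no right-side values are emitted.
        return [(lk, lv) for lk, lv in left if lk in right_keys]
    if how == "anti":
        # Left rows with no match on the right.
        return [(lk, lv) for lk, lv in left if lk not in right_keys]
    out, matched_right = [], set()
    for lk, lv in left:
        hits = [(i, rv) for i, (rk, rv) in enumerate(right) if rk == lk]
        for i, rv in hits:
            out.append((lk, lv, rv))
            matched_right.add(i)
        if not hits and how in ("left", "outer"):
            out.append((lk, lv, None))  # unmatched left row, right side null
    if how in ("right", "outer"):
        for i, (rk, rv) in enumerate(right):
            if i not in matched_right:
                out.append((rk, None, rv))  # unmatched right row, left side null
    return out


# Same keys and values as df1/df2 in the usage examples.
left = [("w", 1), ("x", 2), ("y", 3)]
right = [("x", 20), ("y", 30), ("z", 40)]
for how in ("inner", "left", "right", "outer", "semi", "anti"):
    print(how, join_rows(left, right, how))
```

Semi and anti joins emit only left-side values. They act as filters on the left DataFrame rather than as combinations of both.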

Usage

Use this method on a DataFrame when you need to combine data from two DataFrames based on matching key columns or when performing a cross join.

Code Reference

Source Location

  • Repository: Daft
  • File: daft/dataframe/dataframe.py
  • Lines: L2853-2945

Signature

def join(
    self,
    other: "DataFrame",
    on: list[ColumnInputType] | ColumnInputType | None = None,
    left_on: list[ColumnInputType] | ColumnInputType | None = None,
    right_on: list[ColumnInputType] | ColumnInputType | None = None,
    how: Literal["inner", "left", "right", "outer", "anti", "semi", "cross"] = "inner",
    strategy: Literal["hash", "sort_merge", "broadcast"] | None = None,
    prefix: str | None = None,
    suffix: str | None = None,
) -> "DataFrame"
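The signature accepts keys in two ways: on, or the pair left_on/right_on. The sketch below gives one reading of that contract as a hypothetical helper, resolve_join_keys. Its specific checks are assumptions to verify against the Daft source, not rules copied from it. They are:

  • no keys for a cross join
  • on is exclusive with left_on/right_on
  • equal key counts on both sides

The error messages are also invented. Keys are plain strings here, whereas Daft's ColumnInputType also accepts expressions such as df1["a"].

```python
# Hypothetical helper sketching how on/left_on/right_on might be normalized.
# The validation rules below are assumptions, not Daft's source code.


def _as_list(keys):
    """Accept a single key or a list/tuple of keys."""
    return list(keys) if isinstance(keys, (list, tuple)) else [keys]


def resolve_join_keys(how="inner", on=None, left_on=None, right_on=None):
    """Return (left_keys, right_keys) for a join of type `how`."""
    if how == "cross":
        # A cross join pairs every row with every row, so keys do not apply.
        if any(k is not None for k in (on, left_on, right_on)):
            raise ValueError("a cross join takes no join keys")
        return [], []
    if on is not None:
        # Shared key names: the same keys are used on both sides.
        if left_on is not None or right_on is not None:
            raise ValueError("pass either `on` or `left_on`/`right_on`, not both")
        keys = _as_list(on)
        return keys, list(keys)
    if left_on is None or right_on is None:
        raise ValueError("`left_on` and `right_on` are both required when `on` is None")
    left_keys, right_keys = _as_list(left_on), _as_list(right_on)
    if len(left_keys) != len(right_keys):
        raise ValueError("`left_on` and `right_on` must have the same number of keys")
    return left_keys, right_keys


print(resolve_join_keys(on="a"))                      # shared key name
print(resolve_join_keys(left_on="a", right_on="key"))  # different key names
```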

Import

# Method on DataFrame, no separate import needed
joined_df = df1.join(df2, on="key")

I/O Contract

Inputs

  • other (DataFrame, required): The right DataFrame to join with.
  • on (list[ColumnInputType] | ColumnInputType | None, optional): Key(s) to join on when both sides share the same column names.
  • left_on (list[ColumnInputType] | ColumnInputType | None, optional): Key(s) to join on from the left DataFrame.
  • right_on (list[ColumnInputType] | ColumnInputType | None, optional): Key(s) to join on from the right DataFrame.
  • how (Literal["inner", "left", "right", "outer", "anti", "semi", "cross"], optional): Join type; defaults to "inner".
  • strategy (Literal["hash", "sort_merge", "broadcast"] | None, optional): Join algorithm; defaults to None (chosen automatically).
  • prefix (str | None, optional): Prefix for conflicting right-side column names; "right." is used by default.
  • suffix (str | None, optional): Suffix for conflicting right-side column names.
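The strategy parameter selects the physical join algorithm. As a rough illustration only, the sketch below gives the textbook inner-join forms of a hash join and a sort-merge join in plain Python. Daft's actual operators are distributed and differ in detail. A broadcast join, in the usual sense, sends the smaller side to every worker and then hash-joins locally, so it is not shown separately. Whichever strategy is chosen, the result rows should be the same; only the cost differs.

```python
from collections import defaultdict


def hash_join(left, right):
    """Inner hash join: build a hash table on the right side, probe it with the left."""
    table = defaultdict(list)
    for rk, rv in right:
        table[rk].append(rv)
    return [(lk, lv, rv) for lk, lv in left for rv in table.get(lk, [])]


def sort_merge_join(left, right):
    """Inner sort-merge join: sort both sides by key, then advance two cursors."""
    ls = sorted(left, key=lambda r: r[0])
    rs = sorted(right, key=lambda r: r[0])
    out, i, j = [], 0, 0
    while i < len(ls) and j < len(rs):
        lk, rk = ls[i][0], rs[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right ...
            j_end = j
            while j_end < len(rs) and rs[j_end][0] == lk:
                j_end += 1
            # ... and pair it with every left row carrying the same key.
            while i < len(ls) and ls[i][0] == lk:
                for _, rv in rs[j:j_end]:
                    out.append((lk, ls[i][1], rv))
                i += 1
            j = j_end
    return out


left = [("w", 1), ("x", 2), ("y", 3)]
right = [("x", 20), ("y", 30), ("z", 40)]
print(hash_join(left, right))
print(sort_merge_join(left, right))
```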

Outputs

  • return (DataFrame): The joined DataFrame, containing columns from both inputs.
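The sketch below shows how the output column list might be named when columns collide. It uses a hypothetical helper, output_columns, and rests on the following assumptions (not Daft's code), which should be checked against the Daft documentation:

  • Join keys that share a name on both sides appear once.
  • With neither prefix nor suffix given, colliding right-side columns get "right.".
  • Otherwise whichever of prefix/suffix is supplied is applied.

```python
def output_columns(left_cols, right_cols, shared_keys=(), prefix=None, suffix=None):
    """Hypothetical sketch of output-column naming for a keyed join.

    Assumptions (not Daft's code): same-named join keys are emitted once;
    with neither prefix nor suffix given, colliding right columns get "right.".
    """
    if prefix is None and suffix is None:
        prefix = "right."  # assumed default, per the parameter description
    out = list(left_cols)
    for name in right_cols:
        if name in shared_keys:
            continue  # shared key column is already present from the left side
        if name in left_cols:
            # Collision: rename the right-side column.
            name = f"{prefix or ''}{name}{suffix or ''}"
        out.append(name)
    return out


# Columns of df1 and df2 from the usage examples, joined on "a".
print(output_columns(["a", "b"], ["a", "b"], shared_keys=("a",)))
print(output_columns(["a", "b"], ["a", "b"], shared_keys=("a",), suffix="_r"))
```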

Usage Examples

Basic Usage

import daft
from daft import col

df1 = daft.from_pydict({"a": ["w", "x", "y"], "b": [1, 2, 3]})
df2 = daft.from_pydict({"a": ["x", "y", "z"], "b": [20, 30, 40]})

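# Inner join with separate left_on and right_on: only keys present in both
# DataFrames ("x" and "y") survive. Column "b" exists on both sides, so df2's
# copy is renamed with the default "right." prefix.
joined_df = df1.join(df2, left_on=df1["a"], right_on=df2["a"])
joined_df.show()

# Left join on the shared key column: every row of df1 is kept, and "w",
# which has no match in df2, gets nulls in df2's columns.
joined_df = df1.join(df2, on="a", how="left")
joined_df.show()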

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
