Principle:Eventual Inc Daft Column Transformation
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Transformation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Technique for adding or replacing columns in a DataFrame using computed expressions.
Description
Column transformation appends a new column, or replaces an existing one, by evaluating an expression over the existing data. It is the primary mechanism for applying UDFs, built-in functions, and arithmetic operations to create derived columns. If the column name matches an existing column, that column is replaced; otherwise, a new column is appended to the schema. In the append case, the operation is equivalent to a SELECT of all existing columns plus the new expression aliased to the given name; in the replace case, the new expression takes the place of the existing column with that name.
Usage
Use column transformation when you need to add computed columns or apply transformations to existing data. Common scenarios include feature engineering, data cleaning (e.g., normalizing values), deriving new metrics from existing columns, and applying user-defined functions.
Theoretical Basis
Column transformation implements a projection extension operation:
Pseudocode:
    with_column(df, name, expr):
        return SELECT df.*, expr AS name FROM df
Semantics:
    - If 'name' exists in df.schema:
        Replace column 'name' with evaluated 'expr'
    - If 'name' does not exist:
        Append new column 'name' with evaluated 'expr'
Expression Evaluation:
    - expr is evaluated row-wise over the existing columns
    - Can reference any existing column
    - Supports arithmetic, string ops, UDFs, conditionals
    - Result must have same number of rows as input
This operation extends relational projection: every existing column is preserved, except one that shares the given name, and exactly one derived column is added or replaced.
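The replace-or-append semantics can be modeled in a few lines of plain Python. This is an illustrative sketch, not Daft's implementation: the table is a hypothetical dict of column name to list of values, `expr` is an ordinary callable applied to each row, and keeping a replaced column in its original position is an assumption of this model.

```python
from typing import Any, Callable, Dict, List

Table = Dict[str, List[Any]]   # column name -> column values
Row = Dict[str, Any]           # column name -> single value


def with_column(table: Table, name: str, expr: Callable[[Row], Any]) -> Table:
    """Return a new table with `name` set to `expr` evaluated on every row."""
    n_rows = len(next(iter(table.values()), []))
    # Build one row view per index so expr can reference any existing column.
    rows = [{c: values[i] for c, values in table.items()} for i in range(n_rows)]
    # Row-wise evaluation yields exactly one output value per input row.
    new_values = [expr(row) for row in rows]
    result = dict(table)        # shallow copy; the input table is not modified
    result[name] = new_values   # replaces an existing key in place, else appends
    return result


table = {"price": [10, 20, 30], "qty": [1, 2, 3]}
appended = with_column(table, "total", lambda r: r["price"] * r["qty"])
replaced = with_column(appended, "price", lambda r: r["price"] * 2)
```

Here `appended` gains a third column, while `replaced` keeps the same three column names with new values under `price`.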