Principle:Eventual Inc Daft Struct And List Access
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Transformation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Technique for accessing named fields of struct columns and positional elements of list columns in a DataFrame.
Description
Complex data often contains nested structures (structs with named fields) and lists. Accessing specific fields or elements enables extracting relevant data from nested columns without flattening the entire structure. Struct columns contain named fields that can be accessed by key, while list columns contain ordered elements accessible by positional index. This capability is essential when working with semi-structured data such as JSON records, Parquet files with nested schemas, or any data source that produces hierarchically organized columns.
Usage
Use struct and list access when you need to extract specific fields from struct columns or index into list columns. This is particularly useful when working with nested JSON data, Parquet files with complex schemas, or any scenario where columns contain compound types that need to be decomposed for analysis.
Theoretical Basis
Structured data access operates via two primary mechanisms:
- Key-based field extraction (structs): Given a struct column with schema {field_a: T1, field_b: T2, ...}, accessing by field name returns a column of the corresponding type. This is analogous to attribute access on a record type.
- Positional indexing (lists): Given a list column with element type T, accessing by integer index returns the element at that position. Negative indices count from the end. Slice notation extracts sub-lists.
Pseudocode:
struct_access(column, field_name) -> column[field_name] : T_field
list_access(column, index) -> column[index] : T_element
list_slice(column, start, stop) -> column[start:stop] : List[T_element]
These operations preserve the row-level correspondence of the original DataFrame, producing a new column with the same number of rows as the input.
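The three operations in the pseudocode can be modeled in plain Python to make the row-level behavior concrete. This is only a sketch of the semantics, not Daft's implementation or API: a column is represented as a Python list with one entry per row, `None` stands in for a null row, and the choice to return null for a missing field or an out-of-range index is an assumption made for illustration. The function names mirror the pseudocode; `records` is hypothetical sample data.

```python
from typing import Any, Dict, List, Optional

def struct_access(column: List[Optional[Dict[str, Any]]], field_name: str) -> List[Any]:
    """Key-based field extraction: one output value per input row."""
    # Assumption: a null row or a missing field yields null (None).
    return [None if row is None else row.get(field_name) for row in column]

def list_access(column: List[Optional[List[Any]]], index: int) -> List[Any]:
    """Positional indexing; negative indices count from the end of each list."""
    out = []
    for row in column:
        if row is None or not (-len(row) <= index < len(row)):
            out.append(None)  # Assumption: out-of-range index yields null.
        else:
            out.append(row[index])
    return out

def list_slice(column: List[Optional[List[Any]]], start: int, stop: int) -> List[Any]:
    """Slice notation: each row's list is cut down to the sub-list [start:stop]."""
    return [None if row is None else row[start:stop] for row in column]

# Hypothetical struct column with fields {name: str, tags: List[str]}.
records = [
    {"name": "a", "tags": ["x", "y", "z"]},
    {"name": "b", "tags": ["w"]},
    None,
]

names = struct_access(records, "name")  # column of str
tags = struct_access(records, "tags")   # column of List[str]
first = list_access(tags, 0)
last = list_access(tags, -1)
third = list_access(tags, 2)
head = list_slice(tags, 0, 2)
```

Every result has exactly `len(records)` entries, which is the row-level correspondence described above: each operation maps one input row to one output row and never adds or drops rows.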