Principle:Eventual Inc Daft Struct And List Access

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, Data_Transformation
Last Updated: 2026-02-08 00:00 GMT

Overview

Technique for accessing named fields in struct-type columns and positional elements in list-type columns of a DataFrame.

Description

Complex data often contains nested structures: structs with named fields, and lists of ordered elements. Accessing a specific field or element extracts the relevant data from a nested column without flattening the entire structure. Struct columns contain named fields that are accessed by key, while list columns contain ordered elements that are accessed by positional index. This capability is essential when working with semi-structured data such as JSON records, Parquet files with nested schemas, or any data source that produces hierarchically organized columns.
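
As a rough illustration of the distinction between struct and list columns, the following plain-Python sketch models a nested column as a list of per-row values parsed from JSON. The record layout (`user`, `tags`) and the variable names are hypothetical, and this is not any particular dataframe library's API. It only shows that one field or element can be pulled out while the rest of the nested column stays intact.

```python
import json

# Hypothetical JSON-lines input: each record has a struct ("user") and a list ("tags").
raw_lines = [
    '{"user": {"name": "ada", "age": 36}, "tags": ["math", "engines"]}',
    '{"user": {"name": "alan", "age": 41}, "tags": ["logic"]}',
]

records = [json.loads(line) for line in raw_lines]

# Model each top-level key as a column: one value per row.
user_col = [r["user"] for r in records]  # struct column: rows are dicts
tags_col = [r["tags"] for r in records]  # list column: rows are lists

# Access by key (struct) and by position (list), without flattening anything else.
names = [u["name"] for u in user_col]
first_tags = [t[0] for t in tags_col]

print(names)       # ['ada', 'alan']
print(first_tags)  # ['math', 'logic']
```

The original `user_col` and `tags_col` are left unchanged, so further fields can be extracted later as needed.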

Usage

Use struct and list access when you need to extract specific fields from struct columns or index into list columns. This is particularly useful when working with nested JSON data, Parquet files with complex schemas, or any scenario where columns contain compound types that need to be decomposed for analysis.

Theoretical Basis

Structured data access operates via two primary mechanisms:

  • Key-based field extraction (structs): Given a struct column with schema {field_a: T1, field_b: T2, ...}, accessing by field name returns a column of the corresponding type. This is analogous to attribute access on a record type.
  • Positional indexing (lists): Given a list column with element type T, accessing by integer index returns the element at that position. Negative indices count from the end. Slice notation extracts sub-lists.

Pseudocode:
  struct_access(column, field_name) -> column[field_name] : T_field
  list_access(column, index)       -> column[index]      : T_element
  list_slice(column, start, stop)  -> column[start:stop] : List[T_element]

These operations preserve the row-level correspondence of the original DataFrame, producing a new column with the same number of rows as the input.
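
The three operations in the pseudocode can be modeled in plain Python as follows. This is a minimal sketch, not a dataframe engine's implementation: the function names mirror the pseudocode, columns are modeled as Python lists with one value per row, and returning `None` for a null row, a missing field, or an out-of-range index is an assumption made here so that the row count is always preserved.

```python
from typing import Optional

Column = list  # one value per row; None models a null row (assumption)

def struct_access(column: Column, field_name: str) -> Column:
    """Key-based extraction: one field value per input row."""
    return [None if row is None else row.get(field_name) for row in column]

def list_access(column: Column, index: int) -> Column:
    """Positional indexing; negative indices count from the end."""
    out = []
    for row in column:
        if row is None or not (-len(row) <= index < len(row)):
            out.append(None)  # assumption: out-of-range yields null, not an error
        else:
            out.append(row[index])
    return out

def list_slice(column: Column, start: Optional[int], stop: Optional[int]) -> Column:
    """Slice each row's list into a sub-list."""
    return [None if row is None else row[start:stop] for row in column]

structs = [{"field_a": 1, "field_b": "x"}, {"field_a": 2, "field_b": "y"}, None]
lists = [[10, 20, 30], [40], None]

print(struct_access(structs, "field_a"))  # [1, 2, None]
print(list_access(lists, 1))              # [20, None, None]
print(list_access(lists, -1))             # [30, 40, None]
print(list_slice(lists, 0, 2))            # [[10, 20], [40], None]
```

Every result has three entries, matching the three input rows. A real engine may treat nulls and out-of-range indices differently, so the behavior should be checked against the documentation of the library in use.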

Related Pages

Implemented By
