Principle:PrefectHQ Prefect DataFrame Transformation
| Metadata | |
|---|---|
| Sources | pandas json_normalize |
| Domains | ETL, Data_Engineering |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A data transformation pattern that converts raw nested JSON records into flat, analytics-ready tabular structures using pandas normalization.
Description
DataFrame Transformation is the "Transform" phase of an ETL pipeline. It takes raw, nested JSON data (typically from API responses) and normalizes it into a flat tabular format suitable for analysis, reporting, or storage. Two operations do the work: json_normalize, which recursively flattens nested dictionaries into columns (by default joining nested keys with a dot, e.g. user.login), and column selection, which extracts only the fields needed for downstream analysis.
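As a minimal sketch of these two operations, the example below flattens a couple of made-up records with pandas.json_normalize and then keeps a subset of the resulting columns. The field names (id, user, stats) are hypothetical and not taken from any particular API.

```python
import pandas as pd

# Hypothetical nested records, shaped like items in a typical API response.
raw = [
    {"id": 1, "user": {"login": "alice", "org": {"name": "acme"}}, "stats": {"stars": 10}},
    {"id": 2, "user": {"login": "bob", "org": {"name": "initech"}}, "stats": {"stars": 3}},
]

# Nested dictionaries become columns whose names join the keys with a dot,
# for example "user.login" and "user.org.name".
flat = pd.json_normalize(raw)
all_columns = set(flat.columns)

# Column selection keeps only the fields needed downstream.
df = flat[["id", "user.login", "stats.stars"]]
print(df)
```

Note that json_normalize flattens nested dictionaries only; a list stored under a key stays in a single column unless it is expanded explicitly, for example with the record_path argument.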
Usage
Use this pattern when raw API responses contain nested JSON that needs to be flattened into a tabular format for analysis, CSV export, database loading, or BI tool consumption.
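To illustrate two of the downstream targets mentioned above, the following sketch writes an already flattened DataFrame to CSV text and loads it into an in-memory SQLite database. The table name users, the column names, and the choice of SQLite are assumptions made for the example only.

```python
import io
import sqlite3

import pandas as pd

# A small, already flattened frame (hypothetical columns).
flat = pd.DataFrame({"id": [1, 2], "user.login": ["alice", "bob"]})

# Dots in column names are awkward in SQL, so replace them before loading.
flat = flat.rename(columns=lambda name: name.replace(".", "_"))

# CSV export, written to an in-memory buffer instead of a file.
csv_buffer = io.StringIO()
flat.to_csv(csv_buffer, index=False)
csv_text = csv_buffer.getvalue()

# Database loading, using an in-memory SQLite connection.
conn = sqlite3.connect(":memory:")
flat.to_sql("users", conn, index=False)
rows = conn.execute("SELECT id, user_login FROM users ORDER BY id").fetchall()
conn.close()
```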
Theoretical Basis
Data normalization converts hierarchical/nested data into relational form. The process:
- Flatten nested structures (json_normalize)
- Select relevant columns
- Coerce column types if needed
Pseudocode:
records = flatten(raw_pages)
df = normalize(records)
df = select_columns(df, desired_fields)
df = coerce_types(df, dtypes)  # optional
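The pseudocode above could be written in runnable Python roughly as follows. This is a sketch under stated assumptions, not a reference implementation: the function name transform, the dtypes parameter, and the sample data are hypothetical, and raw_pages is assumed to be a list of pages where each page is a list of record dictionaries.

```python
import pandas as pd


def transform(raw_pages, desired_fields, dtypes=None):
    """Flatten paged JSON records into a DataFrame with selected columns."""
    # Flatten the list of pages into one list of records
    # (assumes each page is itself a list of dicts).
    records = [record for page in raw_pages for record in page]

    # Normalize nested dictionaries into dot-separated columns.
    df = pd.json_normalize(records)

    # Select the desired columns. reindex is used instead of df[...] so that
    # a field absent from every record becomes an all-NaN column rather than
    # raising a KeyError.
    df = df.reindex(columns=desired_fields)

    # Optional type coercion, e.g. numeric strings to integers.
    if dtypes:
        df = df.astype(dtypes)
    return df


# Hypothetical sample input: two pages with one record each.
raw_pages = [
    [{"id": "1", "repo": {"name": "a", "stars": "10"}}],
    [{"id": "2", "repo": {"name": "b", "stars": "3"}}],
]
df = transform(
    raw_pages,
    desired_fields=["id", "repo.name", "repo.stars"],
    dtypes={"id": "int64", "repo.stars": "int64"},
)
```

Whether missing fields should become empty columns or raise an error is a design choice; replacing reindex with plain column indexing gives the stricter behavior.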