Implementation:DataExpert io Data engineer handbook DataFrame Write InsertInto
Overview
This page documents the DataFrame write with insertInto pattern used in the Data Engineer Handbook repository. insertInto is part of the external PySpark API (it is not defined in the repository) and persists the contents of a DataFrame to an existing catalog-managed table. Columns are matched by position rather than by name, and the handbook job calls it with overwrite semantics.
Type
Wrapper Doc (PySpark external API)
Source
players_scd_job.py:L53
Signature
output_df.write.mode("overwrite").insertInto(table_name: str) -> None
Import
Implicit on DataFrame objects -- no separate import is required. The write property is available on all pyspark.sql.DataFrame instances.
from pyspark.sql import SparkSession # DataFrame.write is available implicitly
Inputs / Outputs
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | DataFrame | DataFrame | The DataFrame whose contents will be written to the target table |
| Input | write mode | str | The write mode, e.g., "overwrite" to replace existing data |
| Input | table name | str | The fully qualified name of the target table (e.g., "catalog.schema.table") |
| Output | (side effect) | None | Data is written to the specified table; no value is returned |
Usage Example
# After performing the SCD transformation
output_df = do_player_scd_transformation(spark, input_df)
# Write the result to the target table, overwriting existing data
output_df.write.mode("overwrite").insertInto("players_scd")
Key Behavior Notes
- Column matching is by position, not by name. The DataFrame column order must match the target table schema exactly.
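- overwrite mode replaces the existing data in the target table. For a partitioned table, how much is replaced typically depends on the Spark setting spark.sql.sources.partitionOverwriteMode: with the default static value all existing partitions are replaced, while dynamic replaces only the partitions that receive new rows.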
- The target table must already exist in the catalog. Unlike saveAsTable, insertInto does not create the table.
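Because columns are matched by position, one way to guard against a silently misaligned write is to reorder the DataFrame to the target table's column order before calling insertInto. The following is a minimal sketch, not code from the handbook repository: the helper names align_to_table and insert_overwrite_aligned are hypothetical, and the comparison assumes exact, case-sensitive column names.

```python
def align_to_table(df_columns, table_columns):
    """Return the column names in the target table's order.

    Hypothetical helper: raises ValueError when the DataFrame and the
    table do not have the same set of column names (exact match assumed).
    """
    missing = [c for c in table_columns if c not in df_columns]
    extra = [c for c in df_columns if c not in table_columns]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    return list(table_columns)


def insert_overwrite_aligned(spark, df, table_name):
    """Reorder df to the table's column order, then insert with overwrite.

    spark is an existing SparkSession and df a pyspark.sql.DataFrame.
    spark.table() fails if the table does not exist, which matches the
    requirement that insertInto never creates the table.
    """
    target_columns = spark.table(table_name).columns
    ordered = align_to_table(df.columns, target_columns)
    # select() fixes the positional order that insertInto relies on
    df.select(*ordered).write.mode("overwrite").insertInto(table_name)
```

With this sketch, the call in the usage example could become insert_overwrite_aligned(spark, output_df, "players_scd"). Column types are not checked here; only names and order are aligned.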
Related Pages
- Principle:DataExpert_io_Data_engineer_handbook_DataFrame_Write_To_Table
- Environment:DataExpert_io_Data_engineer_handbook_Spark_Iceberg_Docker_Environment