Implementation:DataExpert io Data engineer handbook DataFrame Write InsertInto

From Leeroopedia


Overview

This page documents the DataFrame.write / insertInto pattern used in the Data Engineer Handbook repository. insertInto belongs to the external PySpark DataFrameWriter API: it persists a DataFrame's contents to an existing catalog-managed table, matching columns by position. Here it is combined with overwrite mode so that each run replaces the previously written data.

Type

Wrapper Doc (PySpark external API)

Source

players_scd_job.py:L53

Signature

output_df.write.mode("overwrite").insertInto(table_name: str) -> None

Import

Implicit on DataFrame objects -- no separate import is required. The write property is available on all pyspark.sql.DataFrame instances.

from pyspark.sql import SparkSession  # DataFrame.write is available implicitly

Inputs / Outputs

Direction | Name | Type | Description
Input | DataFrame | DataFrame | The DataFrame whose contents will be written to the target table
Input | write mode | str | The write mode, e.g., "overwrite" to replace existing data
Input | table name | str | The name of the existing target table, either bare (e.g., "players_scd") or qualified (e.g., "catalog.schema.table")
Output | (side effect) | None | Data is written to the specified table; no value is returned

Usage Example

# After performing the SCD transformation
output_df = do_player_scd_transformation(spark, input_df)

# Write the result to the target table, overwriting existing data
output_df.write.mode("overwrite").insertInto("players_scd")
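
Because insertInto never creates the target table, a job may prefer to fail early with a clear message instead of surfacing Spark's analysis error at write time. The following is a minimal sketch of that idea, not code from the repository: the helper name write_scd_output is hypothetical, and it assumes PySpark 3.3 or later, where spark.catalog.tableExists is available.

```python
def write_scd_output(spark, output_df, table_name="players_scd"):
    """Overwrite `table_name` with `output_df`, failing early if the table is missing.

    Hypothetical helper for illustration; not part of the handbook repository.
    """
    # insertInto does not create tables, so check the catalog first.
    # Assumption: PySpark >= 3.3, where Catalog.tableExists exists.
    if not spark.catalog.tableExists(table_name):
        raise ValueError(
            f"Target table {table_name!r} does not exist; create it before writing."
        )

    # Same write as in the usage example: positional insert, overwrite mode.
    output_df.write.mode("overwrite").insertInto(table_name)
```

It would be called in place of the direct write, e.g. write_scd_output(spark, output_df).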

Key Behavior Notes

  • Column matching is by position, not by name. The DataFrame column order must match the target table schema exactly.
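  • overwrite mode replaces the existing data in the target table. For a partitioned table, the scope depends on spark.sql.sources.partitionOverwriteMode: with the default "static" setting, all existing partitions are replaced, while "dynamic" replaces only the partitions that receive new rows.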
  • The target table must already exist in the catalog. Unlike saveAsTable, insertInto does not create the table.
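
Since insertInto ignores column names, a DataFrame whose columns are in a different order than the table's can silently write values into the wrong columns. One way to guard against this is to reorder the DataFrame to the table's column order before writing. The sketch below illustrates that approach under stated assumptions: the helper names columns_in_table_order and insert_aligned are hypothetical, the name comparison is case-sensitive for simplicity (Spark's default resolution is case-insensitive), and data types are not checked.

```python
def columns_in_table_order(df_columns, table_columns):
    """Return the table's column order after checking the DataFrame has the same names.

    Hypothetical helper; raises if the two column sets differ.
    """
    missing = [c for c in table_columns if c not in df_columns]
    unexpected = [c for c in df_columns if c not in table_columns]
    if missing or unexpected:
        raise ValueError(
            f"Column mismatch: missing={missing}, unexpected={unexpected}"
        )
    return list(table_columns)


def insert_aligned(spark, df, table_name, mode="overwrite"):
    """Reorder `df` to the target table's column order, then insertInto it."""
    # The existing table defines the positional order insertInto expects.
    target_columns = spark.table(table_name).columns
    aligned = df.select(*columns_in_table_order(df.columns, target_columns))
    aligned.write.mode(mode).insertInto(table_name)
```

With this in place, the write from the usage example could be expressed as insert_aligned(spark, output_df, "players_scd"). Selecting by name first makes the positional requirement explicit and turns a silent mismatch into an error.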
