Principle:Sdv dev SDV Local Data IO
| Knowledge Sources | |
|---|---|
| Domains | Data_IO, File_Handling |
| Last Updated | 2026-02-14 19:00 GMT |
Overview
This principle defines how tabular data is read from and written to local file formats while preserving data fidelity.
Description
Local Data IO is the practice of reading multi-table datasets from file-based sources (CSV folders, Excel workbooks) into in-memory representations (dictionaries of DataFrames) and writing synthetic or processed data back to those formats. Key concerns include preserving data types (e.g. leading zeros in string-encoded numeric columns), supporting configurable read/write parameters, handling multiple write modes (create, overwrite, append), and integrating with metadata detection for automatic schema inference.
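The leading-zeros concern mentioned above can be illustrated with plain pandas. This is a minimal sketch, not the handlers' actual implementation: the sample data and the column names `zip_code` and `amount` are made up for illustration, and the file is simulated with an in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical sample data: zip codes are numeric-looking strings with leading zeros.
csv_text = "zip_code,amount\n02134,10.5\n00501,7.25\n"

# Default type inference parses zip_code as integers and drops the zeros.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps the text exactly as stored in the file.
preserved = pd.read_csv(io.StringIO(csv_text), dtype={"zip_code": str})

print(inferred["zip_code"].tolist())   # [2134, 501]
print(preserved["zip_code"].tolist())  # ['02134', '00501']
```

Passing read parameters such as `dtype` through to the underlying reader is one way a handler can expose the configurable read behavior described above.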
Usage
Apply this principle when building data ingestion or export pipelines for synthetic data generation. The local IO layer sits between raw file storage and the in-memory data representations consumed by synthesizers and metadata detectors.
Theoretical Basis
Local Data IO follows the Handler pattern: a base class defines the interface (read, write, create_metadata), and concrete handlers (CSVHandler, ExcelHandler) implement format-specific logic. This separation allows adding new file format handlers without modifying existing code.
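The separation between interface and format-specific logic might look roughly like the sketch below. The class names `BaseHandler` and `CSVFolderHandler`, their signatures, and the mode letters (`"x"` for create, `"w"` for overwrite, `"a"` for append) are assumptions chosen for illustration; they are not taken from the actual CSVHandler or ExcelHandler code.

```python
from abc import ABC, abstractmethod
from pathlib import Path

import pandas as pd


class BaseHandler(ABC):
    """Hypothetical base class: the interface every format handler implements."""

    @abstractmethod
    def read(self, source):
        """Return a dict mapping table names to DataFrames."""

    @abstractmethod
    def write(self, data, target, mode="x"):
        """Persist a dict of DataFrames to the target location."""


class CSVFolderHandler(BaseHandler):
    """Hypothetical handler treating a folder of CSV files as a multi-table dataset."""

    def read(self, source, **read_csv_kwargs):
        folder = Path(source)
        # One table per CSV file, keyed by the file name without its extension.
        return {
            path.stem: pd.read_csv(path, **read_csv_kwargs)
            for path in sorted(folder.glob("*.csv"))
        }

    def write(self, data, target, mode="x"):
        # 'x' = create (fail if the file exists), 'w' = overwrite, 'a' = append rows.
        if mode not in ("x", "w", "a"):
            raise ValueError(f"Unsupported write mode: {mode!r}")
        folder = Path(target)
        folder.mkdir(parents=True, exist_ok=True)
        for name, table in data.items():
            path = folder / f"{name}.csv"
            # When appending to an existing file, skip the header row.
            header = not (mode == "a" and path.exists())
            table.to_csv(path, mode=mode, header=header, index=False)
```

Under this design, supporting another format means adding one more subclass of the base class; code that only calls `read` and `write` does not need to change.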
Pseudo-code:
# Abstract pattern: FormatHandler stands in for a concrete handler (e.g. CSVHandler, ExcelHandler)
handler = FormatHandler()
data = handler.read(source)                # Format-specific deserialization into a dict of DataFrames
metadata = handler.create_metadata(data)   # Automatic schema inference
handler.write(synthetic_data, target)      # Format-specific serialization
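To make the schema-inference step in the pseudo-code more concrete, here is a deliberately simplified sketch. The helper `infer_schema` and its type labels are hypothetical and much coarser than a real metadata detector; it only shows the idea of deriving a per-table schema from the loaded DataFrames.

```python
import pandas as pd
from pandas.api import types as pdt


def infer_schema(data):
    """Hypothetical helper: map each column to a coarse semantic type."""
    schema = {}
    for table_name, table in data.items():
        columns = {}
        for column in table.columns:
            dtype = table[column].dtype
            # Check bool first: pandas also reports bool as numeric.
            if pdt.is_bool_dtype(dtype):
                columns[column] = "boolean"
            elif pdt.is_numeric_dtype(dtype):
                columns[column] = "numerical"
            elif pdt.is_datetime64_any_dtype(dtype):
                columns[column] = "datetime"
            else:
                columns[column] = "categorical"
        schema[table_name] = columns
    return schema


# Illustrative input: zip_code was read as text, so it is not treated as a number.
tables = {
    "users": pd.DataFrame({"zip_code": ["02134"], "age": [30], "active": [True]})
}
schema = infer_schema(tables)
print(schema)
```

Because inference runs on the in-memory tables, the data types chosen at read time directly shape the detected schema, which is why type preservation and metadata detection are treated together in this principle.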