Implementation:Kubeflow Pipelines Convert CSV To Parquet Op
| Sources | Kubeflow Pipelines |
|---|---|
| Domains | Data_Engineering, ETL |
| Last Updated | 2026-02-13 |
Overview
Reusable Kubeflow Pipelines (KFP) component that converts CSV data to Apache Parquet format.
Description
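Wrapper documentation. convert_csv_to_apache_parquet_op is a component factory loaded from a remote YAML spec with components.load_component_from_url. The underlying component uses pandas and pyarrow internally to read the CSV input and write it out as Parquet.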
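The conversion step described above can be illustrated with a minimal local sketch. This is not the component's actual source: the function name `convert_csv_to_parquet` and its path parameters are hypothetical, and the sketch only assumes that pandas and pyarrow are installed.

```python
def convert_csv_to_parquet(csv_path: str, parquet_path: str) -> None:
    """Read a CSV file and write its contents as an Apache Parquet file.

    Hypothetical helper for illustration; not the component's real code.
    """
    # Imports are kept inside the function so the body is self-contained,
    # which is a common pattern for function-based pipeline components.
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Load the CSV into a DataFrame; column types are inferred by pandas.
    frame = pd.read_csv(csv_path)

    # Convert to an Arrow table, dropping the pandas index so that only
    # the original CSV columns end up in the Parquet file.
    table = pa.Table.from_pandas(frame, preserve_index=False)

    # Write the Arrow table to disk in Parquet format.
    pq.write_table(table, parquet_path)
```

Calling `convert_csv_to_parquet("train.csv", "train.parquet")` on a local file would produce a Parquet file with the same columns; inside a pipeline, the component receives and produces these paths through its `data` input and `output` output instead.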
Usage
Use this component within a pipeline to convert CSV data to Parquet before any step that expects Parquet input, such as Parquet-specific training or processing.
Code Reference
Source Location: Repository: kubeflow/pipelines, File: samples/core/XGBoost/xgboost_sample.py (lines 7-9 load the component, lines 49-50 invoke it)
Signature:
convert_csv_to_apache_parquet_op = components.load_component_from_url(
'https://raw.githubusercontent.com/kubeflow/pipelines/.../components/_converters/ApacheParquet/from_CSV/component.yaml'
)
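# Call signature (informal, not valid Python):
#   convert_csv_to_apache_parquet_op(data: CSV) -> task
# The returned task exposes the converted Parquet data as task.output.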
Import: from kfp import components
I/O Contract
| Direction | Name | Type | Required | Description |
|---|---|---|---|---|
| Input | data | CSV | Yes | Input CSV data |
| Output | output | Parquet | — | Converted Parquet data |
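A downstream step may want to confirm that what it received really is Parquet. The sketch below checks only the 4-byte `PAR1` magic marker that unencrypted Parquet files carry at both the start and the end. The helper name `looks_like_parquet` is hypothetical, and the sketch assumes the output is available as a single local file.

```python
import os

# Unencrypted Parquet files begin and end with this 4-byte marker.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path: str) -> bool:
    """Return True if the file starts and ends with the Parquet magic bytes.

    Hypothetical helper: a cheap sanity check, not a full validation.
    """
    # Smallest plausible file: 4-byte header magic, 4-byte footer length,
    # and 4-byte footer magic.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

This does not parse the footer metadata, so it can only rule out obviously wrong inputs, such as a CSV file passed through unconverted.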
Usage Examples
# training_data_csv is the CSV output of an upstream pipeline step.
training_data_parquet = convert_csv_to_apache_parquet_op(
    data=training_data_csv
).output