Implementation:Kubeflow Pipelines Convert CSV To Parquet Op

Sources	Kubeflow Pipelines
Domains	Data_Engineering, ETL
Last Updated	2026-02-13

Overview

Reusable KFP component for converting CSV data to Apache Parquet format.

Description

Wrapper documentation. convert_csv_to_apache_parquet_op is a component factory loaded from a remote YAML spec rather than defined in the sample itself. Internally, the component uses pandas and pyarrow to read the CSV and write Parquet.

Usage

Use it within a pipeline to convert CSV data to Parquet before any step that expects Parquet input, such as Parquet-specific training or processing.

Code Reference

Source Location: Repository: kubeflow/pipelines, File: samples/core/XGBoost/xgboost_sample.py (L7-9 loading, L49-50 invocation)

Signature:

convert_csv_to_apache_parquet_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/.../components/_converters/ApacheParquet/from_CSV/component.yaml'
)
# Call shape: convert_csv_to_apache_parquet_op(data=<CSV data>) returns a task;
# the task's .output is the converted Parquet data.

Import: from kfp import components

I/O Contract

Direction	Name	Type	Required	Description
Input	data	CSV	Yes	Input CSV data
Output	output	Parquet	—	Converted Parquet data

Usage Examples

# training_data_csv is the CSV output of an upstream pipeline step.
training_data_parquet = convert_csv_to_apache_parquet_op(
    data=training_data_csv
).output

Related Pages

Principle:Kubeflow_Pipelines_Data_Format_Conversion
