Implementation:Kubeflow Pipelines Chicago Taxi Dataset Pipeline

From Leeroopedia

Sources: Kubeflow Pipelines

Domains: Data_Engineering, ETL

Last Updated: 2026-02-13

Overview

Wrapper documentation for a chain of three reusable KFP components that load and prepare Chicago Taxi training data.

Description

Three components chained together:

  • chicago_taxi_dataset_op — loads data with SQL filtering
  • pandas_transform_csv_op — extracts label column via pandas
  • drop_header_op — removes CSV header for metric evaluation

All three components are loaded via components.load_component_from_url().

Code Reference

Source: samples/core/train_until_good/train_until_good.py (L22-27 loading, L71-83 invocation)

Import: from kfp import components

Signature

# chicago_taxi_dataset_op
chicago_taxi_dataset_op(where: str, select: str, limit: int) -> output

# pandas_transform_csv_op
pandas_transform_csv_op(table: CSV, transform_code: str) -> output

# drop_header_op
drop_header_op(table: CSV) -> output

I/O Contract

Inputs

  Name            Type   Description
  where           str    SQL filter clause
  select          str    Columns to select
  limit           int    Row limit
  transform_code  str    Pandas expression for transformation

Outputs

  Name           Type            Description
  training_data  CSV             Prepared training data
  true_values    headerless CSV  Ground truth labels with header removed
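
The three inputs of chicago_taxi_dataset_op can be read as the parts of one query: a filter, a column list, and a row cap. The sketch below shows one way such inputs could be combined into a single request URL. It is only an illustration, assuming a query API that accepts $where, $select, and $limit parameters. The endpoint BASE_URL and the helper build_query_url are hypothetical and are not taken from the component's source.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical placeholder endpoint; the real dataset URL is not stated on this page.
BASE_URL = "https://example.invalid/resource/chicago-taxi.csv"


def build_query_url(where: str, select: str, limit: int, base_url: str = BASE_URL) -> str:
    """Combine the filter, column list, and row cap into one query URL.

    Assumption: the service accepts $where, $select, and $limit parameters.
    """
    params = {"$where": where, "$select": select, "$limit": str(limit)}
    # urlencode percent-encodes quotes, spaces, and comparison operators.
    return base_url + "?" + urlencode(params)


url = build_query_url(
    where='trip_start_timestamp >= "2019-01-01" AND trip_start_timestamp < "2019-02-01"',
    select="tips,trip_seconds,trip_miles",
    limit=10000,
)

# Decode the query string again to confirm the inputs survive encoding intact.
decoded = parse_qs(urlsplit(url).query)
```

Passing limit as an int and converting it at the last moment keeps the component-facing signature typed while the wire format stays a string.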

Usage Examples

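# Assumes the three ops were already loaded with components.load_component_from_url()
# and that these calls run inside a pipeline function.

# Step 1: load one month of trips, restricted to the listed columns.
training_data = chicago_taxi_dataset_op(
    where='trip_start_timestamp >= "2019-01-01" AND trip_start_timestamp < "2019-02-01"',
    select='tips,trip_seconds,trip_miles,pickup_community_area,dropoff_community_area,fare,tolls,extras,trip_total',
    limit=10000,
).output

# Step 2: keep only the label column ("tips").
true_values_table = pandas_transform_csv_op(
    table=training_data,
    transform_code='df = df[["tips"]]',
).output

# Step 3: strip the CSV header so only the label values remain.
true_values = drop_header_op(table=true_values_table).output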
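
To make the effect of the last two steps concrete, here is a local stand-in that uses only the Python standard library. It is a sketch of the data transformation, not the components' actual implementation: select_columns and drop_header are hypothetical helpers, and the three-row CSV is made-up sample data rather than real taxi records.

```python
import csv
import io


def select_columns(csv_text: str, columns: list) -> str:
    """Local stand-in for the 'df = df[["tips"]]' transform: keep only the given columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" drops every column that is not listed in `columns`.
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


def drop_header(csv_text: str) -> str:
    """Local stand-in for the header-removal step: discard the first line."""
    _, _, body = csv_text.partition("\n")
    return body


# Made-up sample rows, for illustration only.
training_data = "tips,trip_seconds,fare\n2.5,600,12.0\n0.0,300,6.5\n"

true_values_table = select_columns(training_data, ["tips"])
true_values = drop_header(true_values_table)
```

After these two steps, true_values holds one label value per line with no column name, which is the headerless form described in the I/O contract.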
