Implementation:Online ml River Datasets Taxis
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Datasets, Regression |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Concrete dataset for regression provided by the River library.
Description
Taxi ride durations in New York City. The goal is to predict the duration of each ride from its pickup and dropoff coordinates, pickup timestamp, passenger count, and other trip metadata.
This dataset contains 1,458,644 samples with 8 features for regression tasks.
Usage
This dataset is useful for:
- Time series regression and duration prediction
- Geospatial feature engineering
- Transportation and urban mobility analysis
- Large-scale regression problems
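As a minimal sketch of the geospatial feature engineering mentioned above, the following computes the great-circle (haversine) distance between the pickup and dropoff coordinates of one row. The function name `haversine_km`, the Earth radius constant, and the sample row are illustrative assumptions, not part of River.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(x):
    """Great-circle distance in km between the pickup and dropoff points of one row."""
    lat1 = math.radians(x["pickup_latitude"])
    lat2 = math.radians(x["dropoff_latitude"])
    dlat = lat2 - lat1
    dlon = math.radians(x["dropoff_longitude"] - x["pickup_longitude"])
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical row shaped like the feature dicts the dataset yields.
sample = {
    "pickup_longitude": -73.98,
    "pickup_latitude": 40.75,
    "dropoff_longitude": -73.96,
    "dropoff_latitude": 40.77,
}
distance_km = haversine_km(sample)
print(round(distance_km, 2))
```

A straight-line distance like this is only a rough proxy for the driven route, but it is cheap to compute per sample, which suits a streaming setting.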
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/datasets/taxis.py
Signature
class Taxis(base.RemoteDataset):
    def __init__(self):
        super().__init__(
            n_samples=1_458_644,
            n_features=8,
            task=base.REG,
            url="https://maxhalford.github.io/files/datasets/nyc_taxis.zip",
            size=195_271_696,
            filename="train.csv",
        )

    def _iter(self):
        return stream.iter_csv(
            self.path,
            target="trip_duration",
            converters={
                "passenger_count": int,
                "pickup_longitude": float,
                "pickup_latitude": float,
                "dropoff_longitude": float,
                "dropoff_latitude": float,
                "trip_duration": int,
            },
            parse_dates={"pickup_datetime": "%Y-%m-%d %H:%M:%S"},
            drop=["dropoff_datetime", "id"],
        )
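To illustrate what the `converters`, `parse_dates`, and `drop` arguments above amount to, here is a standard-library approximation applied to a single in-memory CSV row. It is a sketch of the effect, not River's `stream.iter_csv` implementation. The helper `iter_rows` is hypothetical, and the sample row is made up and restricted to the columns named in the signature.

```python
import csv
import io
from datetime import datetime

CONVERTERS = {
    "passenger_count": int,
    "pickup_longitude": float,
    "pickup_latitude": float,
    "dropoff_longitude": float,
    "dropoff_latitude": float,
    "trip_duration": int,
}
PARSE_DATES = {"pickup_datetime": "%Y-%m-%d %H:%M:%S"}
DROP = ["dropoff_datetime", "id"]
TARGET = "trip_duration"


def iter_rows(buffer):
    """Yield (features, target) pairs from a CSV buffer (hypothetical helper)."""
    for row in csv.DictReader(buffer):
        for name in DROP:
            row.pop(name, None)  # discard unused columns
        for name, convert in CONVERTERS.items():
            row[name] = convert(row[name])  # cast strings to numbers
        for name, fmt in PARSE_DATES.items():
            row[name] = datetime.strptime(row[name], fmt)  # parse timestamps
        y = row.pop(TARGET)  # split the target from the features
        yield row, y


# Made-up row containing only the columns named in the signature.
raw = io.StringIO(
    "id,pickup_datetime,dropoff_datetime,passenger_count,"
    "pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,"
    "trip_duration\n"
    "row0,2016-01-01 00:00:17,2016-01-01 00:14:26,1,"
    "-73.98,40.75,-73.96,40.77,849\n"
)
x, y = next(iter_rows(raw))
print(x, y)
```

Dropping `dropoff_datetime` matters here: together with `pickup_datetime` it would reveal the target exactly.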
Import
from river import datasets
dataset = datasets.Taxis()
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | — | — | No parameters needed |
Outputs
| Name | Type | Description |
|---|---|---|
| iter() | tuple(dict, int) | Yields (features_dict, target) pairs where target is trip duration in seconds |
Dataset Properties
| Property | Value |
|---|---|
| Number of samples | 1,458,644 |
| Number of features | 8 |
| Task | Regression |
| Format | CSV (train.csv, distributed in a ZIP archive) |
| Size | 195,271,696 bytes (~186 MB) |
Features
The dataset includes the following features:
- passenger_count: Number of passengers (integer)
- pickup_longitude: Longitude of pickup location (float)
- pickup_latitude: Latitude of pickup location (float)
- dropoff_longitude: Longitude of dropoff location (float)
- dropoff_latitude: Latitude of dropoff location (float)
- pickup_datetime: Timestamp of pickup (datetime)
- vendor_id, store_and_fwd_flag: Additional trip metadata (left as strings, since no converter is applied to them)
- trip_duration: Duration of the trip in seconds (target variable, integer)
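Because `pickup_datetime` arrives as a `datetime` object, calendar features can be derived from it directly. The sketch below shows one possible approach. The function name `time_features` and the chosen output keys are illustrative assumptions, and the sample dict contains only the field used here.

```python
from datetime import datetime


def time_features(x):
    """Derive simple calendar features from the pickup timestamp."""
    ts = x["pickup_datetime"]
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday is 0, Sunday is 6
        "is_weekend": ts.weekday() >= 5,
    }


# Hypothetical feature dict containing only the field used here.
sample = {"pickup_datetime": datetime(2016, 1, 1, 0, 0, 17)}
features = time_features(sample)
print(features)
```

Hour of day and day of week are natural candidates for trip duration, since traffic conditions vary with both.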
Usage Examples
from river import datasets
dataset = datasets.Taxis()
for x, y in dataset:
    print(x, y)
    break
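The same iteration pattern supports test-then-train (progressive) evaluation, where each sample is predicted first and only then used to update the model. The sketch below uses a running-mean baseline on a few made-up `(x, y)` pairs so that it runs without downloading the dataset. The class `RunningMean` and the sample durations are hypothetical; its method names mirror River's `predict_one` / `learn_one` convention, and in practice the loop would iterate over `datasets.Taxis()` instead of the list.

```python
class RunningMean:
    """Baseline regressor that predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n  # incremental mean update


# Made-up stand-in for the stream of (features, trip_duration) pairs.
samples = [({}, 600), ({}, 900), ({}, 300), ({}, 600)]

model = RunningMean()
abs_error_sum = 0.0
for x, y in samples:
    y_pred = model.predict_one(x)  # predict before seeing the target
    abs_error_sum += abs(y - y_pred)
    model.learn_one(x, y)  # then update with the true target

mae = abs_error_sum / len(samples)
print(mae)
```

A baseline like this gives a reference error that any feature-based regressor should beat.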