Implementation:Huggingface Datasets Dataset Num Rows
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, ML_Preprocessing |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Concrete tool, provided by the HuggingFace Datasets library, for checking the number of rows in a dataset.
Description
The num_rows property returns the total number of rows (examples) in the dataset as an integer. If the dataset has an indices mapping (e.g., after shuffling or selecting), it returns the number of mapped indices. Otherwise, it returns the number of rows in the underlying Arrow table. This property is equivalent to calling len(dataset).
Usage
Use Dataset.num_rows when you need to know the size of a dataset for computing split proportions, setting batch sizes, validating data loading, or reporting dataset statistics.
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/arrow_dataset.py
- Lines: L1879-L1894
Signature
```python
@property
def num_rows(self) -> int:
```
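The branching behind this property can be modeled in plain Python. The sketch below uses a hypothetical ToyDataset class, in which a list of dicts stands in for the Arrow table and an optional list of ints stands in for the indices mapping. It mirrors the behavior described on this page. It is not the library's source code.

```python
from typing import List, Optional


class ToyDataset:
    """Hypothetical stand-in for illustration only; not the library's Dataset class."""

    def __init__(self, table_rows: List[dict], indices: Optional[List[int]] = None):
        self._table_rows = table_rows  # stands in for the underlying Arrow table
        self._indices = indices        # stands in for the optional indices mapping

    @property
    def num_rows(self) -> int:
        # With an indices mapping, report how many indices are mapped...
        if self._indices is not None:
            return len(self._indices)
        # ...otherwise report the row count of the underlying table.
        return len(self._table_rows)

    def __len__(self) -> int:
        # len() delegates to num_rows, so both always agree.
        return self.num_rows


rows = [{"text": "a"}, {"text": "b"}, {"text": "c"}]
print(ToyDataset(rows).num_rows)                  # 3
print(ToyDataset(rows, indices=[2, 0]).num_rows)  # 2
```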
Import
```python
from datasets import load_dataset
ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
ds.num_rows
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | N/A | N/A | This is a property with no parameters. |
Outputs
| Name | Type | Description |
|---|---|---|
| return | int | The number of rows in the dataset. Returns the indices count if an indices mapping exists, otherwise the Arrow table row count. |
Usage Examples
Basic Usage
```python
from datasets import load_dataset
ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
print(ds.num_rows)
# 1066
```