Implementation:Huggingface Datasets Dataset Remove Columns
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, ML_Preprocessing |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Concrete tool, provided by the HuggingFace Datasets library, for removing one or more columns from a dataset.
Description
The remove_columns method returns a copy of the dataset with the specified columns removed. Unlike passing remove_columns to Dataset.map, it does not copy the data of the remaining columns, which makes it significantly faster. It accepts either a single column name (a string) or a list of column names. Before removing anything, it checks that every specified column exists in the dataset and raises a ValueError if any name is missing. Removing all columns results in an empty dataset whose num_rows is 0.
Usage
Use Dataset.remove_columns when you need to drop columns that are not needed for the current task, such as metadata columns, raw text after tokenization, or auxiliary annotation fields.
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/arrow_dataset.py
- Lines: L2208-L2261
Signature
```python
@transmit_format
@fingerprint_transform(inplace=False)
def remove_columns(
    self,
    column_names: Union[str, list[str]],
    new_fingerprint: Optional[str] = None,
) -> "Dataset":
```
Import
```python
from datasets import load_dataset

ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
ds = ds.remove_columns("label")
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| column_names | Union[str, list[str]] | Yes | Name(s) of the column(s) to remove. All specified columns must exist in the dataset. |
| new_fingerprint | Optional[str] | No | The new fingerprint of the dataset after the transform. If None, it is computed automatically. |
Outputs
| Name | Type | Description |
|---|---|---|
| return | Dataset | A copy of the dataset without the removed columns. |
Usage Examples
Basic Usage
```python
from datasets import load_dataset

ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")

# Remove a single column
ds_no_label = ds.remove_columns("label")
print(ds_no_label.column_names)  # ['text']

# Remove all columns (here "text" and "label"); the result is an empty dataset
ds_empty = ds.remove_columns(ds.column_names)
print(ds_empty.num_rows)  # 0
```