Implementation:Huggingface Datasets Dataset Remove Columns

From Leeroopedia
Revision as of 12:58, 16 February 2026 by Admin (auto-imported from implementations/Huggingface_Datasets_Dataset_Remove_Columns.md)
Knowledge Sources
Domains: Data_Engineering, ML_Preprocessing
Last Updated: 2026-02-14 18:00 GMT

Overview

Concrete tool, provided by the Hugging Face Datasets library, for removing one or more columns from a Dataset.

Description

The remove_columns method returns a copy of the dataset with the specified columns, and their associated features, removed. Calling map with its remove_columns argument reaches the same result, but this method does not copy the data of the remaining columns and is therefore faster. It accepts either a single column name (a string) or a list of column names. All specified columns are checked against the dataset before anything is removed, and a ValueError is raised if any of them is missing. Removing all columns yields a dataset with no columns and num_rows equal to 0.

Usage

Use Dataset.remove_columns when you need to drop columns that are not needed for the current task, such as metadata columns, raw text after tokenization, or auxiliary annotation fields.

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/arrow_dataset.py
  • Lines: L2208-L2261

Signature

@transmit_format
@fingerprint_transform(inplace=False)
def remove_columns(
    self,
    column_names: Union[str, list[str]],
    new_fingerprint: Optional[str] = None,
) -> "Dataset":

Import

from datasets import load_dataset

ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
ds = ds.remove_columns("label")

I/O Contract

Inputs

Name | Type | Required | Description
column_names | Union[str, list[str]] | Yes | Name(s) of the column(s) to remove. Every listed column must exist in the dataset.
new_fingerprint | Optional[str] | No | Fingerprint to assign to the dataset after the transform. If None, it is computed from a hash of the previous fingerprint and the transform arguments.

Outputs

Name | Type | Description
return | Dataset | A new dataset without the removed columns; the original dataset is not modified.

Usage Examples

Basic Usage

from datasets import load_dataset

ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")

# Remove a single column
ds_no_label = ds.remove_columns("label")
print(ds_no_label.column_names)
# ['text']

# Remove several columns at once (here: all of them, passed as a list)
ds_empty = ds.remove_columns(ds.column_names)
print(ds_empty.num_rows)
# 0

Related Pages

Implements Principle
