Implementation:Huggingface Datasets Dataset Column Names

Knowledge Sources	Huggingface Datasets HF Datasets Docs
Domains	Data_Engineering, ML_Preprocessing
Last Updated	2026-02-14 18:00 GMT

Overview

A concrete tool, provided by the Hugging Face Datasets library, for retrieving the list of column names of a dataset.

Description

The column_names property returns the names of all columns in the dataset as a list of strings. It reads the names from the schema of the underlying Apache Arrow table rather than scanning any rows, so its cost does not depend on the number of rows in the dataset. It is typically the first property accessed when exploring a new dataset to understand its schema.

Usage

Use Dataset.column_names when you need to programmatically inspect which columns are present in a dataset before performing column-level operations such as renaming, removing, or selecting columns for formatting.

Code Reference

Source Location

Repository: datasets
File: src/datasets/arrow_dataset.py
Lines: L1896-L1909

Signature

@property
def column_names(self) -> list[str]:

Import

from datasets import load_dataset

ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
ds.column_names

I/O Contract

Inputs

Name	Type	Required	Description
(none)	N/A	N/A	This is a property with no parameters.

Outputs

Name	Type	Description
return	`list[str]`	List of column name strings from the underlying Arrow table.

Usage Examples

Basic Usage

from datasets import load_dataset

ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
print(ds.column_names)
# ['text', 'label']

Related Pages

Implements Principle

Principle:Huggingface_Datasets_Column_Name_Inspection
