Implementation:Huggingface Datasets Dataset Column Names

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, ML_Preprocessing
Last Updated 2026-02-14 18:00 GMT

Overview

Dataset.column_names is a property in the HuggingFace Datasets library that returns the list of column names of a dataset.

Description

The column_names property returns the names of all columns in the dataset as a list of strings, in table order. It reads the names from the underlying Apache Arrow table rather than from the rows, so its cost depends on the number of columns and not on the number of rows. It is typically the first thing accessed when exploring a new dataset to understand its schema.

Usage

Use Dataset.column_names when you need to programmatically inspect which columns are present in a dataset before performing column-level operations such as renaming, removing, or selecting columns for formatting.

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/arrow_dataset.py
  • Lines: L1896-L1909

Signature

@property
def column_names(self) -> list[str]:

Import

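from datasets import load_dataset

# column_names is a property of the returned Dataset, so it needs no import
# of its own and is accessed without parentheses.
ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
ds.column_names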

I/O Contract

Inputs

Name | Type | Required | Description
(none) | N/A | N/A | This is a property with no parameters.

Outputs

Name | Type | Description
return | list[str] | List of column name strings from the underlying Arrow table.

Usage Examples

Basic Usage

from datasets import load_dataset

ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
print(ds.column_names)
# ['text', 'label']

Related Pages

Implements Principle
