Implementation:Huggingface Datasets Dataset Column Names
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, ML_Preprocessing |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Concrete tool, provided by the HuggingFace Datasets library, for retrieving the list of column names from a dataset.
Description
The column_names property returns the names of all columns in the dataset as a list of strings, in column order. It reads the names from the schema of the underlying Apache Arrow table without scanning any rows, so its cost depends only on the number of columns and not on the number of rows. This property is typically the first thing accessed when exploring a new dataset to understand its schema.
Usage
Use Dataset.column_names when you need to programmatically inspect which columns are present in a dataset before performing column-level operations such as renaming, removing, or selecting columns for formatting.
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/arrow_dataset.py
- Lines: L1896-L1909
Signature
```python
@property
def column_names(self) -> list[str]:
```
Import
```python
from datasets import load_dataset
ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
ds.column_names
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | N/A | N/A | This is a property with no parameters. |
Outputs
| Name | Type | Description |
|---|---|---|
| return | list[str] | List of column name strings from the underlying Arrow table. |
Usage Examples
Basic Usage
```python
from datasets import load_dataset
ds = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="validation")
print(ds.column_names)
# ['text', 'label']
```