Principle:Huggingface Datasets Dataset From List Construction
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Creating datasets from a list of dictionaries provides a row-oriented construction path that mirrors how data is commonly structured in Python.
Description
List-based dataset construction accepts a Python list in which each element is a dictionary representing a single row (example). The keys of the first dictionary determine the column names: a key missing from a later row is filled with None, and a key that appears only in later rows is ignored. Internally, the list of row dictionaries is transposed into a column-oriented dictionary and passed to the dictionary-based construction method. This makes it a convenient shortcut for data that naturally arrives in row-oriented form, such as JSON records, API responses, or query results.
Usage
Use list-based construction when your data is naturally organized as a list of records (dictionaries). This is the most intuitive format for many data sources, and the method handles the columnar transposition automatically.
Theoretical Basis
The method performs a transpose: it takes the column names from the first row's keys, then builds a columnar dictionary by collecting each key's values across all rows with dict.get, so a row that lacks a key contributes None for that column. The transposed dictionary is then passed to from_dict, which reuses all of its type inference and casting logic. The design follows the principle of layered construction, where higher-level convenience methods delegate to lower-level ones.
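The transpose step can be sketched in plain Python. This is an illustrative approximation under stated assumptions, not the library's source: the helper name `transpose_rows` is hypothetical, and returning an empty dictionary for an empty list is an assumption made for the sketch.

```python
from typing import Any


def transpose_rows(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Turn a list of row dictionaries into a dictionary of columns.

    Hypothetical helper mirroring the described behavior: column names come
    from the first row, and values are collected with dict.get.
    """
    if not rows:
        # Assumption for this sketch: no rows means no columns.
        return {}
    return {key: [row.get(key) for row in rows] for key in rows[0]}


rows = [
    {"id": 1, "text": "a"},
    {"id": 2},                              # "text" is missing here
    {"id": 3, "text": "c", "extra": True},  # "extra" is not in the first row
]

columns = transpose_rows(rows)
print(columns)
# {'id': [1, 2, 3], 'text': ['a', None, 'c']}
```

The second row contributes None for the missing "text" value, and the "extra" key is dropped because it does not appear in the first row. A dictionary of this shape is what a dictionary-based constructor such as from_dict would then receive.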