Principle:Huggingface Datasets SQL Import

Knowledge Sources	Huggingface Datasets, HF Datasets Docs
Domains	Data_Engineering, NLP
Last Updated	2026-02-14 18:00 GMT

Overview

SQL Import is the principle of loading the result of a SQL query, or the contents of a database table, into the HuggingFace Dataset format.

Description

Many organizations store their data in relational databases such as PostgreSQL, MySQL, or SQLite. The SQL Import principle covers executing a SQL query or selecting a table through a database connection, reading the resulting rows, converting them to typed Arrow columns, and producing a HuggingFace Dataset. The connection can be provided as a connection URI string, a SQLAlchemy engine or connection object, or a raw sqlite3.Connection. According to the Datasets documentation, the resulting dataset can be cached only when the connection is given as a URI string. The underlying Sql builder handles batched fetching and the conversion of fetched values to Arrow types; an explicit features argument can be passed when the inferred types are not the ones wanted.

Usage

Use SQL Import when your training or evaluation data resides in a relational database and you want to bring it into the HuggingFace ecosystem without first exporting to an intermediate file format. This is useful for workflows where the authoritative data source is a database and you want to avoid maintaining duplicate copies in CSV or Parquet.

Theoretical Basis

Most SQL databases store data in a row-oriented format optimized for transactional workloads. Importing SQL query results into an Arrow-backed dataset converts row-oriented tuples into column-oriented arrays, which enables efficient analytical operations. Under the hood, the Sql builder calls pandas read_sql to fetch rows in batches and converts each batch into an Arrow table. The resulting tables are written to an Arrow file cached on disk, which the dataset memory-maps for subsequent zero-copy access.

Related Pages

Implemented By

Implementation:Huggingface_Datasets_SqlDatasetReader
