

Principle:Huggingface Datasets SQL Import

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-14 18:00 GMT

Overview

SQL Import is the principle of loading data from a SQL database into the HuggingFace Dataset format.

Description

Many organizations store their data in relational databases such as PostgreSQL, MySQL, or SQLite. The SQL Import principle covers four steps: executing a SQL query (or selecting a whole table) through a database connection, reading the resulting rows, converting them to typed Arrow columns, and producing a HuggingFace Dataset backed by an on-disk cache. The connection can be provided as a SQLAlchemy connection URI, a SQLAlchemy engine or connection object, or a raw sqlite3.Connection. Per the library documentation, the result can only be reused from the cache across runs when the connection is given as a URI string, since a live connection object cannot be hashed deterministically. The underlying Sql builder handles batched fetching and the mapping from database types to Arrow types.

Usage

Use SQL Import when your training or evaluation data resides in a relational database and you want to bring it into the HuggingFace ecosystem without first exporting to an intermediate file format. This is useful for workflows where the authoritative data source is a database and you want to avoid maintaining duplicate copies in CSV or Parquet.

Theoretical Basis

Most relational databases store data in a row-oriented format optimized for transactional workloads. Importing SQL query results into a columnar Arrow-backed dataset converts row-oriented tuples into column-oriented arrays, which makes analytical operations such as column scans and filters more efficient. The import pipeline uses pandas.read_sql under the hood (via the Sql builder) to fetch rows in batches, then converts each batch into an Arrow table. The resulting tables are written to an Arrow file in the on-disk cache, which is memory-mapped for subsequent zero-copy access.
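The row-to-column conversion can be illustrated with the standard library alone. This is a conceptual sketch, not the Sql builder's actual code: plain Python lists stand in for Arrow arrays, and the helper names iter_column_batches and concat_batches, as well as the samples table, are hypothetical.

```python
import sqlite3
from typing import Any, Dict, Iterator, List


def iter_column_batches(
    con: sqlite3.Connection, query: str, batch_size: int
) -> Iterator[Dict[str, List[Any]]]:
    """Fetch rows in batches and transpose each batch into named columns."""
    cursor = con.execute(query)
    names = [desc[0] for desc in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)  # list of row tuples
        if not rows:
            break
        # zip(*rows) turns a list of row tuples into one tuple per column.
        columns = [list(col) for col in zip(*rows)]
        yield dict(zip(names, columns))


def concat_batches(batches: List[Dict[str, List[Any]]]) -> Dict[str, List[Any]]:
    """Concatenate per-batch columns into one column-oriented table."""
    merged: Dict[str, List[Any]] = {}
    for batch in batches:
        for name, values in batch.items():
            merged.setdefault(name, []).extend(values)
    return merged


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE samples (id INTEGER, score REAL)")
con.executemany(
    "INSERT INTO samples VALUES (?, ?)", [(i, i * 0.5) for i in range(5)]
)

batches = list(
    iter_column_batches(con, "SELECT id, score FROM samples ORDER BY id", batch_size=2)
)
table = concat_batches(batches)
print(table)
```

Fetching in fixed-size batches keeps peak memory bounded by the batch size rather than by the full result set, which is the same motivation behind reading with a chunk size in the real pipeline.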
