Implementation:Huggingface Datasets SqlDatasetReader

From Leeroopedia
Revision as of 13:00, 16 February 2026 by Admin (auto-imported from implementations/Huggingface_Datasets_SqlDatasetReader.md)
Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-14 18:00 GMT

Overview

A concrete tool, provided by the HuggingFace Datasets library, for importing data from SQL databases into the library's Dataset format.

Description

SqlDatasetReader is a reader class that extends AbstractDatasetInputStream and uses the packaged Sql builder to execute a SQL query or select a table and produce a cached Arrow-backed Dataset. The connection can be a SQLAlchemy URI string, a SQLAlchemy engine/connection object, or a sqlite3.Connection. The reader does not support streaming mode. All additional keyword arguments are forwarded to the underlying Sql builder.

Usage

Use SqlDatasetReader when you need to load data directly from a SQL database into a HuggingFace Dataset. It is typically invoked indirectly via Dataset.from_sql(), but can also be instantiated directly.
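
As a rough illustration of the indirect route, the sketch below fills a SQLite database using only the standard library and then loads it through Dataset.from_sql(). The reviews table, its columns, and the helper names populate_reviews and load_reviews are hypothetical, chosen only for this example. The datasets import is deferred into load_reviews so that the setup step runs even where the library is not installed.

```python
import sqlite3

# Hypothetical sample rows; the table and column names are illustrative only.
SAMPLE_ROWS = [
    ("great product", 1),
    ("arrived broken", 0),
    ("works as described", 1),
]


def populate_reviews(con):
    """Create and fill a hypothetical `reviews` table, returning its row count."""
    with con:  # commits on success, rolls back on error
        con.execute(
            "CREATE TABLE reviews (id INTEGER PRIMARY KEY, text TEXT, label INTEGER)"
        )
        con.executemany(
            "INSERT INTO reviews (text, label) VALUES (?, ?)", SAMPLE_ROWS
        )
    return con.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]


def load_reviews(con):
    """Load the table via Dataset.from_sql(), the usual entry point to the reader."""
    from datasets import Dataset  # deferred: requires the `datasets` package

    return Dataset.from_sql("SELECT id, text, label FROM reviews ORDER BY id", con)
```

With the library installed, calling populate_reviews(con) and then load_reviews(con) on a connection such as sqlite3.connect("my_db.sqlite") would be expected to return a Dataset with the columns id, text, and label.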

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/io/sql.py
  • Lines: L17-L53

Signature

class SqlDatasetReader(AbstractDatasetInputStream):
    def __init__(
        self,
        sql: Union[str, "sqlalchemy.sql.Selectable"],
        con: Union[str, "sqlalchemy.engine.Connection", "sqlalchemy.engine.Engine", "sqlite3.Connection"],
        features: Optional[Features] = None,
        cache_dir: str = None,
        keep_in_memory: bool = False,
        **kwargs,
    ):
        ...

    def read(self):
        ...

Import

from datasets.io.sql import SqlDatasetReader

I/O Contract

Inputs

Name | Type | Required | Description
sql | Union[str, sqlalchemy.sql.Selectable] | Yes | SQL query string or SQLAlchemy Selectable (e.g., table or select expression) to execute.
con | Union[str, sqlalchemy.engine.Connection, sqlalchemy.engine.Engine, sqlite3.Connection] | Yes | Database connection. Can be a URI string, a SQLAlchemy connection/engine, or a sqlite3 connection.
features | Optional[Features] | No | Explicit schema to apply instead of inferring from the query results.
cache_dir | str | No | Directory for caching the processed dataset. Defaults to None.
keep_in_memory | bool | No | Whether to keep the dataset in memory instead of memory-mapping. Defaults to False.
**kwargs | - | No | Additional keyword arguments forwarded to the Sql builder.
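
The optional inputs can be combined as in the following minimal sketch, which passes an explicit schema through features and sets keep_in_memory. The reviews table, the COLUMN_DTYPES mapping, and the helpers select_statement and read_with_schema are hypothetical names introduced for illustration. The dtypes are an assumption about what the example table holds, not something the reader prescribes.

```python
# Hypothetical schema for a `reviews` table; names and dtypes are illustrative only.
COLUMN_DTYPES = {"id": "int64", "text": "string", "label": "int64"}


def select_statement(table, columns):
    """Build a SELECT that lists columns explicitly so they line up with the schema."""
    return f"SELECT {', '.join(columns)} FROM {table}"


def read_with_schema(con):
    """Read the hypothetical table with an explicit schema, kept in memory."""
    # Deferred imports: both require the `datasets` package.
    from datasets import Features, Value
    from datasets.io.sql import SqlDatasetReader

    # Explicit schema, used instead of inferring types from the query results.
    features = Features({name: Value(dtype) for name, dtype in COLUMN_DTYPES.items()})
    reader = SqlDatasetReader(
        select_statement("reviews", COLUMN_DTYPES),
        con,  # URI string, SQLAlchemy engine/connection, or sqlite3.Connection
        features=features,
        keep_in_memory=True,  # hold the data in memory rather than memory-mapping
    )
    return reader.read()
```

Listing the columns explicitly in the query, rather than using SELECT *, is a design choice made here so that the result columns match the field names declared in the schema.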

Outputs

Name | Type | Description
dataset | Dataset | The loaded data as a single Dataset, built from the "train" split.

Usage Examples

Basic Usage

import sqlite3

from datasets.io.sql import SqlDatasetReader

# Load from SQLite using a URI string (URI strings require SQLAlchemy to be installed)
reader = SqlDatasetReader("SELECT * FROM my_table", "sqlite:///my_db.sqlite")
dataset = reader.read()

# Load from SQLite using a sqlite3 connection object
con = sqlite3.connect("my_db.sqlite")
reader = SqlDatasetReader("SELECT * FROM my_table", con)
dataset = reader.read()
con.close()  # read() has already returned the dataset, so the connection can be closed

Related Pages

Implements Principle

Requires Environment
