Implementation:Huggingface Datasets SqlDatasetReader

Knowledge Sources	Huggingface Datasets, HF Datasets Docs
Domains	Data_Engineering, NLP
Last Updated	2026-02-14 18:00 GMT

Overview

Concrete tool, provided by the HuggingFace Datasets library, for importing data from SQL databases into a HuggingFace Dataset.

Description

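SqlDatasetReader is a reader class that extends AbstractDatasetInputStream. It delegates to the packaged Sql builder, which executes a SQL query (or reads the table or select expression passed as a SQLAlchemy Selectable) and writes the results to the cache as Arrow files; read() then loads them as an Arrow-backed Dataset. The connection can be a SQLAlchemy URI string, a SQLAlchemy engine or connection object, or a sqlite3.Connection. Per the Dataset.from_sql documentation, the returned dataset can only be cached if con is specified as a URI string. The reader does not support streaming mode. All additional keyword arguments are forwarded to the underlying Sql builder.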

Usage

Use SqlDatasetReader when you need to load data directly from a SQL database into a HuggingFace Dataset. It is typically invoked indirectly via Dataset.from_sql(), which constructs a SqlDatasetReader and calls its read() method, but it can also be instantiated directly.

Code Reference

Source Location

Repository: datasets
File: src/datasets/io/sql.py
Lines: L17-L53

Signature

class SqlDatasetReader(AbstractDatasetInputStream):
    def __init__(
        self,
        sql: Union[str, "sqlalchemy.sql.Selectable"],
        con: Union[str, "sqlalchemy.engine.Connection", "sqlalchemy.engine.Engine", "sqlite3.Connection"],
        features: Optional[Features] = None,
        cache_dir: str = None,
        keep_in_memory: bool = False,
        **kwargs,
    ): ...

    def read(self): ...

Import

from datasets.io.sql import SqlDatasetReader

I/O Contract

Inputs

Name	Type	Required	Description
sql	`Union[str, sqlalchemy.sql.Selectable]`	Yes	SQL query string or SQLAlchemy Selectable (e.g., table or select expression) to execute.
con	`Union[str, sqlalchemy.engine.Connection, sqlalchemy.engine.Engine, sqlite3.Connection]`	Yes	Database connection. Can be a SQLAlchemy URI string (e.g. "sqlite:///my_db.sqlite"), a SQLAlchemy connection or engine, or a sqlite3 connection.
features	`Optional[Features]`	No	Explicit schema to apply instead of inferring from the query results.
cache_dir	`str`	No	Directory for caching the processed dataset. Defaults to None, which uses the library's default cache directory.
keep_in_memory	`bool`	No	Whether to keep the dataset in memory instead of memory-mapping. Defaults to False.
**kwargs		No	Additional keyword arguments forwarded to the Sql builder (for example, pandas.read_sql-style options such as chunksize or params).

Outputs

Name	Type	Description
dataset	`Dataset`	The loaded dataset, built from the "train" split. read() returns a single Dataset, not a DatasetDict.

Usage Examples

Basic Usage

import sqlite3

from datasets.io.sql import SqlDatasetReader

# Load from SQLite using a URI string (URI strings are resolved through SQLAlchemy, which must be installed)
reader = SqlDatasetReader("SELECT * FROM my_table", "sqlite:///my_db.sqlite")
dataset = reader.read()

# Load from SQLite using a sqlite3 connection object
con = sqlite3.connect("my_db.sqlite")
reader = SqlDatasetReader("SELECT * FROM my_table", con)
dataset = reader.read()
con.close()  # the caller owns the connection and should close it

Related Pages

Implements Principle

Principle:Huggingface_Datasets_SQL_Import

Requires Environment
