Implementation:Huggingface Datasets SqlDatasetReader
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Concrete tool, provided by the HuggingFace Datasets library, for importing the result of a SQL query (or the contents of a table) into a HuggingFace Dataset.
Description
SqlDatasetReader is a reader class that extends AbstractDatasetInputStream. It wraps the packaged Sql builder, which runs a SQL query (or reads a table) through pandas.read_sql and writes the result to the cache as an Arrow-backed Dataset. The connection can be a SQLAlchemy URI string, a SQLAlchemy Engine or Connection object, or a sqlite3.Connection. The reader does not support streaming mode. All additional keyword arguments are forwarded to the underlying Sql builder.
Usage
Use SqlDatasetReader to load data directly from a SQL database into a HuggingFace Dataset. It is usually invoked indirectly through Dataset.from_sql(), which accepts the same arguments, but it can also be instantiated directly.
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/io/sql.py
- Lines: L17-L53
Signature
```python
class SqlDatasetReader(AbstractDatasetInputStream):
    def __init__(
        self,
        sql: Union[str, "sqlalchemy.sql.Selectable"],
        con: Union[str, "sqlalchemy.engine.Connection", "sqlalchemy.engine.Engine", "sqlite3.Connection"],
        features: Optional[Features] = None,
        cache_dir: str = None,
        keep_in_memory: bool = False,
        **kwargs,
    ): ...

    def read(self): ...
```
Import
```python
from datasets.io.sql import SqlDatasetReader
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| sql | Union[str, sqlalchemy.sql.Selectable] | Yes | SQL query string, table name, or SQLAlchemy Selectable (e.g., a table or select expression) to execute. |
| con | Union[str, sqlalchemy.engine.Connection, sqlalchemy.engine.Engine, sqlite3.Connection] | Yes | Database connection: a SQLAlchemy URI string, a SQLAlchemy Connection or Engine, or a sqlite3 connection. |
| features | Optional[Features] | No | Explicit schema to apply instead of inferring one from the query results. Defaults to None. |
| cache_dir | str | No | Directory for caching the processed dataset. Defaults to None, which uses the library's default cache directory. |
| keep_in_memory | bool | No | Whether to load the dataset into memory instead of memory-mapping the cached Arrow files. Defaults to False. |
| **kwargs | Any | No | Additional keyword arguments forwarded to the Sql builder. |
Outputs
| Name | Type | Description |
|---|---|---|
| dataset | Dataset | The loaded dataset, built from the Sql builder's single "train" split. |
Usage Examples
Basic Usage
```python
import sqlite3

from datasets.io.sql import SqlDatasetReader

# Load from SQLite using a SQLAlchemy URI string (requires SQLAlchemy to be installed)
reader = SqlDatasetReader("SELECT * FROM my_table", "sqlite:///my_db.sqlite")
dataset = reader.read()

# Load from SQLite using a sqlite3 connection object
con = sqlite3.connect("my_db.sqlite")
reader = SqlDatasetReader("SELECT * FROM my_table", con)
dataset = reader.read()
con.close()  # the dataset is backed by the Arrow cache, so the connection can be closed
```