Implementation:Huggingface Datasets Dataset To Sql
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Concrete tool for exporting a HuggingFace Dataset to a SQL database table provided by the HuggingFace Datasets library.
Description
Dataset.to_sql is a method that writes the dataset's Arrow data to a SQL database table. Internally, it delegates to SqlDatasetWriter, which processes the data in configurable batch sizes, converting each batch to a pandas DataFrame and calling DataFrame.to_sql to insert the rows. The connection can be a SQLAlchemy URI string, a SQLAlchemy engine/connection object, or a sqlite3.Connection. The method supports multiprocessing and all pandas to_sql keyword arguments (e.g., if_exists, dtype, method). By default, the index is omitted.
Usage
Use Dataset.to_sql when you need to write a processed dataset to a relational database for serving, reporting, or integration with application backends.
Code Reference
Source Location
- Repository: datasets
- File:
src/datasets/arrow_dataset.py - Lines: L5298-L5345
Signature
def to_sql(
self,
name: str,
con: Union[str, "sqlalchemy.engine.Connection", "sqlalchemy.engine.Engine", "sqlite3.Connection"],
batch_size: Optional[int] = None,
**sql_writer_kwargs,
) -> int:
Import
from datasets import Dataset
# to_sql is a method on Dataset instances
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | str |
Yes | Name of the SQL table to write to. |
| con | Union[str, sqlalchemy.engine.Connection, sqlalchemy.engine.Engine, sqlite3.Connection] |
Yes | Database connection. Can be a URI string, a SQLAlchemy connection/engine, or a sqlite3 connection. |
| batch_size | Optional[int] |
No | Number of rows per batch. Defaults to datasets.config.DEFAULT_MAX_BATCH_SIZE. |
| **sql_writer_kwargs | No | Additional keyword arguments forwarded to pandas.DataFrame.to_sql (e.g., if_exists, dtype, method, index). |
Outputs
| Name | Type | Description |
|---|---|---|
| written | int |
The number of records written. |
Usage Examples
Basic Usage
from datasets import load_dataset
ds = load_dataset("csv", data_files="input.csv", split="train")
# Export to SQLite using a URI string
num_rows = ds.to_sql("data", "sqlite:///my_db.sqlite")
# Export to SQLite using a connection object
import sqlite3
con = sqlite3.connect("my_db.sqlite")
with con:
num_rows = ds.to_sql("data", con)
# Export with if_exists="replace"
num_rows = ds.to_sql("data", "sqlite:///my_db.sqlite", if_exists="replace")