Implementation:Huggingface Datasets Dataset To Sql

Knowledge Sources	Huggingface Datasets HF Datasets Docs
Domains	Data_Engineering, NLP
Last Updated	2026-02-14 18:00 GMT

Overview

Concrete tool for exporting a HuggingFace Dataset to a SQL database table provided by the HuggingFace Datasets library.

Description

Dataset.to_sql is a method that writes the dataset's Arrow data to a SQL database table. Internally, it delegates to SqlDatasetWriter, which processes the data in configurable batch sizes, converting each batch to a pandas DataFrame and calling DataFrame.to_sql to insert the rows. The connection can be a SQLAlchemy URI string, a SQLAlchemy engine/connection object, or a sqlite3.Connection. The method supports multiprocessing and all pandas to_sql keyword arguments (e.g., if_exists, dtype, method). By default, the index is omitted.

Usage

Use Dataset.to_sql when you need to write a processed dataset to a relational database for serving, reporting, or integration with application backends.

Code Reference

Source Location

Repository: datasets
File: src/datasets/arrow_dataset.py
Lines: L5298-L5345

Signature

def to_sql(
    self,
    name: str,
    con: Union[str, "sqlalchemy.engine.Connection", "sqlalchemy.engine.Engine", "sqlite3.Connection"],
    batch_size: Optional[int] = None,
    **sql_writer_kwargs,
) -> int:

Import

from datasets import Dataset
# to_sql is a method on Dataset instances

I/O Contract

Inputs

Name	Type	Required	Description
name	`str`	Yes	Name of the SQL table to write to.
con	`Union[str, sqlalchemy.engine.Connection, sqlalchemy.engine.Engine, sqlite3.Connection]`	Yes	Database connection. Can be a URI string, a SQLAlchemy connection/engine, or a sqlite3 connection.
batch_size	`Optional[int]`	No	Number of rows per batch. Defaults to datasets.config.DEFAULT_MAX_BATCH_SIZE.
**sql_writer_kwargs		No	Additional keyword arguments forwarded to pandas.DataFrame.to_sql (e.g., if_exists, dtype, method, index).

Outputs

Name	Type	Description
written	`int`	The number of records written.

Usage Examples

Basic Usage

from datasets import load_dataset

ds = load_dataset("csv", data_files="input.csv", split="train")

# Export to SQLite using a URI string
num_rows = ds.to_sql("data", "sqlite:///my_db.sqlite")

# Export to SQLite using a connection object
import sqlite3
con = sqlite3.connect("my_db.sqlite")
with con:
    num_rows = ds.to_sql("data", con)

# Export with if_exists="replace"
num_rows = ds.to_sql("data", "sqlite:///my_db.sqlite", if_exists="replace")

Related Pages

Implements Principle

Principle:Huggingface_Datasets_SQL_Export

Requires Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment