Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Huggingface Datasets Dataset To Sql

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-14 18:00 GMT

Overview

Concrete tool for exporting a HuggingFace Dataset to a SQL database table provided by the HuggingFace Datasets library.

Description

Dataset.to_sql is a method that writes the dataset's Arrow data to a SQL database table. Internally, it delegates to SqlDatasetWriter, which processes the data in configurable batch sizes, converting each batch to a pandas DataFrame and calling DataFrame.to_sql to insert the rows. The connection can be a SQLAlchemy URI string, a SQLAlchemy engine/connection object, or a sqlite3.Connection. The method supports multiprocessing and all pandas to_sql keyword arguments (e.g., if_exists, dtype, method). By default, the index is omitted.

Usage

Use Dataset.to_sql when you need to write a processed dataset to a relational database for serving, reporting, or integration with application backends.

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/arrow_dataset.py
  • Lines: L5298-L5345

Signature

def to_sql(
    self,
    name: str,
    con: Union[str, "sqlalchemy.engine.Connection", "sqlalchemy.engine.Engine", "sqlite3.Connection"],
    batch_size: Optional[int] = None,
    **sql_writer_kwargs,
) -> int:

Import

from datasets import Dataset
# to_sql is a method on Dataset instances

I/O Contract

Inputs

Name Type Required Description
name str Yes Name of the SQL table to write to.
con Union[str, sqlalchemy.engine.Connection, sqlalchemy.engine.Engine, sqlite3.Connection] Yes Database connection. Can be a URI string, a SQLAlchemy connection/engine, or a sqlite3 connection.
batch_size Optional[int] No Number of rows per batch. Defaults to datasets.config.DEFAULT_MAX_BATCH_SIZE.
**sql_writer_kwargs No Additional keyword arguments forwarded to pandas.DataFrame.to_sql (e.g., if_exists, dtype, method, index).

Outputs

Name Type Description
written int The number of records written.

Usage Examples

Basic Usage

from datasets import load_dataset

ds = load_dataset("csv", data_files="input.csv", split="train")

# Export to SQLite using a URI string
num_rows = ds.to_sql("data", "sqlite:///my_db.sqlite")

# Export to SQLite using a connection object
import sqlite3
con = sqlite3.connect("my_db.sqlite")
with con:
    num_rows = ds.to_sql("data", con)

# Export with if_exists="replace"
num_rows = ds.to_sql("data", "sqlite:///my_db.sqlite", if_exists="replace")

Related Pages

Implements Principle

Requires Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment