
Environment:Huggingface Datasets SQL Dependencies

From Leeroopedia
Revision as of 18:40, 16 February 2026 by Admin (auto-imported from environments/Huggingface_Datasets_SQL_Dependencies.md)
Knowledge Sources
Domains: Database, SQL, Data Import, Data Export
Last Updated: 2026-02-14 19:00 GMT

Overview

Description

The SQL Dependencies environment defines the packages required to enable reading from and writing to SQL databases within the HuggingFace Datasets library. SQL support relies on SQLAlchemy as the database abstraction layer, with Python's built-in sqlite3 module available as a lightweight alternative for SQLite databases. SQLAlchemy is not included in the base datasets installation and is currently listed as a test dependency.

Usage

SQL support lets you:

  • Read query results from a SQL database into a Dataset with Dataset.from_sql, implemented by SqlDatasetReader in io/sql.py
  • Write a Dataset to a SQL table with Dataset.to_sql, implemented by SqlDatasetWriter in io/sql.py
  • Connect to any database supported by SQLAlchemy (PostgreSQL, MySQL, SQLite, Oracle, SQL Server, etc.), or pass a sqlite3 connection directly for SQLite
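A minimal sketch of both directions, using only a stdlib sqlite3 connection so that no SQLAlchemy install is needed. It assumes an installed datasets version that provides Dataset.from_sql and Dataset.to_sql; the table and column names are made up for illustration, and the datasets part is skipped if the library is not installed.

```python
import importlib.util
import sqlite3

# Build a small throwaway SQLite database in memory (table name is illustrative).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE reviews (text TEXT, label INTEGER)")
con.executemany(
    "INSERT INTO reviews (text, label) VALUES (?, ?)",
    [("great movie", 1), ("not for me", 0), ("would watch again", 1)],
)
con.commit()

if importlib.util.find_spec("datasets") is not None:
    from datasets import Dataset

    # Read: run a query on the sqlite3 connection and materialize the rows as a Dataset.
    ds = Dataset.from_sql("SELECT text, label FROM reviews WHERE label = 1", con)
    print(ds)

    # Write: export the Dataset into a new table on the same connection.
    ds.to_sql("positive_reviews", con)
else:
    print("datasets is not installed; skipping the from_sql / to_sql part")
```

Passing a URI string such as "sqlite:///reviews.db" instead of a connection object goes through SQLAlchemy, so it requires the sqlalchemy package; according to the from_sql docstring, it is also the only form whose result can be cached.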

The library checks whether SQLAlchemy is installed when config.py is imported, using importlib.util.find_spec("sqlalchemy"), and stores the result in the SQLALCHEMY_AVAILABLE flag defined there.
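Application code can reuse the same find_spec check to fail early with an actionable message instead of hitting a bare ModuleNotFoundError deep inside a call. In this sketch, require_module is a hypothetical helper, not part of datasets.

```python
import importlib.util


def require_module(name: str, install_hint: str) -> None:
    """Hypothetical guard: raise ImportError with an install hint if `name` is missing.

    find_spec only locates a top-level module; it does not execute it, so the
    check is cheap and has no import side effects.
    """
    if importlib.util.find_spec(name) is None:
        raise ImportError(f"'{name}' is required for SQL support. Install it with: {install_hint}")


# Same expression as the SQLALCHEMY_AVAILABLE flag in config.py.
SQLALCHEMY_AVAILABLE = importlib.util.find_spec("sqlalchemy") is not None

try:
    require_module("sqlalchemy", "pip install sqlalchemy")
except ImportError as exc:
    print(exc)
```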

System Requirements

  • Python: Compatible with the Python versions supported by HuggingFace Datasets
  • Operating System: Linux, macOS, or Windows
  • Database: A running database instance accessible via a SQLAlchemy connection string, or a local SQLite file
  • Database Drivers: Depending on the target database, additional driver packages may be needed (e.g., psycopg2 for PostgreSQL, pymysql for MySQL)

Dependencies

Package | Minimum Version | Purpose | Required By
sqlalchemy | (see setup.py) | Database abstraction, connection management, SQL query execution | io/sql.py
sqlite3 | (stdlib) | Lightweight SQL database support (built into Python) | io/sql.py

setup.py lists sqlalchemy in TESTS_REQUIRE: the test suite uses it, but it is not installed with datasets itself, so for end users it is an optional runtime dependency that must be installed separately.

Depending on the target database, additional driver packages may also be required:

Database | Driver Package | Install Command
PostgreSQL | psycopg2 or psycopg2-binary | pip install psycopg2-binary
MySQL | pymysql or mysqlclient | pip install pymysql
SQLite | none (uses stdlib sqlite3) | No additional install needed
SQL Server | pyodbc | pip install pyodbc
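Before opening a connection, you can check that the driver implied by a connection string is importable. This is a hypothetical helper, not part of datasets or SQLAlchemy; it covers only the dialects in the table above, and its default-driver mapping assumes SQLAlchemy's documented defaults (psycopg2 for postgresql, mysqlclient's MySQLdb module for mysql, pyodbc for mssql, stdlib sqlite3 for sqlite).

```python
import importlib.util
from typing import Optional

# Driver SQLAlchemy picks when the URL gives only a dialect (e.g. "postgresql://").
DEFAULT_DRIVER = {"postgresql": "psycopg2", "mysql": "mysqldb", "sqlite": "pysqlite", "mssql": "pyodbc"}
# Importable Python module behind each driver name used in "dialect+driver" URLs.
DRIVER_MODULE = {
    "psycopg2": "psycopg2",
    "pymysql": "pymysql",
    "mysqldb": "MySQLdb",  # provided by the mysqlclient package
    "pyodbc": "pyodbc",
    "pysqlite": "sqlite3",  # stdlib
}


def driver_module_for(url: str) -> str:
    """Return the module a URL's driver needs; raises KeyError for unmapped dialects/drivers."""
    scheme = url.split("://", 1)[0]
    dialect, _, driver = scheme.partition("+")
    return DRIVER_MODULE[driver or DEFAULT_DRIVER[dialect]]


def missing_driver(url: str) -> Optional[str]:
    """Return the missing module name, or None if the driver is importable."""
    module = driver_module_for(url)
    return None if importlib.util.find_spec(module) is not None else module
```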

Credentials

Database credentials are passed via SQLAlchemy connection strings. These typically include:

  • Username and password for the database
  • Host and port of the database server
  • Database name

Example connection string format:

dialect+driver://username:password@host:port/database

Security note: Connection strings containing credentials should not be committed to version control. Use environment variables or secrets management tools to handle database credentials.
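One way to follow this advice is to assemble the connection string from environment variables at runtime. The variable names (DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME) and the build_db_url helper are hypothetical conventions; quote_plus percent-encodes the username and password so that characters such as @, : or / do not break URL parsing.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def build_db_url(dialect: str, driver: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build dialect+driver://username:password@host:port/database from environment variables.

    The DB_* variable names are hypothetical; adapt them to your deployment.
    """
    env = os.environ if env is None else env
    # Percent-encode credentials so reserved URL characters are safe.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    return f"{dialect}+{driver}://{user}:{password}@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
```

The resulting string can be passed as con to Dataset.from_sql or Dataset.to_sql; avoid printing or logging it, since it contains the password.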

Quick Install

Install SQLAlchemy with pip:

pip install sqlalchemy

For a specific database backend, install the appropriate driver alongside SQLAlchemy:

# PostgreSQL
pip install sqlalchemy psycopg2-binary

# MySQL
pip install sqlalchemy pymysql

# SQLite (no additional driver needed)
pip install sqlalchemy

Code Evidence

Runtime availability check in config.py:

import importlib.util

SQLALCHEMY_AVAILABLE = importlib.util.find_spec("sqlalchemy") is not None

TYPE_CHECKING imports in io/sql.py:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import sqlalchemy
    import sqlite3

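Imports inside an if TYPE_CHECKING: block are seen only by static type checkers; at runtime typing.TYPE_CHECKING is False and the block is skipped entirely. sqlalchemy and sqlite3 are therefore referenced only in type annotations, and io/sql.py can be imported without SQLAlchemy installed. SQLAlchemy is needed only when an SQL read or write actually uses it, for example when con is given as a URI string.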
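The difference between a type-checking-only import and a deferred runtime import can be seen in a small standalone module. This is an illustrative sketch, not code from io/sql.py: open_connection is a hypothetical helper, and sqlalchemy.create_engine is reached only when a URI string is passed.

```python
import sqlite3
import sys
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Evaluated only by static type checkers such as mypy or pyright.
    # At runtime TYPE_CHECKING is False, so this import never executes.
    import sqlalchemy


def open_connection(con: Union[str, sqlite3.Connection]) -> Union["sqlalchemy.engine.Engine", sqlite3.Connection]:
    """Hypothetical helper: return something usable as a connection for SQL reads/writes."""
    if isinstance(con, sqlite3.Connection):
        return con  # stdlib path: SQLAlchemy is never imported
    # Deferred runtime import: executed only when a URI string is given.
    import sqlalchemy

    return sqlalchemy.create_engine(con)


conn = open_connection(sqlite3.connect(":memory:"))
print("sqlalchemy imported:", "sqlalchemy" in sys.modules)
```

The string annotation "sqlalchemy.engine.Engine" is never evaluated at runtime, which is why the module imports cleanly even though the name sqlalchemy does not exist at module level.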

Test dependency in setup.py:

TESTS_REQUIRE = [
    ...
    "sqlalchemy",
    ...
]
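Because SQLAlchemy is only a test dependency, SQL tests are typically skipped when it is absent. A minimal sketch using unittest.skipUnless; the require_sqlalchemy decorator and the test class are hypothetical names, not copied from the datasets test suite.

```python
import importlib.util
import io
import unittest

SQLALCHEMY_AVAILABLE = importlib.util.find_spec("sqlalchemy") is not None


def require_sqlalchemy(test_case):
    """Hypothetical decorator: skip the test when SQLAlchemy is not installed."""
    return unittest.skipUnless(SQLALCHEMY_AVAILABLE, "test requires sqlalchemy")(test_case)


class SqlAlchemySmokeTest(unittest.TestCase):
    @require_sqlalchemy
    def test_in_memory_sqlite_engine(self):
        import sqlalchemy

        # "sqlite://" is an in-memory SQLite database; no extra driver is needed.
        engine = sqlalchemy.create_engine("sqlite://")
        with engine.connect() as conn:
            value = conn.execute(sqlalchemy.text("SELECT 1")).scalar()
        self.assertEqual(value, 1)


# Run the test quietly and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SqlAlchemySmokeTest)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```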

Common Errors

Error Message | Cause | Resolution
ModuleNotFoundError: No module named 'sqlalchemy' | SQLAlchemy is not installed | Run pip install sqlalchemy
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file | The SQLite file path is wrong, its directory does not exist, or permissions deny access | Verify the database file path and permissions
ModuleNotFoundError: No module named 'psycopg2' | The driver for the dialect in the connection string is not installed | Install the appropriate driver (e.g., pip install psycopg2-binary)
sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:postgres | The dialect name in the connection string is not recognized, e.g. the postgres:// scheme, which SQLAlchemy 1.4 and later no longer accept | Use postgresql:// or otherwise correct the dialect name
sqlalchemy.exc.OperationalError: could not connect to server | Database server is not running or connection parameters are incorrect | Verify the database server is running and the connection string is correct
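For the SQLite "unable to open database file" error, a quick pre-flight check can separate a missing directory from other failures. This is a hypothetical helper using only the stdlib sqlite3 module; note that a successful check may leave an empty database file at the given path.

```python
import os
import sqlite3
from contextlib import closing


def diagnose_sqlite_path(path: str) -> str:
    """Hypothetical pre-flight check for 'unable to open database file'."""
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        # sqlite3 can create the database file, but not missing parent directories.
        return f"directory does not exist: {parent}"
    try:
        # closing() guarantees the connection is closed even if the query fails.
        with closing(sqlite3.connect(path)) as con:
            con.execute("SELECT 1")
    except sqlite3.OperationalError as exc:
        return f"sqlite3 error: {exc}"
    return "ok"
```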

Compatibility Notes

  • SQLAlchemy is listed in TESTS_REQUIRE rather than as a core or optional extra dependency, which means it is primarily validated through the test suite. Users must install it manually for SQL functionality.
  • The sqlite3 module is part of Python's standard library and does not require separate installation. It is always available in standard CPython distributions.
  • Because io/sql.py imports sqlalchemy and sqlite3 only inside an if TYPE_CHECKING: block, those imports never run at runtime, so the SQL module can be loaded even when SQLAlchemy is not installed.
  • The SQLALCHEMY_AVAILABLE flag in config.py lets library and test code check whether SQLAlchemy is installed before taking SQLAlchemy-specific code paths.
  • SQLAlchemy 1.x and 2.x have significant API differences; consult the datasets library documentation for the supported version range.
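Since the 1.x and 2.x APIs differ, it helps to confirm which major version is installed before debugging API errors. A stdlib-only sketch using importlib.metadata (Python 3.8+); installed_major_version is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional


def installed_major_version(distribution: str) -> Optional[int]:
    """Return the major version of an installed distribution, or None if it is absent."""
    try:
        version = metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
    # Versions like "2.0.25" or "1.4.52" start with the major number.
    return int(version.split(".")[0])


major = installed_major_version("sqlalchemy")
if major is None:
    print("SQLAlchemy is not installed")
else:
    print(f"SQLAlchemy {major}.x is installed")
```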
