Implementation:Haifengl Smile SQL Metadata API

Overview

The SQL Metadata API provides three methods on smile.data.SQL for inspecting the database catalog: tables() lists all tables, describe() shows the columns of a specific table, and extensions() reports installed DuckDB extensions. Additionally, installExtension(), loadExtension(), and updateExtensions() manage the extension lifecycle. All inspection methods return DataFrame objects for easy display and further analysis.

API Signature

public class SQL implements AutoCloseable {
    /**
     * Returns the tables in the database.
     * @return a DataFrame with TABLE_NAME and REMARKS columns.
     * @throws SQLException if fail to query metadata.
     */
    public DataFrame tables() throws SQLException

    /**
     * Returns the columns of a specific table.
     * @param table the table name.
     * @return a DataFrame with COLUMN_NAME, TYPE_NAME, and IS_NULLABLE columns.
     * @throws SQLException if fail to query metadata.
     */
    public DataFrame describe(String table) throws SQLException

    /**
     * Returns the installed DuckDB extensions.
     * @return a DataFrame with name, loaded, description, version, and mode columns.
     * @throws SQLException if fail to execute the query.
     */
    public DataFrame extensions() throws SQLException

    /**
     * Updates all installed extensions to their latest versions.
     * @return this object (for chaining).
     * @throws SQLException if fail to update extensions.
     */
    public SQL updateExtensions() throws SQLException

    /**
     * Installs a DuckDB extension.
     * @param name the extension name.
     * @return this object (for chaining).
     * @throws SQLException if fail to install the extension.
     */
    public SQL installExtension(String name) throws SQLException

    /**
     * Loads an installed DuckDB extension into the current session.
     * @param name the extension name.
     * @return this object (for chaining).
     * @throws SQLException if fail to load the extension.
     */
    public SQL loadExtension(String name) throws SQLException
}

Source

base/src/main/java/smile/data/SQL.java:L77-144

Extension management methods are at lines 90-117.

Import

import smile.data.SQL;
import smile.data.DataFrame;

Type

API Doc

Inputs

Method	Parameter	Type	Required	Description
`tables()`	(none)	--	--	Lists all tables in the current database.
`describe()`	`table`	`String`	Yes	The name of the table to describe.
`extensions()`	(none)	--	--	Lists all installed DuckDB extensions.
`installExtension()`	`name`	`String`	Yes	The extension name to install.
`loadExtension()`	`name`	`String`	Yes	The extension name to load.
`updateExtensions()`	(none)	--	--	Updates all installed extensions.

Outputs

tables()

Returns a DataFrame with the following columns:

Column	Type	Description
`TABLE_NAME`	`String`	The name of each table in the database.
`REMARKS`	`String`	Optional table comments/description.

describe(table)

Returns a DataFrame with the following columns:

Column	Type	Description
`COLUMN_NAME`	`String`	The name of each column in the table.
`TYPE_NAME`	`String`	The DuckDB data type (e.g., `INTEGER`, `VARCHAR`, `DOUBLE`).
`IS_NULLABLE`	`String`	Whether the column allows NULL values (`"YES"` or `"NO"`).

extensions()

Returns a DataFrame with the following columns:

Column	Type	Description
`name`	`String`	Extension name (e.g., `parquet`, `httpfs`, `iceberg`).
`loaded`	`Boolean`	Whether the extension is currently loaded in the session.
`description`	`String`	Human-readable description of the extension's purpose.
`version`	`String`	Installed version string.
`mode`	`String`	Installation mode (e.g., `AUTOLOADED`, `CUSTOM`).

Internal Mechanism

tables() and describe()

These methods use the standard JDBC DatabaseMetaData interface:

// tables()
DatabaseMetaData meta = db.getMetaData();
ResultSet rs = meta.getTables(null, null, null, null);
DataFrame df = DataFrame.of(rs);
return df.select("TABLE_NAME", "REMARKS");

// describe(table)
DatabaseMetaData meta = db.getMetaData();
ResultSet rs = meta.getColumns(null, null, table, null);
DataFrame df = DataFrame.of(rs);
return df.select("COLUMN_NAME", "TYPE_NAME", "IS_NULLABLE");

The full JDBC metadata ResultSet contains many columns (TABLE_CAT, TABLE_SCHEM, etc.), but the methods select only the most relevant columns for a clean, focused output.

extensions()

This method executes a DuckDB-specific SQL query against the duckdb_extensions() table function:

return query("""
    SELECT extension_name AS name, loaded, description,
           extension_version AS version, install_mode AS mode
    FROM duckdb_extensions() WHERE installed = true;""");

Extension Management

The extension management methods execute DDL statements directly:

installExtension(name): Executes INSTALL {name};
loadExtension(name): Executes LOAD {name};
updateExtensions(): Executes UPDATE EXTENSIONS;

All three return this for method chaining.

Usage Examples

Listing All Tables

import smile.data.SQL;
import smile.data.DataFrame;

try (var db = new SQL()) {
    db.csv("customers", "data/customers.csv")
      .parquet("orders", "data/orders.parquet")
      .csv("products", "data/products.csv");

    // List all loaded tables
    DataFrame tables = db.tables();
    System.out.println(tables);
    // Output:
    // +----------+--------+
    // |TABLE_NAME|REMARKS |
    // +----------+--------+
    // | customers|        |
    // |    orders|        |
    // |  products|        |
    // +----------+--------+
}

Describing Table Schema

import smile.data.SQL;
import smile.data.DataFrame;

try (var db = new SQL()) {
    db.csv("iris", "data/iris.csv");

    // Inspect the table schema
    DataFrame schema = db.describe("iris");
    System.out.println(schema);
    // Output:
    // +-----------+---------+-----------+
    // |COLUMN_NAME|TYPE_NAME|IS_NULLABLE|
    // +-----------+---------+-----------+
    // |sepal_length|  DOUBLE|        YES|
    // |sepal_width |  DOUBLE|        YES|
    // |petal_length|  DOUBLE|        YES|
    // |petal_width |  DOUBLE|        YES|
    // |species     | VARCHAR|        YES|
    // +-----------+---------+-----------+
}

Checking and Installing Extensions

import smile.data.SQL;
import smile.data.DataFrame;

try (var db = new SQL()) {
    // View currently installed extensions
    DataFrame ext = db.extensions();
    System.out.println(ext);

    // Install and load the Iceberg extension
    db.installExtension("iceberg")
      .loadExtension("iceberg");

    // Verify the extension is loaded
    ext = db.extensions();
    System.out.println(ext);
}

Schema Validation in an ML Pipeline

import smile.data.SQL;
import smile.data.DataFrame;

try (var db = new SQL()) {
    db.parquet("training_data", "data/features.parquet");

    // Validate schema before proceeding to ML
    DataFrame schema = db.describe("training_data");
    System.out.println("Schema of training data:");
    System.out.println(schema);

    // Programmatically check expected columns exist
    String[] expectedColumns = {"age", "income", "credit_score", "label"};
    DataFrame tables = db.tables();
    // Use the schema DataFrame for validation logic...

    // Proceed with queries only after validation
    DataFrame features = db.query("""
        SELECT age, income, credit_score, label
        FROM training_data
        WHERE label IS NOT NULL
    """);
    double[][] X = features.select("age", "income", "credit_score").toArray();
}

Principle:Haifengl_Smile_SQL_Metadata_Exploration - The theoretical basis for metadata inspection.
Implementation:Haifengl_Smile_SQL_Constructor - Metadata methods require an active SQL connection.
Implementation:Haifengl_Smile_SQL_Data_Loaders - Metadata exploration is typically performed after loading data.
Implementation:Haifengl_Smile_SQL_Query_API - Schema knowledge informs query construction.

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment

Overview

API Signature

Source

Import

Type

Inputs

Outputs

tables()

describe(table)

extensions()

Internal Mechanism

tables() and describe()

extensions()

Extension Management

Usage Examples

Listing All Tables

Describing Table Schema

Checking and Installing Extensions

Schema Validation in an ML Pipeline

Related

Page Connections