Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:Haifengl Smile SQL Metadata API

From Leeroopedia


Overview

The SQL Metadata API provides three methods on smile.data.SQL for inspecting the database catalog: tables() lists all tables, describe() shows the columns of a specific table, and extensions() reports installed DuckDB extensions. Additionally, installExtension(), loadExtension(), and updateExtensions() manage the extension lifecycle. All inspection methods return DataFrame objects for easy display and further analysis.

API Signature

public class SQL implements AutoCloseable {
    /**
     * Returns the tables in the database.
     * @return a DataFrame with TABLE_NAME and REMARKS columns.
     * @throws SQLException if fail to query metadata.
     */
    public DataFrame tables() throws SQLException

    /**
     * Returns the columns of a specific table.
     * @param table the table name.
     * @return a DataFrame with COLUMN_NAME, TYPE_NAME, and IS_NULLABLE columns.
     * @throws SQLException if fail to query metadata.
     */
    public DataFrame describe(String table) throws SQLException

    /**
     * Returns the installed DuckDB extensions.
     * @return a DataFrame with name, loaded, description, version, and mode columns.
     * @throws SQLException if fail to execute the query.
     */
    public DataFrame extensions() throws SQLException

    /**
     * Updates all installed extensions to their latest versions.
     * @return this object (for chaining).
     * @throws SQLException if fail to update extensions.
     */
    public SQL updateExtensions() throws SQLException

    /**
     * Installs a DuckDB extension.
     * @param name the extension name.
     * @return this object (for chaining).
     * @throws SQLException if fail to install the extension.
     */
    public SQL installExtension(String name) throws SQLException

    /**
     * Loads an installed DuckDB extension into the current session.
     * @param name the extension name.
     * @return this object (for chaining).
     * @throws SQLException if fail to load the extension.
     */
    public SQL loadExtension(String name) throws SQLException
}

Source

base/src/main/java/smile/data/SQL.java:L77-144

Extension management methods are at lines 90-117.

Import

import smile.data.SQL;
import smile.data.DataFrame;

Type

API Doc

Inputs

Method Parameter Type Required Description
tables() (none) -- -- Lists all tables in the current database.
describe() table String Yes The name of the table to describe.
extensions() (none) -- -- Lists all installed DuckDB extensions.
installExtension() name String Yes The extension name to install.
loadExtension() name String Yes The extension name to load.
updateExtensions() (none) -- -- Updates all installed extensions.

Outputs

tables()

Returns a DataFrame with the following columns:

Column Type Description
TABLE_NAME String The name of each table in the database.
REMARKS String Optional table comments/description.

describe(table)

Returns a DataFrame with the following columns:

Column Type Description
COLUMN_NAME String The name of each column in the table.
TYPE_NAME String The DuckDB data type (e.g., INTEGER, VARCHAR, DOUBLE).
IS_NULLABLE String Whether the column allows NULL values ("YES" or "NO").

extensions()

Returns a DataFrame with the following columns:

Column Type Description
name String Extension name (e.g., parquet, httpfs, iceberg).
loaded Boolean Whether the extension is currently loaded in the session.
description String Human-readable description of the extension's purpose.
version String Installed version string.
mode String Installation mode (e.g., AUTOLOADED, CUSTOM).

Internal Mechanism

tables() and describe()

These methods use the standard JDBC DatabaseMetaData interface:

// tables()
DatabaseMetaData meta = db.getMetaData();
ResultSet rs = meta.getTables(null, null, null, null);
DataFrame df = DataFrame.of(rs);
return df.select("TABLE_NAME", "REMARKS");

// describe(table)
DatabaseMetaData meta = db.getMetaData();
ResultSet rs = meta.getColumns(null, null, table, null);
DataFrame df = DataFrame.of(rs);
return df.select("COLUMN_NAME", "TYPE_NAME", "IS_NULLABLE");

The full JDBC metadata ResultSet contains many columns (TABLE_CAT, TABLE_SCHEM, etc.), but the methods select only the most relevant columns for a clean, focused output.

extensions()

This method executes a DuckDB-specific SQL query against the duckdb_extensions() table function:

return query("""
    SELECT extension_name AS name, loaded, description,
           extension_version AS version, install_mode AS mode
    FROM duckdb_extensions() WHERE installed = true;""");

Extension Management

The extension management methods execute DDL statements directly:

  • installExtension(name): Executes INSTALL {name};
  • loadExtension(name): Executes LOAD {name};
  • updateExtensions(): Executes UPDATE EXTENSIONS;

All three return this for method chaining.

Usage Examples

Listing All Tables

import smile.data.SQL;
import smile.data.DataFrame;

try (var db = new SQL()) {
    db.csv("customers", "data/customers.csv")
      .parquet("orders", "data/orders.parquet")
      .csv("products", "data/products.csv");

    // List all loaded tables
    DataFrame tables = db.tables();
    System.out.println(tables);
    // Output:
    // +----------+--------+
    // |TABLE_NAME|REMARKS |
    // +----------+--------+
    // | customers|        |
    // |    orders|        |
    // |  products|        |
    // +----------+--------+
}

Describing Table Schema

import smile.data.SQL;
import smile.data.DataFrame;

try (var db = new SQL()) {
    db.csv("iris", "data/iris.csv");

    // Inspect the table schema
    DataFrame schema = db.describe("iris");
    System.out.println(schema);
    // Output:
    // +-----------+---------+-----------+
    // |COLUMN_NAME|TYPE_NAME|IS_NULLABLE|
    // +-----------+---------+-----------+
    // |sepal_length|  DOUBLE|        YES|
    // |sepal_width |  DOUBLE|        YES|
    // |petal_length|  DOUBLE|        YES|
    // |petal_width |  DOUBLE|        YES|
    // |species     | VARCHAR|        YES|
    // +-----------+---------+-----------+
}

Checking and Installing Extensions

import smile.data.SQL;
import smile.data.DataFrame;

try (var db = new SQL()) {
    // View currently installed extensions
    DataFrame ext = db.extensions();
    System.out.println(ext);

    // Install and load the Iceberg extension
    db.installExtension("iceberg")
      .loadExtension("iceberg");

    // Verify the extension is loaded
    ext = db.extensions();
    System.out.println(ext);
}

Schema Validation in an ML Pipeline

import smile.data.SQL;
import smile.data.DataFrame;

try (var db = new SQL()) {
    db.parquet("training_data", "data/features.parquet");

    // Validate schema before proceeding to ML
    DataFrame schema = db.describe("training_data");
    System.out.println("Schema of training data:");
    System.out.println(schema);

    // Programmatically check expected columns exist
    String[] expectedColumns = {"age", "income", "credit_score", "label"};
    DataFrame tables = db.tables();
    // Use the schema DataFrame for validation logic...

    // Proceed with queries only after validation
    DataFrame features = db.query("""
        SELECT age, income, credit_score, label
        FROM training_data
        WHERE label IS NOT NULL
    """);
    double[][] X = features.select("age", "income", "credit_score").toArray();
}

Related

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment