Implementation:Haifengl Smile SQL Metadata API
Overview
The SQL Metadata API provides three methods on smile.data.SQL for inspecting the database catalog: tables() lists all tables, describe() shows the columns of a specific table, and extensions() reports installed DuckDB extensions. Additionally, installExtension(), loadExtension(), and updateExtensions() manage the extension lifecycle. All inspection methods return DataFrame objects for easy display and further analysis.
API Signature
public class SQL implements AutoCloseable {
/**
* Returns the tables in the database.
* @return a DataFrame with TABLE_NAME and REMARKS columns.
* @throws SQLException if fail to query metadata.
*/
public DataFrame tables() throws SQLException
/**
* Returns the columns of a specific table.
* @param table the table name.
* @return a DataFrame with COLUMN_NAME, TYPE_NAME, and IS_NULLABLE columns.
* @throws SQLException if fail to query metadata.
*/
public DataFrame describe(String table) throws SQLException
/**
* Returns the installed DuckDB extensions.
* @return a DataFrame with name, loaded, description, version, and mode columns.
* @throws SQLException if fail to execute the query.
*/
public DataFrame extensions() throws SQLException
/**
* Updates all installed extensions to their latest versions.
* @return this object (for chaining).
* @throws SQLException if fail to update extensions.
*/
public SQL updateExtensions() throws SQLException
/**
* Installs a DuckDB extension.
* @param name the extension name.
* @return this object (for chaining).
* @throws SQLException if fail to install the extension.
*/
public SQL installExtension(String name) throws SQLException
/**
* Loads an installed DuckDB extension into the current session.
* @param name the extension name.
* @return this object (for chaining).
* @throws SQLException if fail to load the extension.
*/
public SQL loadExtension(String name) throws SQLException
}
Source
base/src/main/java/smile/data/SQL.java:L77-144
Extension management methods are at lines 90-117.
Import
import smile.data.SQL;
import smile.data.DataFrame;
Type
API Doc
Inputs
| Method | Parameter | Type | Required | Description |
|---|---|---|---|---|
tables() |
(none) | -- | -- | Lists all tables in the current database. |
describe() |
table |
String |
Yes | The name of the table to describe. |
extensions() |
(none) | -- | -- | Lists all installed DuckDB extensions. |
installExtension() |
name |
String |
Yes | The extension name to install. |
loadExtension() |
name |
String |
Yes | The extension name to load. |
updateExtensions() |
(none) | -- | -- | Updates all installed extensions. |
Outputs
tables()
Returns a DataFrame with the following columns:
| Column | Type | Description |
|---|---|---|
TABLE_NAME |
String |
The name of each table in the database. |
REMARKS |
String |
Optional table comments/description. |
describe(table)
Returns a DataFrame with the following columns:
| Column | Type | Description |
|---|---|---|
COLUMN_NAME |
String |
The name of each column in the table. |
TYPE_NAME |
String |
The DuckDB data type (e.g., INTEGER, VARCHAR, DOUBLE).
|
IS_NULLABLE |
String |
Whether the column allows NULL values ("YES" or "NO").
|
extensions()
Returns a DataFrame with the following columns:
| Column | Type | Description |
|---|---|---|
name |
String |
Extension name (e.g., parquet, httpfs, iceberg).
|
loaded |
Boolean |
Whether the extension is currently loaded in the session. |
description |
String |
Human-readable description of the extension's purpose. |
version |
String |
Installed version string. |
mode |
String |
Installation mode (e.g., AUTOLOADED, CUSTOM).
|
Internal Mechanism
tables() and describe()
These methods use the standard JDBC DatabaseMetaData interface:
// tables()
DatabaseMetaData meta = db.getMetaData();
ResultSet rs = meta.getTables(null, null, null, null);
DataFrame df = DataFrame.of(rs);
return df.select("TABLE_NAME", "REMARKS");
// describe(table)
DatabaseMetaData meta = db.getMetaData();
ResultSet rs = meta.getColumns(null, null, table, null);
DataFrame df = DataFrame.of(rs);
return df.select("COLUMN_NAME", "TYPE_NAME", "IS_NULLABLE");
The full JDBC metadata ResultSet contains many columns (TABLE_CAT, TABLE_SCHEM, etc.), but the methods select only the most relevant columns for a clean, focused output.
extensions()
This method executes a DuckDB-specific SQL query against the duckdb_extensions() table function:
return query("""
SELECT extension_name AS name, loaded, description,
extension_version AS version, install_mode AS mode
FROM duckdb_extensions() WHERE installed = true;""");
Extension Management
The extension management methods execute DDL statements directly:
installExtension(name): ExecutesINSTALL {name};loadExtension(name): ExecutesLOAD {name};updateExtensions(): ExecutesUPDATE EXTENSIONS;
All three return this for method chaining.
Usage Examples
Listing All Tables
import smile.data.SQL;
import smile.data.DataFrame;
try (var db = new SQL()) {
db.csv("customers", "data/customers.csv")
.parquet("orders", "data/orders.parquet")
.csv("products", "data/products.csv");
// List all loaded tables
DataFrame tables = db.tables();
System.out.println(tables);
// Output:
// +----------+--------+
// |TABLE_NAME|REMARKS |
// +----------+--------+
// | customers| |
// | orders| |
// | products| |
// +----------+--------+
}
Describing Table Schema
import smile.data.SQL;
import smile.data.DataFrame;
try (var db = new SQL()) {
db.csv("iris", "data/iris.csv");
// Inspect the table schema
DataFrame schema = db.describe("iris");
System.out.println(schema);
// Output:
// +-----------+---------+-----------+
// |COLUMN_NAME|TYPE_NAME|IS_NULLABLE|
// +-----------+---------+-----------+
// |sepal_length| DOUBLE| YES|
// |sepal_width | DOUBLE| YES|
// |petal_length| DOUBLE| YES|
// |petal_width | DOUBLE| YES|
// |species | VARCHAR| YES|
// +-----------+---------+-----------+
}
Checking and Installing Extensions
import smile.data.SQL;
import smile.data.DataFrame;
try (var db = new SQL()) {
// View currently installed extensions
DataFrame ext = db.extensions();
System.out.println(ext);
// Install and load the Iceberg extension
db.installExtension("iceberg")
.loadExtension("iceberg");
// Verify the extension is loaded
ext = db.extensions();
System.out.println(ext);
}
Schema Validation in an ML Pipeline
import smile.data.SQL;
import smile.data.DataFrame;
try (var db = new SQL()) {
db.parquet("training_data", "data/features.parquet");
// Validate schema before proceeding to ML
DataFrame schema = db.describe("training_data");
System.out.println("Schema of training data:");
System.out.println(schema);
// Programmatically check expected columns exist
String[] expectedColumns = {"age", "income", "credit_score", "label"};
DataFrame tables = db.tables();
// Use the schema DataFrame for validation logic...
// Proceed with queries only after validation
DataFrame features = db.query("""
SELECT age, income, credit_score, label
FROM training_data
WHERE label IS NOT NULL
""");
double[][] X = features.select("age", "income", "credit_score").toArray();
}
Related
- Principle:Haifengl_Smile_SQL_Metadata_Exploration - The theoretical basis for metadata inspection.
- Implementation:Haifengl_Smile_SQL_Constructor - Metadata methods require an active SQL connection.
- Implementation:Haifengl_Smile_SQL_Data_Loaders - Metadata exploration is typically performed after loading data.
- Implementation:Haifengl_Smile_SQL_Query_API - Schema knowledge informs query construction.