Workflow:Eventual Inc Daft SQL Query Analytics

Knowledge Sources	Daft Daft Docs SQL Reference Sessions and Catalogs
Domains	Data_Engineering, Analytics, SQL
Last Updated	2026-02-08 14:00 GMT

Overview

End-to-end process for running SQL-based analytical queries on data from multiple catalogs and table formats using Daft's built-in SQL engine with session and catalog management.

Description

This workflow covers using Daft as a SQL query engine for interactive analytics and data exploration. Daft provides a Rust-based SQL parser and planner that translates SQL queries into optimized Daft execution plans. Users register DataFrames or external catalog tables, then query them using standard SQL syntax including SELECT, JOIN, GROUP BY, HAVING, ORDER BY, window functions, and subqueries. The session system manages catalog state (Iceberg, Glue, Unity, Gravitino, PostgreSQL, Memory) and namespace scoping, allowing SQL queries to reference tables across multiple catalogs. SQL table functions (read_parquet, read_csv, read_json, read_deltalake) enable querying files directly without prior registration.

Usage

Execute this workflow when you prefer SQL syntax for data analysis or need to integrate Daft with SQL-based tooling. Typical triggers include:

Interactive data exploration using familiar SQL syntax
Querying across multiple catalog sources (Iceberg + PostgreSQL + in-memory) in a single session
Building analytical dashboards or reports from lakehouse tables
Running TPC-H or similar analytical benchmark queries
Migrating existing SQL-based pipelines to Daft's execution engine
Combining SQL queries with DataFrame API operations in a single pipeline

Execution Steps

Step 1: Session and Catalog Setup

Initialize a Daft session and attach the external catalogs that contain the tables you want to query. Catalogs provide namespace-scoped table discovery and metadata management. Daft supports Iceberg REST catalogs, AWS Glue, Unity Catalog, Apache Gravitino, PostgreSQL, S3 Tables, and in-memory catalogs.

Key considerations:

Use daft.Session() to create an isolated session or daft.current_session() for the default
Attach catalogs with daft.attach_catalog() or session.attach_catalog()
Set the active catalog and namespace with set_catalog() and set_namespace()
For quick ad-hoc analysis, register DataFrames directly with create_temp_table()
Catalogs are identified by name and can be referenced in SQL with dot notation (catalog.namespace.table)

Step 2: Table Registration

Make data sources available to the SQL engine by registering them as named tables. Tables can come from attached catalogs (auto-discovered), from DataFrames registered as temporary tables, or from SQL table functions that read files on the fly.

Key considerations:

Catalog tables are automatically available after catalog attachment
Use create_temp_table("name", dataframe) to register a DataFrame for SQL access
Use create_table("name", dataframe) for persistent table creation in writable catalogs
Use SHOW TABLES to list available tables in the current namespace
Use fully qualified names (catalog.namespace.table) to query across catalogs

Step 3: SQL Query Composition

Write SQL queries using Daft's supported SQL syntax. Daft supports standard SELECT statements with projections, filters (WHERE), aggregations (GROUP BY, HAVING), joins (INNER, LEFT, RIGHT, FULL, CROSS, SEMI, ANTI), set operations (UNION, INTERSECT, EXCEPT), subqueries, window functions (ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, running aggregates), and ORDER BY with LIMIT/OFFSET.

Key considerations:

Use daft.sql("SELECT ...") to execute queries and get a DataFrame result
SQL table functions like read_parquet(), read_csv(), read_json() allow querying files directly
Window functions support PARTITION BY, ORDER BY, and frame specifications (ROWS, RANGE)
NULL handling follows SQL semantics with IS NULL, IS NOT NULL, COALESCE, and null-safe equals (<=>)
User-defined functions registered with attach_function() are callable from SQL
Use sql_expr() to embed SQL expressions within DataFrame API calls

Step 4: Query Execution and Result Inspection

Execute the SQL query and inspect results. The sql() function returns a lazy DataFrame that can be further transformed with the DataFrame API, materialized with collect(), or displayed with show(). Use explain() to review the optimized query plan.

Key considerations:

daft.sql() returns a DataFrame (lazy until materialized)
Chain DataFrame operations after sql() for hybrid SQL-DataFrame workflows
Use show(N) to preview the first N rows without full materialization
Use explain() to see how the SQL query was optimized
Use collect() to materialize the full result into memory

Step 5: Result Export

Export query results to the desired format. Write to lakehouse tables (Iceberg, Delta Lake), file formats (Parquet, CSV, JSON), or convert to in-memory representations (Pandas, Arrow, Python lists).

Key considerations:

Use write_parquet() or write_csv() for file-based output
Use write_iceberg() or write_deltalake() for lakehouse table output
Use to_pandas() or to_arrow() for in-memory conversion
Use to_pylist() or to_pydict() for Python-native data structures
Use write_table() with session catalogs for catalog-managed writes

Execution Diagram

GitHub URL

Workflow Repository