Principle:Pola_rs_Polars_SQL_Data_Registration
Overview
Data registration binds data sources to named tables in the SQL context, so that DataFrames, LazyFrames, file scans, and converted pandas DataFrames can be queried through SQL table references. Registration is the mechanism by which heterogeneous data sources become addressable within the SQL query namespace.
Metadata
| Field | Value |
|---|---|
| Namespace | Pola_rs_Polars |
| Workflow | SQL_Query_Interface |
| Principle_ID | Pola_rs_Polars_SQL_Data_Registration |
| Type | Principle |
| Category | Data Access / Query Interface |
| Stage | Data Registration |
| last_updated | 2026-02-09 10:00 GMT |
| Source_Repository | https://github.com/pola-rs/polars |
| Documentation | https://docs.pola.rs |
Theoretical Basis
Database Table Binding
In relational database systems, table binding is the process of associating a logical table name with a physical data source. The SQL context's registration mechanism implements this concept by mapping string identifiers to Polars frame objects. Once bound, the table name serves as a stable reference that SQL queries use to locate and access the underlying data.
This binding is mutable: new tables can be registered at any point during the context's lifetime, and existing names can be re-bound to different data sources. This flexibility supports iterative workflows where intermediate results are registered as new tables for subsequent queries.
Federated Query Processing
The registration system supports federated query processing by allowing data from diverse sources to coexist in a single query namespace:
- In-memory DataFrames: Already materialized data resident in memory.
- LazyFrames: Deferred computation plans that are only executed when collected.
- File scans: Lazy references to on-disk data (CSV, NDJSON, Parquet) that benefit from scan-level optimizations like predicate pushdown and projection pushdown.
- Pandas conversions: DataFrames from the pandas ecosystem converted to Polars format.
By unifying these sources under a common registration interface, users can write SQL queries that join, filter, and aggregate across data origins without manually orchestrating data loading and format conversion.
Single vs Batch Registration
Registration supports both single operations (registering one table at a time) and batch operations (registering multiple tables in a single call). Batch registration reduces boilerplate and sets up related tables in one step, which is particularly useful when initializing a context for a multi-table query workload.
Core Concepts
Name Resolution
When a SQL query references a table name, the SQL context resolves that name against its internal catalog. If the name is not found, the query fails with an error. Every table name referenced in a query must therefore be registered before the query is executed.
Implicit Conversion
DataFrames registered in the context are implicitly converted to LazyFrames for query planning. This means all query optimization passes apply uniformly regardless of whether the original source was eager or lazy. The conversion is lightweight and does not copy data.
File Scan Registration
Registering a file scan (e.g., via pl.scan_csv or pl.scan_ndjson) as a table is a powerful pattern because:
- The file is not read into memory at registration time.
- Query predicates and column projections can be pushed down to the scan level, so only the necessary rows and columns are read.
- Multiple queries can reference the same registered scan, and each one is optimized against the same lazy scan definition.
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | str (name) | The table name to register in the SQL catalog |
| Input | DataFrame / LazyFrame | The Polars frame to bind to the table name |
| Input | pandas.DataFrame | External data converted via pl.from_pandas() |
| Input | LazyFrame (scan) | File scans created via pl.scan_csv(), pl.scan_ndjson(), etc. |
| Output | SQLContext | Registration mutates the SQLContext in place; register and register_many return the context itself, so calls can be chained |
Relationships
See Also
- Principle:Pola_rs_Polars_SQL_Context_Creation — Creating the SQL context that holds registered tables
- Principle:Pola_rs_Polars_SQL_Query_Execution — Executing SQL queries against registered tables
- Principle:Pola_rs_Polars_Advanced_SQL_Features — Creating new tables via SQL DDL
- Principle:Pola_rs_Polars_SQL_Result_Collection — Materializing query results