# Implementation: Polars DataFrame Construction
| Knowledge Sources | |
|---|---|
| Domains | Data Engineering, DataFrame |
| Last Updated | 2026-02-09 10:00 GMT |
## Overview
Concrete APIs for constructing DataFrames from CSV files, URLs, or Python dictionaries, with schema overrides and post-load type coercion for aggregation-ready data.
## Description
Polars provides two primary pathways for DataFrame construction: reading from external sources (pl.read_csv(), pl.read_parquet(), etc.) and building from in-memory Python data structures (pl.DataFrame(data)). Both pathways support schema control -- either at construction time via schema_overrides or after construction via with_columns() and Expr.cast().
The schema_overrides parameter accepts a dictionary mapping column names to Polars data types. This is applied during parsing, avoiding the cost of creating a default-typed column and then converting it. Post-load transformations via with_columns() allow chaining arbitrary expressions, such as pl.col("birthday").str.to_date(strict=False) to parse date strings.
## Usage
Use these APIs whenever you need to:
- Read a CSV file with specific columns cast to `pl.Categorical` for efficient grouping.
- Construct a test DataFrame from a Python dictionary.
- Parse string date columns into proper `Date` types after loading.
## Code Reference

### Source Location
- Repository: Polars
- File: `docs/source/src/python/user-guide/expressions/aggregation.py` (lines 1-17)
### Signature

```python
# Read CSV with schema overrides
pl.read_csv(
    source: str | Path | IO[bytes],
    schema_overrides: dict[str, PolarsDataType] | None = None,
    # ... additional parameters
) -> DataFrame

# Construct from dictionary
pl.DataFrame(
    data: dict[str, list] | list[dict] | ...,
    schema: dict[str, PolarsDataType] | None = None,
    # ... additional parameters
) -> DataFrame

# Post-load type transformation
DataFrame.with_columns(
    *exprs: IntoExpr | Iterable[IntoExpr],
) -> DataFrame

# Type casting
Expr.cast(
    dtype: PolarsDataType,
    strict: bool = True,
) -> Expr
```
### Import

```python
import polars as pl
```
## I/O Contract

### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| source | `str \| Path \| IO[bytes]` | Yes (read_csv) | File path, URL, or file-like object to read |
| schema_overrides | `dict[str, PolarsDataType] \| None` | No | Column name to target type mapping; applied during parsing |
| data | `dict[str, list] \| list[dict]` | Yes (DataFrame) | Dictionary mapping column names to value lists |
| *exprs | `IntoExpr \| Iterable[IntoExpr]` | Yes (with_columns) | One or more expressions producing new or replacement columns |
| dtype | `PolarsDataType` | Yes (cast) | Target data type (e.g., `pl.Categorical`, `pl.Date`) |
### Outputs

| Name | Type | Description |
|---|---|---|
| result | `DataFrame` | A DataFrame with correctly typed columns, ready for grouping and aggregation |
## Usage Examples

### Reading CSV with Schema Overrides
```python
import polars as pl

url = "https://theunitedstates.io/congress-legislators/legislators-historical.csv"

schema_overrides = {
    "first_name": pl.Categorical,
    "gender": pl.Categorical,
    "state": pl.Categorical,
    "party": pl.Categorical,
}

dataset = pl.read_csv(url, schema_overrides=schema_overrides).with_columns(
    pl.col("birthday").str.to_date(strict=False)
)
```
### Constructing from Dictionary

```python
import polars as pl

df = pl.DataFrame(
    {
        "label": ["foo", "bar", "spam"],
        "a": [1, 2, 3],
        "b": [10, 20, 30],
    }
)
```
### Post-Load Type Casting

```python
import polars as pl

df = pl.read_csv("data.csv").with_columns(
    pl.col("category").cast(pl.Categorical),
    pl.col("date_str").str.to_date("%Y-%m-%d"),
)
```