# Implementation: Polars DataFrame Construction
| Knowledge Sources | |
|---|---|
| Domains | Data Engineering, DataFrame |
| Last Updated | 2026-02-09 10:00 GMT |
## Overview
Concrete APIs for constructing DataFrames from CSV files, URLs, or Python dictionaries, with schema overrides and post-load type coercion for aggregation-ready data.
## Description
Polars provides two primary pathways for DataFrame construction: reading from external sources (pl.read_csv(), pl.read_parquet(), etc.) and building from in-memory Python data structures (pl.DataFrame(data)). Both pathways support schema control -- either at construction time via schema_overrides or after construction via with_columns() and Expr.cast().
The schema_overrides parameter accepts a dictionary mapping column names to Polars data types. This is applied during parsing, avoiding the cost of creating a default-typed column and then converting it. Post-load transformations via with_columns() allow chaining arbitrary expressions, such as pl.col("birthday").str.to_date(strict=False) to parse date strings.
## Usage
Use these APIs whenever you need to:
- Read a CSV file with specific columns cast to `pl.Categorical` for efficient grouping.
- Construct a test DataFrame from a Python dictionary.
- Parse string date columns into proper `Date` types after loading.
## Code Reference

### Source Location
- Repository: Polars
- File: `docs/source/src/python/user-guide/expressions/aggregation.py` (lines 1-17)
### Signature

```python
# Read CSV with schema overrides
pl.read_csv(
    source: str | Path | IO[bytes],
    schema_overrides: dict[str, PolarsDataType] | None = None,
    # ... additional parameters
) -> DataFrame

# Construct from dictionary
pl.DataFrame(
    data: dict[str, list] | list[dict] | ...,
    schema: dict[str, PolarsDataType] | None = None,
    # ... additional parameters
) -> DataFrame

# Post-load type transformation
DataFrame.with_columns(
    *exprs: IntoExpr | Iterable[IntoExpr],
) -> DataFrame

# Type casting
Expr.cast(
    dtype: PolarsDataType,
    strict: bool = True,
) -> Expr
```
### Import

```python
import polars as pl
```
## I/O Contract

### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| source | `str \| Path \| IO[bytes]` | Yes (read_csv) | File path, URL, or file-like object to read |
| schema_overrides | `dict[str, PolarsDataType] \| None` | No | Column name to target type mapping; applied during parsing |
| data | `dict[str, list] \| list[dict]` | Yes (DataFrame) | Dictionary mapping column names to value lists |
| *exprs | `IntoExpr \| Iterable[IntoExpr]` | Yes (with_columns) | One or more expressions producing new or replacement columns |
| dtype | `PolarsDataType` | Yes (cast) | Target data type (e.g., `pl.Categorical`, `pl.Date`) |
### Outputs

| Name | Type | Description |
|---|---|---|
| result | `DataFrame` | A DataFrame with correctly typed columns, ready for grouping and aggregation |
## Usage Examples

### Reading CSV with Schema Overrides
```python
import polars as pl

url = "https://theunitedstates.io/congress-legislators/legislators-historical.csv"

schema_overrides = {
    "first_name": pl.Categorical,
    "gender": pl.Categorical,
    "state": pl.Categorical,
    "party": pl.Categorical,
}

dataset = pl.read_csv(url, schema_overrides=schema_overrides).with_columns(
    pl.col("birthday").str.to_date(strict=False)
)
```
### Constructing from Dictionary

```python
import polars as pl

df = pl.DataFrame(
    {
        "label": ["foo", "bar", "spam"],
        "a": [1, 2, 3],
        "b": [10, 20, 30],
    }
)
```
### Post-Load Type Casting

```python
import polars as pl

df = pl.read_csv("data.csv").with_columns(
    pl.col("category").cast(pl.Categorical),
    pl.col("date_str").str.to_date("%Y-%m-%d"),
)
```