

Implementation:Polars DataFrame Construction

From Leeroopedia


Knowledge Sources
Domains: Data Engineering, DataFrame
Last Updated: 2026-02-09 10:00 GMT

Overview

Concrete APIs for constructing DataFrames from CSV files, URLs, or Python dictionaries, with schema overrides and post-load type coercion for aggregation-ready data.

Description

Polars provides two primary pathways for DataFrame construction: reading from external sources (pl.read_csv(), pl.read_parquet(), etc.) and building from in-memory Python data structures (pl.DataFrame(data)). Both pathways support schema control: at construction time (via schema_overrides for the readers, or the schema parameter of pl.DataFrame), or after construction via with_columns() and Expr.cast().

The schema_overrides parameter accepts a dictionary mapping column names to Polars data types. This is applied during parsing, avoiding the cost of creating a default-typed column and then converting it. Post-load transformations via with_columns() allow chaining arbitrary expressions, such as pl.col("birthday").str.to_date(strict=False) to parse date strings.

Usage

Use these APIs whenever you need to:

  • Read a CSV file with specific columns cast to pl.Categorical for efficient grouping.
  • Construct a test DataFrame from a Python dictionary.
  • Parse string date columns into proper Date types after loading.

Code Reference

Source Location

  • Repository: Polars
  • File: docs/source/src/python/user-guide/expressions/aggregation.py (lines 1-17)

Signature

# Read CSV with schema overrides
pl.read_csv(
    source: str | Path | IO[bytes],
    schema_overrides: dict[str, PolarsDataType] | None = None,
    # ... additional parameters
) -> DataFrame

# Construct from dictionary
pl.DataFrame(
    data: dict[str, list] | list[dict] | ...,
    schema: dict[str, PolarsDataType] | None = None,
    # ... additional parameters
) -> DataFrame

# Post-load type transformation
DataFrame.with_columns(
    *exprs: IntoExpr | Iterable[IntoExpr],
) -> DataFrame

# Type casting
Expr.cast(
    dtype: PolarsDataType,
    strict: bool = True,
) -> Expr

Import

import polars as pl

I/O Contract

Inputs

  • source (str | Path | IO[bytes]; required for read_csv): File path, URL, or file-like object to read
  • schema_overrides (dict[str, PolarsDataType]; optional): Column name to target type mapping; applied during parsing
  • data (dict[str, list]; required for DataFrame): Dictionary mapping column names to value lists
  • *exprs (IntoExpr; required for with_columns): One or more expressions producing new or replacement columns
  • dtype (PolarsDataType; required for cast): Target data type (e.g., pl.Categorical, pl.Date)

Outputs

  • result (DataFrame): A DataFrame with correctly typed columns, ready for grouping and aggregation

Usage Examples

Reading CSV with Schema Overrides

import polars as pl

url = "https://theunitedstates.io/congress-legislators/legislators-historical.csv"

schema_overrides = {
    "first_name": pl.Categorical,
    "gender": pl.Categorical,
    "state": pl.Categorical,
    "party": pl.Categorical,
}

dataset = pl.read_csv(url, schema_overrides=schema_overrides).with_columns(
    pl.col("birthday").str.to_date(strict=False)
)

Constructing from Dictionary

import polars as pl

df = pl.DataFrame(
    {
        "label": ["foo", "bar", "spam"],
        "a": [1, 2, 3],
        "b": [10, 20, 30],
    }
)

Post-Load Type Casting

import polars as pl

df = pl.read_csv("data.csv").with_columns(
    pl.col("category").cast(pl.Categorical),
    pl.col("date_str").str.to_date("%Y-%m-%d"),
)

Related Pages

Implements Principle
