Principle:Fastai Fastbook Feature Engineering

From Leeroopedia
Revision as of 17:11, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Fastai_Fastbook_Feature_Engineering.md)


Knowledge Sources
Domains: Feature Engineering, Tabular Data, Time Series
Last Updated: 2026-02-09 17:00 GMT

Overview

Feature engineering is the process of transforming raw data columns into new, more informative representations that enable machine learning models to learn patterns more effectively.

Description

Raw datasets often contain columns whose values are not directly amenable to the splitting or gradient-based operations used by machine learning algorithms. Feature engineering bridges this gap by creating derived columns that expose latent structure. A particularly important case is date feature extraction: a single date column (e.g., "2011-03-15") encodes many distinct pieces of information, including the year, the month, the day of the week, whether it falls on a holiday, and whether it is a month-end. A decision tree cannot efficiently discover these patterns from a raw date because each split compares a single ordinal value against one threshold, so a recurring pattern such as a weekly cycle would need a separate pair of splits for every occurrence. By decomposing the date into its constituent temporal features, we give the model direct access to each dimension of temporal variation.

The general principle extends beyond dates to any column where domain knowledge suggests that derived features would be more informative than the raw value. Examples include:

  • Polynomial features: Creating interaction terms or powers of numeric columns.
  • Binning: Converting continuous values into categorical bins.
  • Text extraction: Pulling structured fields (e.g., domain name from a URL).
  • Temporal decomposition: Splitting a date into year, month, week, day, day-of-week, day-of-year, and boolean flags for month-start, month-end, quarter-start, quarter-end, year-start, year-end, plus an elapsed-time numeric value.
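The first three kinds of derived feature listed above can be sketched with the Python standard library alone. This is a minimal illustration, not code from the source: the function names, the bin edges, and the example URL are all hypothetical.

```python
from bisect import bisect_right
from urllib.parse import urlparse


def polynomial_features(x: float, y: float) -> dict:
    """Squares of two numeric columns plus their interaction term."""
    return {"x_sq": x * x, "y_sq": y * y, "x_times_y": x * y}


def bin_value(value: float, edges: list, labels: list) -> str:
    """Map a continuous value to a categorical bin.

    `edges` are ascending cut points; a value equal to an edge falls into
    the bin that starts at that edge. Requires len(labels) == len(edges) + 1.
    """
    return labels[bisect_right(edges, value)]


def url_domain(url: str) -> str:
    """Extract the network location (domain) from a URL string."""
    return urlparse(url).netloc


# Hypothetical example values, chosen only for illustration.
print(polynomial_features(2.0, 3.0))
print(bin_value(42, [18, 65], ["minor", "adult", "senior"]))
print(url_domain("https://example.com/path?q=1"))
```

Each helper turns one raw value into something a model can split on directly: a product of two columns, a category, or a structured field.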

Usage

Apply feature engineering whenever:

  • The dataset contains date or timestamp columns and the model cannot natively reason about temporal patterns (this applies to decision trees, random forests, and most neural networks).
  • Domain knowledge suggests that a raw column encodes multiple independent signals.
  • Initial modeling reveals poor performance that could be addressed by providing the model with richer input features.
  • You want to enable a tree-based model to capture cyclical or calendar effects.

Theoretical Basis

Date decomposition is motivated by the structure of decision trees. A decision tree partitions data by choosing a feature and a threshold that best separates the target variable. Given a raw date represented as an integer (e.g., days since epoch), each split can only separate "before" from "after" a single date. On its own this is insufficient to capture:

  • Cyclical patterns: Sales may be higher on weekends regardless of the year. A "day of week" feature directly encodes this.
  • Seasonal patterns: Demand may peak in certain months. A "month" feature captures this.
  • Trend: Prices may increase year over year. A "year" feature isolates this.
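The limitation described above can be made concrete with a toy experiment. The sketch below is an assumption-laden illustration rather than a real decision tree implementation: it searches for the best single threshold split (by sum of squared errors) on a raw date ordinal and on a day-of-week feature, using made-up sales data that is higher on weekends. The helper names and the data are hypothetical.

```python
from datetime import date, timedelta


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_split_sse(xs, ys):
    """Lowest total squared error reachable with one split of the form x <= t."""
    best = sse(ys)  # baseline: no split at all
    for t in sorted(set(xs))[:-1]:  # skip the largest value: right side would be empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        best = min(best, sse(left) + sse(right))
    return best


# Hypothetical toy data: four weeks of daily sales, higher on weekends.
days = [date(2011, 3, 7) + timedelta(days=i) for i in range(28)]
sales = [100.0 if d.weekday() >= 5 else 50.0 for d in days]

raw_ordinal = [d.toordinal() for d in days]  # raw date as one integer
day_of_week = [d.weekday() for d in days]    # 0=Monday ... 6=Sunday

sse_raw = best_single_split_sse(raw_ordinal, sales)
sse_dow = best_single_split_sse(day_of_week, sales)
print(f"best single split on raw date:    SSE = {sse_raw:.1f}")
print(f"best single split on day of week: SSE = {sse_dow:.1f}")
```

With the day-of-week feature, one split at threshold 4 separates weekdays from weekends exactly, so the error drops to zero. No single cut on the raw date can do this, because every "before" or "after" side that spans a full week mixes weekdays with weekend days.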

By decomposing a single date column into N temporal features, we transform a one-dimensional input into an N-dimensional input, allowing the tree to split on each dimension independently. The standard decomposition produces 13 features:

Feature          | Type  | Description
Year             | int   | Calendar year (e.g., 2011)
Month            | int   | Month of the year (1-12)
Week             | int   | ISO week number (1-53)
Day              | int   | Day of the month (1-31)
Dayofweek        | int   | Day of the week (0=Monday, 6=Sunday)
Dayofyear        | int   | Day of the year (1-366)
Is_month_end     | bool  | Whether the date is the last day of the month
Is_month_start   | bool  | Whether the date is the first day of the month
Is_quarter_end   | bool  | Whether the date is the last day of a quarter
Is_quarter_start | bool  | Whether the date is the first day of a quarter
Is_year_end      | bool  | Whether the date is the last day of the year
Is_year_start    | bool  | Whether the date is the first day of the year
Elapsed          | float | Seconds since Unix epoch (continuous numeric)

The Elapsed feature is particularly important because it preserves the monotonic ordering of time, enabling the model to capture long-term trends. The boolean flags and cyclical integer features enable the model to capture calendar-based periodic effects.
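The 13 features in the table can be computed for a single date with the Python standard library. The sketch below is an illustration only and is not the fastbook implementation: the function name `date_features` is hypothetical, and it assumes that Elapsed is measured from midnight UTC on the given date.

```python
import calendar
from datetime import date, datetime, timezone


def date_features(d: date) -> dict:
    """Decompose one calendar date into the 13 features listed in the table."""
    last_day = calendar.monthrange(d.year, d.month)[1]  # number of days in this month
    is_month_start = d.day == 1
    is_month_end = d.day == last_day
    # Assumption: elapsed time is counted from midnight UTC on the date.
    midnight_utc = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return {
        "Year": d.year,
        "Month": d.month,
        "Week": d.isocalendar()[1],           # ISO week number
        "Day": d.day,
        "Dayofweek": d.weekday(),             # 0=Monday, 6=Sunday
        "Dayofyear": d.timetuple().tm_yday,
        "Is_month_end": is_month_end,
        "Is_month_start": is_month_start,
        "Is_quarter_end": is_month_end and d.month in (3, 6, 9, 12),
        "Is_quarter_start": is_month_start and d.month in (1, 4, 7, 10),
        "Is_year_end": d.month == 12 and d.day == 31,
        "Is_year_start": d.month == 1 and d.day == 1,
        "Elapsed": midnight_utc.timestamp(),  # float seconds since the Unix epoch
    }


for name, value in date_features(date(2011, 3, 15)).items():
    print(f"{name}: {value}")
```

In practice the same decomposition would be applied to every row of a date column, typically with a dataframe library rather than row by row. The per-date version here is only meant to show what each feature contains.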
