Principle: Sktime / PyTorch Forecasting Hyperparameter Optimization
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Hyperparameter_Tuning, AutoML |
| Last Updated | 2026-02-08 07:00 GMT |
Overview
Technique for automatically searching the hyperparameter space of a forecasting model using Bayesian optimization with early stopping to find optimal configurations.
Description
Hyperparameter Optimization automates the search for the best model configuration by systematically exploring combinations of hyperparameters (hidden size, dropout, learning rate, gradient clipping, etc.) using the Optuna framework. Each trial trains a model with a sampled configuration, optionally uses LR finding to set the learning rate, and evaluates on validation data. Optuna's Tree-structured Parzen Estimator (TPE) sampler learns from completed trials to focus on promising regions of the search space. The MedianPruner early-stops unpromising trials by comparing their intermediate validation loss to the median of completed trials. This dramatically reduces the total compute budget needed to find good hyperparameters.
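The sampler and pruner described above map directly onto Optuna's API. A minimal configuration sketch (the seed, startup-trial count, and warm-up step count are illustrative assumptions, not values from this principle):

```python
import optuna

study = optuna.create_study(
    direction="minimize",  # minimize validation loss
    sampler=optuna.samplers.TPESampler(seed=42),  # TPE learns from completed trials
    # Wait for 5 completed trials before pruning anything, and give every
    # trial 3 reported steps before it becomes eligible for pruning.
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=3),
)
```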
Usage
Use this principle when manual tuning of TFT hyperparameters is impractical. The optimize_hyperparameters function takes pre-built DataLoaders and search space ranges, and returns an Optuna Study containing all trial results. The best parameters can be extracted with study.best_trial.params and used to create the final production model. This is specifically designed for the Temporal Fusion Transformer.
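A minimal sketch of the call using pytorch-forecasting's tuning module, assuming train_dataloader and val_dataloader have already been built from a TimeSeriesDataSet; the ranges and trial counts below are illustrative, not recommended values:

```python
from pytorch_forecasting.models.temporal_fusion_transformer.tuning import (
    optimize_hyperparameters,
)

# Assumes train_dataloader / val_dataloader exist (built from a TimeSeriesDataSet).
study = optimize_hyperparameters(
    train_dataloader,
    val_dataloader,
    model_path="optuna_tft",              # where trial checkpoints are written
    n_trials=100,
    max_epochs=20,
    gradient_clip_val_range=(0.01, 1.0),
    hidden_size_range=(16, 128),
    attention_head_size_range=(1, 4),
    dropout_range=(0.1, 0.3),
    learning_rate_range=(1e-4, 0.1),
    use_learning_rate_finder=True,        # LR finder sets the learning rate per trial
)

best_params = study.best_trial.params    # feed these into the final TFT
```

The returned Study is an ordinary Optuna object, so it can be pickled and reloaded to resume tuning later.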
Theoretical Basis
Tree-structured Parzen Estimator (TPE):
TPE models the search space as two distributions, split at a loss threshold:

$$p(x \mid y) = \begin{cases} \ell(x) & \text{if } y < y^{*} \\ g(x) & \text{if } y \geq y^{*} \end{cases}$$

where $y^{*}$ is a quantile threshold on observed losses, $\ell(x)$ models the "good" hyperparameter region (trials with losses below the threshold), and $g(x)$ models the rest. The expected improvement is proportional to $\ell(x)/g(x)$, so TPE proposes candidates that maximize this ratio.
Median Pruning: A trial is pruned at step $t$ if

$$\text{val\_loss}_t > \text{median}\left(\{\text{val\_loss}_{t,i}\}_{i \in \text{completed}}\right),$$

that is, if its intermediate validation loss exceeds the median of the losses that completed trials reported at the same step.
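Within a single trial, this check is driven by reported intermediate values. A sketch, where build_model, train_one_epoch, and validate are hypothetical helpers:

```python
import optuna

def objective(trial):
    model = build_model(trial)  # hypothetical: samples params via trial.suggest_*
    val_loss = float("inf")
    for epoch in range(20):
        train_one_epoch(model)          # hypothetical training step
        val_loss = validate(model)      # hypothetical validation pass
        trial.report(val_loss, epoch)   # record intermediate value at this step
        if trial.should_prune():        # True when val_loss is above the median
            raise optuna.TrialPruned()
    return val_loss
```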
Pseudo-code:
```python
# Abstract hyperparameter optimization; create_model, find_lr, train,
# and evaluate stand in for the framework-specific steps.
import optuna

def objective(trial):
    # Sample a configuration from the search space
    params = {
        "hidden_size": trial.suggest_int("hidden_size", 16, 256),
        "dropout": trial.suggest_float("dropout", 0.1, 0.3),
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1.0, log=True),
    }
    model = create_model(dataset, **params)
    if use_lr_finder:
        model.learning_rate = find_lr(model, data)  # override the sampled LR
    train(model, data, max_epochs=20)
    return evaluate(model)  # validation loss to minimize

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=n_trials)
best_params = study.best_trial.params
```