Principle: Sktime / PyTorch Forecasting Hyperparameter Optimization
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Hyperparameter_Tuning, AutoML |
| Last Updated | 2026-02-08 07:00 GMT |
Overview
Technique for automatically searching the hyperparameter space of a forecasting model using Bayesian optimization with early stopping to find optimal configurations.
Description
Hyperparameter Optimization automates the search for the best model configuration by systematically exploring combinations of hyperparameters (hidden size, dropout, learning rate, gradient clipping, etc.) using the Optuna framework. Each trial trains a model with a sampled configuration, optionally uses LR finding to set the learning rate, and evaluates on validation data. Optuna's Tree-structured Parzen Estimator (TPE) sampler learns from completed trials to focus on promising regions of the search space. The MedianPruner early-stops unpromising trials by comparing their intermediate validation loss to the median of completed trials. This dramatically reduces the total compute budget needed to find good hyperparameters.
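The sampler and pruner described above map directly onto Optuna's API. A minimal configuration sketch (the seed, startup-trial count, and warm-up step count are illustrative assumptions, not values from this principle):

```python
import optuna

study = optuna.create_study(
    direction="minimize",  # minimize validation loss
    sampler=optuna.samplers.TPESampler(seed=42),  # TPE learns from completed trials
    # Wait for 5 completed trials before pruning anything, and give every
    # trial 3 reported steps before it becomes eligible for pruning.
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=3),
)
```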
Usage
Use this principle when manual tuning of TFT hyperparameters is impractical. The optimize_hyperparameters function takes pre-built DataLoaders and search space ranges, and returns an Optuna Study containing all trial results. The best parameters can be extracted with study.best_trial.params and used to create the final production model. This is specifically designed for the Temporal Fusion Transformer.
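A minimal sketch of the call using pytorch-forecasting's tuning module, assuming train_dataloader and val_dataloader have already been built from a TimeSeriesDataSet; the ranges and trial counts below are illustrative, not recommended values:

```python
from pytorch_forecasting.models.temporal_fusion_transformer.tuning import (
    optimize_hyperparameters,
)

# Assumes train_dataloader / val_dataloader exist (built from a TimeSeriesDataSet).
study = optimize_hyperparameters(
    train_dataloader,
    val_dataloader,
    model_path="optuna_tft",              # where trial checkpoints are written
    n_trials=100,
    max_epochs=20,
    gradient_clip_val_range=(0.01, 1.0),
    hidden_size_range=(16, 128),
    attention_head_size_range=(1, 4),
    dropout_range=(0.1, 0.3),
    learning_rate_range=(1e-4, 0.1),
    use_learning_rate_finder=True,        # LR finder sets the learning rate per trial
)

best_params = study.best_trial.params    # feed these into the final TFT
```

The returned Study is an ordinary Optuna object, so it can be pickled and reloaded to resume tuning later.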
Theoretical Basis
Tree-structured Parzen Estimator (TPE):
TPE models the search space as two distributions, split at a loss threshold:

$$p(x \mid y) = \begin{cases} \ell(x) & \text{if } y < y^{*} \\ g(x) & \text{if } y \geq y^{*} \end{cases}$$

where $y^{*}$ is a quantile threshold on observed losses, $\ell(x)$ models the "good" hyperparameter region (trials with losses below the threshold), and $g(x)$ models the rest. The expected improvement is proportional to $\ell(x)/g(x)$, so TPE proposes candidates that maximize this ratio.
Median Pruning: A trial is pruned at step $t$ if

$$\text{val\_loss}_t > \text{median}\left(\{\text{val\_loss}_{t,i}\}_{i \in \text{completed}}\right),$$

that is, if its intermediate validation loss exceeds the median of the losses that completed trials reported at the same step.
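Within a single trial, this check is driven by reported intermediate values. A sketch, where build_model, train_one_epoch, and validate are hypothetical helpers:

```python
import optuna

def objective(trial):
    model = build_model(trial)  # hypothetical: samples params via trial.suggest_*
    val_loss = float("inf")
    for epoch in range(20):
        train_one_epoch(model)          # hypothetical training step
        val_loss = validate(model)      # hypothetical validation pass
        trial.report(val_loss, epoch)   # record intermediate value at this step
        if trial.should_prune():        # True when val_loss is above the median
            raise optuna.TrialPruned()
    return val_loss
```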
Pseudo-code:
```python
# Abstract hyperparameter optimization; create_model, find_lr, train,
# and evaluate stand in for the framework-specific steps.
import optuna

def objective(trial):
    # Sample a configuration from the search space
    params = {
        "hidden_size": trial.suggest_int("hidden_size", 16, 256),
        "dropout": trial.suggest_float("dropout", 0.1, 0.3),
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1.0, log=True),
    }
    model = create_model(dataset, **params)
    if use_lr_finder:
        model.learning_rate = find_lr(model, data)  # override the sampled LR
    train(model, data, max_epochs=20)
    return evaluate(model)  # validation loss to minimize

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=n_trials)
best_params = study.best_trial.params
```