
Principle:Shiyu coder Kronos Qlib Test Inference

From Leeroopedia


Field Value
principle_name Qlib_Test_Inference
repository https://github.com/shiyu-coder/Kronos
domains Inference, Batch_Processing, Financial_Forecasting
implemented_by Implementation:Shiyu_coder_Kronos_Generate_Predictions_Qlib
last_updated 2026-02-09 14:00 GMT

Summary

Batch inference over an entire test dataset to generate trading signal predictions for backtesting evaluation.

Concept

The Qlib Test Inference principle describes the end-to-end process of converting fine-tuned models into actionable trading signals. This is the bridge between model training and backtesting evaluation, where the model produces quantitative predictions for every symbol on every day in the test period.

Theory

The inference pipeline follows a structured sequence:

Model Loading

Both the fine-tuned tokenizer and predictor are loaded and set to eval mode. The tokenizer converts continuous inputs to discrete tokens, and the predictor generates future token predictions autoregressively.

Sequential Sliding Window Dataset

Unlike training (which uses random sampling), the test dataset (QlibTestDataset) iterates sequentially through all valid sliding windows. For each window, it yields:

  • Context features (lookback window)
  • Context time stamps
  • Future time stamps (for the prediction horizon)
  • Symbol name and timestamp metadata (for mapping predictions back to the calendar)

The metadata is critical because predictions must be associated with specific (datetime, symbol) pairs for portfolio construction.
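The window iteration described above can be sketched as follows. This is a minimal illustration, not the repository's actual QlibTestDataset: the function name, field names, and window layout are assumptions.

```python
# Illustrative sequential sliding-window iterator. Names and the exact
# window layout are assumptions; the real QlibTestDataset may differ.
from typing import Dict, Iterator, List

def sliding_windows(
    series: Dict[str, List[float]],   # symbol -> aligned daily values
    calendar: List[str],              # trading dates, aligned with series
    lookback: int,
    horizon: int,
) -> Iterator[dict]:
    """Yield every valid (context, future) window, in calendar order."""
    for symbol, values in series.items():
        for end in range(lookback, len(values) - horizon + 1):
            yield {
                "context": values[end - lookback:end],       # lookback window
                "context_dates": calendar[end - lookback:end],
                "future_dates": calendar[end:end + horizon],  # horizon stamps
                "symbol": symbol,            # metadata for mapping predictions
                "pred_date": calendar[end],  # back to (datetime, symbol) pairs
            }

calendar = [f"2024-01-{d:02d}" for d in range(1, 11)]
series = {"AAA": list(range(10)), "BBB": [x * 2 for x in range(10)]}
windows = list(sliding_windows(series, calendar, lookback=5, horizon=2))
```

Unlike a random-sampling training loader, every valid window is visited exactly once and carries the metadata needed downstream.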

Autoregressive Inference

For each batch, the pipeline uses auto_regressive_inference() to generate predictions:

  • The tokenizer encodes the context window into discrete tokens
  • The predictor generates future tokens autoregressively, one step at a time
  • Multiple samples are drawn (controlled by inference_sample_count) and averaged to reduce variance
  • Temperature (inference_T) and nucleus sampling (inference_top_p) control generation diversity
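The sampling controls can be illustrated with a single decoding step over a toy token distribution. This sketch only demonstrates temperature scaling, nucleus (top-p) filtering, and sample averaging; the real decoder is the Kronos predictor generating full token sequences, and the toy logits and detokenized values below are made up.

```python
# Sketch of one temperature + top-p (nucleus) sampling step, plus
# averaging over several draws, mirroring inference_T, inference_top_p,
# and inference_sample_count. The distribution here is a toy example.
import numpy as np

def sample_token(logits: np.ndarray, T: float, top_p: float,
                 rng: np.random.Generator) -> int:
    """Draw one token id with temperature scaling and nucleus filtering."""
    scaled = logits / T
    probs = np.exp(scaled - np.max(scaled))       # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]               # most likely first
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]  # smallest nucleus
    kept = probs[keep] / probs[keep].sum()        # renormalize nucleus
    return int(rng.choice(keep, p=kept))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1, -3.0])
token_values = np.array([10.0, 11.0, 12.0, 13.0])  # toy detokenized values
# Draw several trajectories and average, reducing sampling variance.
draws = [token_values[sample_token(logits, T=1.0, top_p=0.9, rng=rng)]
         for _ in range(8)]
avg_prediction = float(np.mean(draws))
```

Lower temperature concentrates probability on the top tokens, while a smaller top-p shrinks the nucleus; with a very small top-p the draw collapses to greedy decoding.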

Signal Extraction

From the raw multi-feature predictions, trading signals are extracted as the close price delta: an aggregate of the predicted close path minus the last observed close. Four variants differ in how the predicted close is aggregated over the horizon:

  • mean: Average predicted close across the prediction horizon
  • last: Predicted close on the final prediction day
  • max: Maximum predicted close across the horizon
  • min: Minimum predicted close across the horizon

Each variant captures a different aspect of the price trajectory and may perform differently in backtesting.
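The four variants can be sketched for a single (symbol, date) window as below; the function and variable names are illustrative, not the repository's.

```python
# Illustrative close-delta signal variants: aggregate the predicted
# close path over the horizon, then subtract the last observed close.
import numpy as np

def close_signals(pred_close: np.ndarray, last_close: float) -> dict:
    """Return the four delta-style signal variants for one window."""
    return {
        "mean": float(pred_close.mean() - last_close),
        "last": float(pred_close[-1] - last_close),
        "max": float(pred_close.max() - last_close),
        "min": float(pred_close.min() - last_close),
    }

# Three-day horizon: predicted closes vs. last observed close of 100.0.
sig = close_signals(np.array([101.0, 103.0, 102.0]), last_close=100.0)
```

Here "max" rewards any upside along the path while "min" is the most conservative; "mean" and "last" sit in between.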

Pivot to DataFrames

The per-sample (timestamp, symbol, score) records are pivoted into DataFrames with:

  • Index: datetime
  • Columns: symbol names
  • Values: prediction scores

This format is directly consumable by the backtesting framework.
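A minimal pandas sketch of the pivot step (the record field names are assumptions):

```python
# Pivot per-sample (datetime, symbol, score) records into the
# datetime-by-symbol score matrix consumed by the backtester.
import pandas as pd

records = [
    {"datetime": "2024-01-08", "symbol": "AAA", "score": 0.7},
    {"datetime": "2024-01-08", "symbol": "BBB", "score": -0.2},
    {"datetime": "2024-01-09", "symbol": "AAA", "score": 0.1},
]
signals = (pd.DataFrame(records)
             .pivot(index="datetime", columns="symbol", values="score"))
```

Missing (datetime, symbol) combinations surface as NaN cells, which the backtesting framework can treat as "no signal" for that name on that day.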

Key Design Decisions

  • Custom collate function: Required because each batch contains a mix of tensors (features), strings (symbols), and Timestamp objects that cannot be handled by PyTorch's default collation
  • Batch size adjustment: The DataLoader batch size is divided by sample_count because auto_regressive_inference internally expands each sample by the sample count
  • Instance normalization: Applied identically to training to ensure consistency
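The mixed-type collate problem can be sketched as follows; this uses NumPy arrays as a stand-in for torch tensors (and `np.stack` for `torch.stack`), and the function name is illustrative.

```python
# Sketch of a mixed-type collate function: numeric arrays are stacked
# along a new batch dimension, while strings and timestamp metadata are
# kept as plain Python lists (PyTorch's default_collate would reject or
# mangle such mixed batches).
import numpy as np

def mixed_collate(batch: list) -> dict:
    out = {}
    for key in batch[0]:
        vals = [sample[key] for sample in batch]
        if isinstance(vals[0], np.ndarray):
            out[key] = np.stack(vals)   # features: (batch, ...) array
        else:
            out[key] = vals             # symbols / timestamps: list as-is
    return out

batch = [
    {"x": np.ones(3), "symbol": "AAA", "ts": "2024-01-08"},
    {"x": np.zeros(3), "symbol": "BBB", "ts": "2024-01-08"},
]
collated = mixed_collate(batch)
```

In the real pipeline this function would be passed as the DataLoader's `collate_fn`, so each batch arrives with stacked feature tensors plus aligned metadata lists.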

Domains

  • Inference: Batch model prediction generation
  • Batch_Processing: Efficient DataLoader-based processing of large test sets
  • Financial_Forecasting: Time series price prediction for trading signals
