Principle:Shiyu_coder_Kronos_Batch_Forecasting
| Field | Value |
|---|---|
| principle_name | Batch_Forecasting |
| repo | Shiyu_coder_Kronos |
| domains | Time_Series, Financial_Forecasting, Batch_Processing |
| last_updated | 2026-02-09 14:00 GMT |
| implemented_by | Implementation:Shiyu_coder_Kronos_KronosPredictor_Predict_Batch |
Summary
GPU-parallel generation of candlestick forecasts for many time series at once, requiring that all series share the same input and prediction lengths.
Concept
Batch forecasting extends the single-series prediction pipeline to process multiple financial time series in parallel within a single GPU kernel execution. Instead of iterating over series one at a time, all series are stacked into a single batch tensor and processed simultaneously through the tokenization, generation, and decoding stages.
This approach is critical for production use cases where forecasts are needed for many instruments (e.g., hundreds of stocks) within tight latency constraints. GPU parallelism provides significant speedup compared to sequential per-series prediction.
Theory
The batch forecasting pipeline follows the same five stages as single-series forecasting, but operates on stacked tensors:
Input: N series, each with their own DataFrame + timestamps
|
v
Per-series normalization (each series uses its own mean/std)
|
v
Stack into batch tensor: (N, seq_len, features)
|
v
Batch tokenization: tokenizer.encode() on full batch
|
v
Batch autoregressive generation: auto_regressive_inference()
| (all N series generate tokens in parallel)
v
Batch decoding: tokenizer.decode() on full batch
|
v
Per-series denormalization (restore each series' original scale)
|
v
Output: N DataFrames, one per series
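The stages above can be sketched in NumPy. This is a minimal illustration, not the Kronos implementation: `batch_forecast_sketch` and `naive_generate` are hypothetical names, and `generate_fn` stands in for the tokenize, autoregressive-generate, and decode stages that run on the batch tensor.

```python
import numpy as np

def batch_forecast_sketch(series_list, pred_len, generate_fn):
    """Sketch of the batch pipeline: per-series normalization, stacking
    into a rectangular batch, batched generation, and per-series
    denormalization. `generate_fn` is a placeholder for the
    tokenize -> generate -> decode stages."""
    # Per-series normalization: each series keeps its own mean/std.
    means = np.array([s.mean() for s in series_list])
    stds = np.array([s.std() + 1e-8 for s in series_list])
    normed = [(s - m) / sd for s, m, sd in zip(series_list, means, stds)]

    # Stack into a rectangular batch tensor: (N, seq_len).
    batch = np.stack(normed)

    # Batched generation: all N series are processed in parallel.
    preds = generate_fn(batch, pred_len)  # shape (N, pred_len)

    # Per-series denormalization: restore each series' original scale.
    return [p * sd + m for p, m, sd in zip(preds, means, stds)]

# Toy stand-in generator: repeats each series' last normalized value.
def naive_generate(batch, pred_len):
    return np.repeat(batch[:, -1:], pred_len, axis=1)

series = [np.array([10.0, 10.5, 11.0]),            # a ~$10 stock
          np.array([10000.0, 10100.0, 10200.0])]   # a ~$10,000 stock
forecasts = batch_forecast_sketch(series, pred_len=2,
                                  generate_fn=naive_generate)
```

Note that the stacking step is only valid because both series have the same length, which is exactly the uniform-dimensions constraint described below.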
Key Constraint: Uniform Dimensions
For batch processing to work, all series must have:
- Same historical length (number of input timesteps).
- Same prediction length (pred_len).
This is a fundamental requirement because the batch tensor must be rectangular: every series must occupy the same number of timesteps so the batch has shape (N, seq_len, features). Series of different lengths cannot be stacked into a single tensor without padding, and padding is not supported by this implementation.
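A pre-flight check makes the constraint concrete. `validate_batch` below is a hypothetical helper, not part of the Kronos API; it rejects any batch that cannot be stacked without padding.

```python
import numpy as np

def validate_batch(histories, pred_lens):
    """Hypothetical pre-flight check: all series must share one
    historical length and one pred_len, otherwise they cannot be
    stacked into a rectangular batch tensor."""
    lengths = {len(h) for h in histories}
    if len(lengths) != 1:
        raise ValueError(
            f"All series must share one historical length, got {sorted(lengths)}")
    if len(set(pred_lens)) != 1:
        raise ValueError("All series must share one pred_len")
    # Safe by construction: every row has the same number of timesteps.
    return np.stack(histories)
```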
Per-Series Normalization
Even though the series are batched together for GPU execution, normalization remains per-series:
- Each series is independently normalized using its own mean and standard deviation.
- After generation, each series is independently denormalized back to its own price scale.
This ensures that a $10 stock and a $10,000 stock in the same batch receive appropriate treatment despite vastly different absolute values.
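The scale-invariance point can be seen directly: per-series z-scoring maps series with wildly different absolute prices onto the same numeric range. The arrays below are illustrative toy data.

```python
import numpy as np

def zscore(x):
    # Normalize a series using its own mean and standard deviation.
    return (x - x.mean()) / x.std()

cheap = np.array([9.8, 10.0, 10.2])            # a ~$10 stock
pricey = np.array([9800.0, 10000.0, 10200.0])  # a ~$10,000 stock

z_cheap, z_pricey = zscore(cheap), zscore(pricey)
# Both series have the same relative shape, so their z-scores coincide
# even though their absolute price levels differ by three orders of
# magnitude.
```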
Memory Considerations
Batch GPU memory usage scales as:
memory ~ N_series * sample_count * max_context * d_model
Where N_series is the batch size, sample_count is the number of parallel samples per series, and max_context * d_model is the per-sequence memory footprint.
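A back-of-envelope estimator following this scaling relation might look as follows. The function name and the `bytes_per_elem=2` default (fp16 activations) are assumptions for illustration; constant factors such as layer count and KV caches are ignored, so the result is a lower bound rather than a Kronos-specific formula.

```python
def estimate_activation_bytes(n_series, sample_count, max_context,
                              d_model, bytes_per_elem=2):
    """Rough activation-memory estimate following
    memory ~ N_series * sample_count * max_context * d_model.
    bytes_per_elem=2 assumes fp16; real usage is larger by a
    model-dependent constant factor."""
    return n_series * sample_count * max_context * d_model * bytes_per_elem

# e.g. 100 series, 1 sample each, 512-token context, d_model=256:
budget = estimate_activation_bytes(100, 1, 512, 256)
```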
Source
- Repository: Kronos on GitHub
Domains
- Time_Series: Parallel sequential data forecasting.
- Financial_Forecasting: Multi-instrument candlestick prediction.
- Batch_Processing: GPU-efficient parallel execution of the inference pipeline.
Related Principles
- Principle:Shiyu_coder_Kronos_Single_Series_Forecasting - The single-series version of this pipeline.
- Principle:Shiyu_coder_Kronos_Autoregressive_Token_Generation - The core generation loop used inside batch forecasting.
- Principle:Shiyu_coder_Kronos_Predictor_Initialization - Setting up the predictor that runs this pipeline.