Implementation:Sktime Pytorch forecasting mLSTMLayer

Knowledge Sources	Sktime_Pytorch_forecasting
Domains	Time_Series, Forecasting, Deep_Learning
Last Updated	2026-02-08 08:00 GMT

Overview

mLSTMLayer stacks multiple mLSTM cells to form a deep recurrent layer with support for residual connections, layer normalization, and dropout.

Description

mLSTMLayer subclasses nn.Module and wraps num_layers mLSTMCell instances into a multi-layer recurrent block. It processes the input sequence one time step at a time, feeding each step through the stacked cells in order so that the output of one cell becomes the input of the next. Residual connections can be enabled between layers; they are skipped for the first layer and help gradient flow in deeper stacks. The layer tracks the hidden (h), cell (c), and normalizer (n) states of every stacked cell.
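The per-time-step stacking described above can be sketched as follows. This is a minimal illustration, not the library's code: ToyCell is a hypothetical stand-in for mLSTMCell (it only mimics the (h, c, n) return convention), stacked_step_loop is an illustrative name, and dropout is left out for brevity.

```python
import torch
from torch import nn


class ToyCell(nn.Module):
    """Hypothetical stand-in for mLSTMCell; only the (h, c, n) interface is mimicked."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.proj = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h, c, n):
        h_new = torch.tanh(self.proj(torch.cat([x, h], dim=-1)))
        # c and n pass through unchanged here; a real cell would update them.
        return h_new, c, n


def stacked_step_loop(cells, x, h, c, n, residual_conn=True):
    """x: (seq_len, batch, input_size); h, c, n: (num_layers, batch, hidden_size)."""
    outputs = []
    for t in range(x.size(0)):
        layer_input = x[t]
        new_h, new_c, new_n = [], [], []
        for i, cell in enumerate(cells):
            h_i, c_i, n_i = cell(layer_input, h[i], c[i], n[i])
            # Skip the residual on the first layer, whose input width
            # (input_size) may differ from hidden_size.
            if residual_conn and i > 0:
                h_i = h_i + layer_input
            layer_input = h_i
            new_h.append(h_i)
            new_c.append(c_i)
            new_n.append(n_i)
        h, c, n = torch.stack(new_h), torch.stack(new_c), torch.stack(new_n)
        outputs.append(h[-1])  # the top layer's hidden state is this step's output
    return torch.stack(outputs), (h, c, n)


input_size, hidden_size, num_layers = 4, 8, 3
cells = nn.ModuleList(
    [ToyCell(input_size if i == 0 else hidden_size, hidden_size) for i in range(num_layers)]
)
x = torch.randn(5, 2, input_size)
zeros = torch.zeros(num_layers, 2, hidden_size)
output, (h, c, n) = stacked_step_loop(cells, x, zeros, zeros.clone(), zeros.clone())
```

Only the first cell sees input_size features; every later cell consumes hidden_size features, which is what makes the element-wise residual sum well defined from the second layer onward.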

Usage

Use mLSTMLayer when you need a multi-layer mLSTM recurrent block as part of a forecasting or sequence modeling network. It is the intermediate building block between individual mLSTMCell instances and the complete mLSTMNetwork.
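As a rough illustration of where such a block sits in a larger model, the sketch below wraps a recurrent layer in a small forecasting module. RecurrentForecaster and horizon are hypothetical names, and nn.LSTM is used as a stand-in because it shares the time-major input and (output, state) return convention documented on this page; an mLSTMLayer instance could presumably be passed in its place.

```python
import torch
from torch import nn


class RecurrentForecaster(nn.Module):
    """Hypothetical wrapper: a recurrent block plus a linear head on the last time step."""

    def __init__(self, recurrent, hidden_size, horizon):
        super().__init__()
        self.recurrent = recurrent
        self.head = nn.Linear(hidden_size, horizon)

    def forward(self, x):
        # x is time-major: (seq_len, batch_size, input_size)
        output, _state = self.recurrent(x)
        # Use the representation at the final time step to predict `horizon` values.
        return self.head(output[-1])


# Stand-in recurrent block; swap in mLSTMLayer(input_size=32, hidden_size=64, num_layers=3)
# when pytorch_forecasting is installed.
recurrent = nn.LSTM(input_size=32, hidden_size=64, num_layers=3)
model = RecurrentForecaster(recurrent, hidden_size=64, horizon=7)
forecast = model(torch.randn(10, 16, 32))  # (batch_size, horizon)
```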

Code Reference

Source Location

Repository: Sktime_Pytorch_forecasting
File: pytorch_forecasting/layers/_recurrent/_mlstm/layer.py
Lines: 1-151

Signature

class mLSTMLayer(nn.Module):
    def __init__(
        self,
        input_size,
        hidden_size,
        num_layers,
        dropout=0.2,
        layer_norm=True,
        residual_conn=True,
    ): ...
    def forward(self, x, h=None, c=None, n=None): ...
    def init_hidden(self, batch_size, device=None): ...

Import

from pytorch_forecasting.layers._recurrent._mlstm.layer import mLSTMLayer

I/O Contract

Inputs

init

Name	Type	Required	Description
input_size	int	Yes	The number of features in the input.
hidden_size	int	Yes	The number of features in the hidden state.
num_layers	int	Yes	The number of mLSTM layers to stack.
dropout	float	No	Dropout probability applied to inputs and intermediate layers. Defaults to 0.2.
layer_norm	bool	No	Whether to use layer normalization in each mLSTM cell. Defaults to True.
residual_conn	bool	No	Whether to enable residual connections between layers. Defaults to True.

forward

Name	Type	Required	Description
x	torch.Tensor	Yes	Input tensor of shape (seq_len, batch_size, input_size). Internally transposed to (batch_size, seq_len, input_size).
h	torch.Tensor or None	No	Initial hidden states for all layers. If None, initialized to zeros.
c	torch.Tensor or None	No	Initial cell states for all layers. If None, initialized to zeros.
n	torch.Tensor or None	No	Initial normalizer states for all layers. If None, initialized to zeros.
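Because forward expects a time-major tensor, batch-first data has to be transposed before the call. The helper below is a small sketch of that conversion; to_time_major is an illustrative name, not part of the library.

```python
import torch


def to_time_major(x_batch_first: torch.Tensor) -> torch.Tensor:
    """Convert (batch_size, seq_len, features) to (seq_len, batch_size, features)."""
    if x_batch_first.dim() != 3:
        raise ValueError("expected a 3-D tensor of shape (batch_size, seq_len, features)")
    # transpose returns a view; contiguous() copies it into a standard memory layout.
    return x_batch_first.transpose(0, 1).contiguous()


x_bf = torch.randn(16, 10, 32)  # batch-first batch, e.g. from a DataLoader
x_tm = to_time_major(x_bf)      # ready to pass as `x` to forward
```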

Outputs

forward

Name	Type	Description
output	torch.Tensor	Final output tensor from the last layer, of shape (seq_len, batch_size, hidden_size).
(h, c, n)	tuple of torch.Tensor	Final hidden, cell, and normalizer states for all layers. Each of shape (num_layers, batch_size, hidden_size).
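Since forward both accepts and returns (h, c, n), the final states of one call can seed the next one, for example when a long series is processed in chunks. A common pattern, sketched here under the assumption that gradients should not flow across chunk boundaries, is to detach the states first; detach_states is a hypothetical helper name.

```python
import torch


def detach_states(states):
    """Detach each tensor in (h, c, n) so backpropagation stops at the chunk boundary."""
    return tuple(s.detach() for s in states)


# Dummy states standing in for the tuple returned by a previous forward call.
h = torch.ones(3, 16, 64, requires_grad=True) * 2
c = torch.ones(3, 16, 64, requires_grad=True) * 2
n = torch.ones(3, 16, 64, requires_grad=True) * 2

h_d, c_d, n_d = detach_states((h, c, n))
```

The detached tensors can then be passed to the next call as layer(next_x, h_d, c_d, n_d).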

init_hidden

Name	Type	Description
(h, c, n)	tuple of torch.Tensor	Stacked zero-initialized hidden, cell, and normalizer states for all layers.
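For orientation, the returned states can be pictured roughly as below. This is an illustrative equivalent rather than the library's implementation; it assumes each state has shape (num_layers, batch_size, hidden_size), matching the forward outputs table, and zero_states is a hypothetical name.

```python
import torch


def zero_states(num_layers, batch_size, hidden_size, device=None):
    """Build three separate zero tensors of the assumed stacked-state shape."""
    shape = (num_layers, batch_size, hidden_size)
    h = torch.zeros(shape, device=device)
    c = torch.zeros(shape, device=device)
    n = torch.zeros(shape, device=device)
    return h, c, n


h, c, n = zero_states(num_layers=3, batch_size=16, hidden_size=64)
```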

Usage Examples

import torch
from pytorch_forecasting.layers._recurrent._mlstm.layer import mLSTMLayer

layer = mLSTMLayer(
    input_size=32,
    hidden_size=64,
    num_layers=3,
    dropout=0.1,
    layer_norm=True,
    residual_conn=True,
)

seq_len, batch_size = 10, 16
x = torch.randn(seq_len, batch_size, 32)
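output, (h, c, n) = layer(x)  # h, c, n default to zeros when omitted
# output shape: (seq_len, batch_size, hidden_size) = (10, 16, 64)
# h, c, n shapes: (num_layers, batch_size, hidden_size) = (3, 16, 64)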
