Implementation:Turboderp org Exllamav2 ExLlamaV2LayerNorm

Knowledge Sources	Turboderp_org_Exllamav2
Domains	Normalization, Model_Architecture
Last Updated	2026-02-15 00:00 GMT

Overview

Standard layer normalization module that applies full-dimension LayerNorm to hidden states, with a CUDA-accelerated kernel and a pure-PyTorch fallback path.

Description

ExLlamaV2LayerNorm is a subclass of ExLlamaV2Module that implements standard LayerNorm across the full hidden dimension. It is used by transformer architectures that specify LayerNorm (as opposed to RMSNorm) for their normalization layers.

Key components:

__init__(model, key, archparams) -- Initialises the module with a default variance_epsilon of 1e-6. The weight and bias are set to None until load() is called.
load() -- Loads the normalization weight (and optional bias) from safetensors, creates an nn.LayerNorm instance with elementwise_affine=True, and replaces the default variance_epsilon with the model-specific value from model.config.norm_eps.
unload() -- Releases the layernorm, weight, and bias tensors.
get_weight() -- Returns the weight tensor, or a tuple (weight, bias) if bias is present.
weight_footprint() -- Returns hidden_size * 2 as the estimated memory footprint in bytes.
forward(hidden_states, ...) -- The primary forward path. Reshapes the input to 2D, calls the ext_c.layer_norm() CUDA kernel with the weight, optional bias, and variance epsilon, then reshapes the result back to the original shape.
forward_torch(hidden_states, ...) -- Pure PyTorch fallback that delegates to self.layernorm() (the standard nn.LayerNorm module).

Both forward paths accept an intermediates flag. When it is True, they return a dictionary with a hidden_states key; otherwise they return the normalized tensor directly.
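The reshape, normalize, reshape-back sequence described for forward() can be illustrated with a minimal sketch in plain PyTorch. This is not the library's code: the function name layer_norm_forward_sketch is hypothetical, and torch.nn.functional.layer_norm stands in for the ext_c.layer_norm() CUDA kernel, whose exact signature is not shown on this page.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def layer_norm_forward_sketch(
    hidden_states: torch.Tensor,
    weight: torch.Tensor,
    bias: Optional[torch.Tensor] = None,
    eps: float = 1e-6,
    intermediates: bool = False,
):
    """Hypothetical stand-in for the forward path; not the library's kernel."""
    output_shape = hidden_states.shape
    # Flatten all leading dimensions so each row is one hidden vector.
    flat = hidden_states.reshape(-1, output_shape[-1])
    # Assumption: F.layer_norm replaces the ext_c.layer_norm() CUDA call.
    norm = F.layer_norm(flat, (output_shape[-1],), weight, bias, eps)
    # Restore the caller's original shape.
    out = norm.reshape(output_shape)
    return {"hidden_states": out} if intermediates else out


x = torch.randn(2, 5, 8)   # (batch, seq_len, hidden_size), illustrative sizes
w = torch.ones(8)
b = torch.zeros(8)
y = layer_norm_forward_sketch(x, w, b)
print(y.shape)  # torch.Size([2, 5, 8])
```

Because normalization acts only on the last dimension, flattening the leading dimensions does not change the result; it only gives the kernel a uniform 2D layout.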

Usage

Use ExLlamaV2LayerNorm as a normalization layer within transformer blocks that require standard LayerNorm. It is instantiated automatically by the model architecture loader, so users rarely need to create it directly.

Code Reference

Source Location

Repository: Turboderp_org_Exllamav2
File: exllamav2/layernorm.py
Lines: L1-147

Signature

class ExLlamaV2LayerNorm(ExLlamaV2Module):

    name: str = "LayerNorm"
    layernorm: nn.LayerNorm | None
    weight: nn.Parameter | None
    bias: nn.Parameter | None
    variance_epsilon: float

    def __init__(
        self,
        model: ExLlamaV2,
        key: str,
        archparams=None
    ):
        ...

    def load(self) -> None:
        ...

    def unload(self) -> None:
        ...

    def get_weight(self) -> torch.Tensor | tuple[torch.Tensor, torch.Tensor]:
        ...

    def weight_footprint(self) -> int:
        ...

    def forward(
        self,
        hidden_states: torch.Tensor,
        cache=None,
        attn_params=None,
        past_len=None,
        intermediates: bool = False,
        loras=None,
        output_fp32=False,
        **kwargs
    ) -> torch.Tensor | dict[str, torch.Tensor]:
        ...

    def forward_torch(
        self,
        hidden_states: torch.Tensor,
        cache=None,
        attn_params=None,
        past_len=None,
        intermediates: bool = False,
        loras=None,
        output_fp32=False,
        **kwargs
    ) -> torch.Tensor | dict[str, torch.Tensor]:
        ...

Import

from exllamav2.layernorm import ExLlamaV2LayerNorm

I/O Contract

Inputs

Name	Type	Required	Description
model	ExLlamaV2	Yes	The parent model instance
key	str	Yes	The tensor key path for loading weights from safetensors
archparams	object	No	Architecture parameters; defaults to model.config.arch.lm
hidden_states	torch.Tensor	Yes (forward)	Input tensor to normalize, shape (batch, seq_len, hidden_size)
intermediates	bool	No (default False)	If True, return a dict with "hidden_states" key instead of bare tensor
output_fp32	bool	No (default False)	Reserved for future use; not yet implemented

Outputs

Name	Type	Description
hidden_states	torch.Tensor	Normalized tensor of the same shape as input
intermediates dict	dict[str, torch.Tensor]	When intermediates=True: dictionary containing "hidden_states" key
weight footprint	int	From weight_footprint(): hidden_size * 2 bytes
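As a worked example of the footprint figure in the table above, the sketch below reproduces the hidden_size * 2 arithmetic. The helper name and the bytes_per_element parameter are hypothetical additions for illustration, and the hidden size of 4096 is an assumed value, not one taken from any particular model.

```python
def layernorm_weight_footprint(hidden_size: int, bytes_per_element: int = 2) -> int:
    """Mirror the documented estimate: hidden_size * 2 bytes."""
    return hidden_size * bytes_per_element


# Assumed hidden size, chosen only for illustration.
footprint = layernorm_weight_footprint(4096)
print(footprint)  # 8192
```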

Usage Examples

Module in a Transformer Layer (Internal Usage)

from exllamav2.layernorm import ExLlamaV2LayerNorm

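# Typically created internally during model construction.
# Assumes `model` is an already-constructed ExLlamaV2 instance.
layer_norm = ExLlamaV2LayerNorm(
    model=model,
    key="model.layers.0.input_layernorm",
)
layer_norm.load()

# Forward pass with the CUDA kernel; `hidden_states` is a tensor of
# shape (batch, seq_len, hidden_size)
normalized = layer_norm.forward(hidden_states)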

PyTorch Fallback for Debugging

# Use the pure-PyTorch path (standard nn.LayerNorm)
normalized = layer_norm.forward_torch(hidden_states, intermediates=True)
print(normalized["hidden_states"].shape)
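When the fallback is used for debugging, a natural check is to compare its output against the primary path. The helper below is a hypothetical sketch of such a comparison; max_abs_diff is not part of exllamav2, and the two callables in the demo are stand-ins so the block runs without a loaded model.

```python
import torch
import torch.nn.functional as F


def max_abs_diff(fast_fn, reference_fn, hidden_states: torch.Tensor) -> float:
    """Return the largest element-wise difference between two forward paths."""
    with torch.inference_mode():
        fast = fast_fn(hidden_states)
        ref = reference_fn(hidden_states)
    # Compare in float32 so the subtraction adds no half-precision error.
    return (fast.float() - ref.float()).abs().max().item()


# Stand-in callables (assumptions): an nn.LayerNorm module and the
# equivalent functional call sharing its parameters.
ln = torch.nn.LayerNorm(16)
x = torch.randn(3, 4, 16)
diff = max_abs_diff(
    ln,
    lambda t: F.layer_norm(t, (16,), ln.weight, ln.bias, ln.eps),
    x,
)
print(diff)
```

With a loaded module, one would presumably pass layer_norm.forward and layer_norm.forward_torch as the two callables; what tolerance counts as acceptable depends on the tensor dtype and is left to the caller.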

Inspecting Weights

from exllamav2.layernorm import ExLlamaV2LayerNorm

layer_norm = ExLlamaV2LayerNorm(model, "model.norm")
layer_norm.load()

weight = layer_norm.get_weight()
if isinstance(weight, tuple):
    w, b = weight
    print(f"Weight shape: {w.shape}, Bias shape: {b.shape}")
else:
    print(f"Weight shape: {weight.shape}, No bias")
