

Implementation:Turboderp org Exllamav2 ExLlamaV2LayerNorm

Domains: Normalization, Model_Architecture
Last Updated: 2026-02-15 00:00 GMT

Overview

Standard layer normalization module that applies full-dimension LayerNorm to hidden states, with a CUDA-accelerated kernel and a pure-PyTorch fallback path.

Description

ExLlamaV2LayerNorm is a subclass of ExLlamaV2Module that implements standard LayerNorm across the full hidden dimension. It is used by transformer architectures that specify LayerNorm (as opposed to RMSNorm) for their normalization layers.
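
For reference, the two normalizations differ in whether the input is mean-centered and whether a bias is added. The plain-Python sketch below applies the textbook formulas to a single hidden-state row. It is an illustration only, not ExLlamaV2 code, and the function names layer_norm_row and rms_norm_row are hypothetical. The eps default of 1e-6 mirrors the module's initial variance_epsilon.

```python
import math


def layer_norm_row(x, weight, bias=None, eps=1e-6):
    """Textbook LayerNorm over one row: center, scale to unit variance, apply affine."""
    n = len(x)
    mean = sum(x) / n
    # Population (biased) variance over the hidden dimension
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [(v - mean) * inv_std * w for v, w in zip(x, weight)]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def rms_norm_row(x, weight, eps=1e-6):
    """RMSNorm for comparison: no mean subtraction and no bias term."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


x = [1.0, 2.0, 3.0, 4.0]
ones = [1.0] * len(x)
ln = layer_norm_row(x, ones)
rms = rms_norm_row(x, ones)
print([round(v, 3) for v in ln])
print([round(v, 3) for v in rms])
```

With unit weights and no bias, the LayerNorm output has zero mean and unit variance, while the RMSNorm output keeps the sign pattern and offset of the input and only has its root mean square rescaled to 1.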

Key components:

  • __init__(model, key, archparams) -- Initialises the module with a default variance_epsilon of 1e-6. The weight and bias are set to None until load() is called.
  • load() -- Loads the normalization weight (and optional bias) from safetensors. Creates an nn.LayerNorm instance with elementwise_affine=True. Updates variance_epsilon from model.config.norm_eps to use the model-specific value.
  • unload() -- Releases the layernorm, weight, and bias tensors.
  • get_weight() -- Returns the weight tensor, or a tuple (weight, bias) if bias is present.
  • weight_footprint() -- Returns hidden_size * 2 bytes as the memory footprint estimate.
  • forward(hidden_states, ...) -- The primary forward path. Flattens the input to a 2D view of shape (-1, hidden_size), calls the ext_c.layer_norm() CUDA kernel with the weight, optional bias, and variance_epsilon, then reshapes the output back to the original shape.
  • forward_torch(hidden_states, ...) -- Pure PyTorch fallback that delegates to self.layernorm() (the standard nn.LayerNorm module).
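
The reshape, normalize, reshape flow of forward() can be mimicked on nested Python lists, as a minimal sketch assuming a (batch, seq_len, hidden_size) input. The real method hands the flattened rows to the ext_c.layer_norm() CUDA kernel in a single call; here a hypothetical per-row helper stands in for that kernel, and forward_sketch is likewise an illustrative name.

```python
import math


def layer_norm_row(x, weight, bias, eps):
    """Hypothetical stand-in for the kernel: LayerNorm over one hidden-state row."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    if bias is None:
        bias = [0.0] * n
    return [(v - mean) * inv_std * w + b for v, w, b in zip(x, weight, bias)]


def forward_sketch(hidden_states, weight, bias=None, eps=1e-6):
    """Mimics forward(): flatten to 2D, normalize each row, restore the shape."""
    batch, seq_len = len(hidden_states), len(hidden_states[0])
    # (batch, seq_len, hidden) -> (batch * seq_len, hidden)
    rows = [row for seq in hidden_states for row in seq]
    normed = [layer_norm_row(r, weight, bias, eps) for r in rows]
    # (batch * seq_len, hidden) -> (batch, seq_len, hidden)
    return [normed[b * seq_len:(b + 1) * seq_len] for b in range(batch)]


hs = [[[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]],
      [[5.0, 5.0, 5.0], [-1.0, 0.0, 1.0]]]
out = forward_sketch(hs, weight=[1.0, 1.0, 1.0], bias=[0.5, 0.5, 0.5])
```

Because LayerNorm normalizes each hidden-state vector independently, flattening the leading dimensions does not change the result; it only lets the kernel treat the input as one 2D batch of rows.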

Both forward paths accept an intermediates flag; when True, they return a dictionary with a hidden_states key.
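
Callers that may receive either return type can normalize it with a small helper. This is a hypothetical convenience function, not part of ExLlamaV2; it relies only on the documented contract that intermediates=True yields a dict with a "hidden_states" key.

```python
from typing import Any, Dict, Union


def unwrap_hidden_states(result: Union[Any, Dict[str, Any]]) -> Any:
    """Return the normalized hidden states whether forward() returned
    a bare tensor (intermediates=False) or a dict (intermediates=True)."""
    if isinstance(result, dict):
        return result["hidden_states"]
    return result
```

For example, unwrap_hidden_states(layer_norm.forward_torch(hidden_states, intermediates=True)) yields the normalized tensor directly.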

Usage

Use ExLlamaV2LayerNorm as a normalization layer within transformer blocks that require standard LayerNorm. It is instantiated automatically by the model architecture loader. Users do not typically create this module directly.

Code Reference

Signature

class ExLlamaV2LayerNorm(ExLlamaV2Module):

    name: str = "LayerNorm"
    layernorm: nn.LayerNorm | None
    weight: nn.Parameter | None
    bias: nn.Parameter | None
    variance_epsilon: float

    def __init__(
        self,
        model: ExLlamaV2,
        key: str,
        archparams=None
    ):
        ...

    def load(self) -> None:
        ...

    def unload(self) -> None:
        ...

    def get_weight(self) -> torch.Tensor | tuple[torch.Tensor, torch.Tensor]:
        ...

    def weight_footprint(self) -> int:
        ...

    def forward(
        self,
        hidden_states: torch.Tensor,
        cache=None,
        attn_params=None,
        past_len=None,
        intermediates: bool = False,
        loras=None,
        output_fp32=False,
        **kwargs
    ) -> torch.Tensor | dict[str, torch.Tensor]:
        ...

    def forward_torch(
        self,
        hidden_states: torch.Tensor,
        cache=None,
        attn_params=None,
        past_len=None,
        intermediates: bool = False,
        loras=None,
        output_fp32=False,
        **kwargs
    ) -> torch.Tensor | dict[str, torch.Tensor]:
        ...

Import

from exllamav2.layernorm import ExLlamaV2LayerNorm

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
model | ExLlamaV2 | Yes | The parent model instance
key | str | Yes | The tensor key path for loading weights from safetensors
archparams | object | No | Architecture parameters; defaults to model.config.arch.lm
hidden_states | torch.Tensor | Yes (forward) | Input tensor to normalize, shape (batch, seq_len, hidden_size)
intermediates | bool | No (default False) | If True, return a dict with a "hidden_states" key instead of a bare tensor
output_fp32 | bool | No (default False) | Reserved for future use; not yet implemented

Outputs

Name | Type | Description
--- | --- | ---
hidden_states | torch.Tensor | Normalized tensor of the same shape as the input
intermediates dict | dict[str, torch.Tensor] | When intermediates=True: dictionary containing a "hidden_states" key
weight footprint | int | From weight_footprint(): hidden_size * 2 bytes
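
The footprint estimate is simple arithmetic. The sketch below reproduces it and, for comparison, computes the storage of a weight vector plus an optional bias under the assumption that both are stored in FP16 (2 bytes per element). Both function names are hypothetical.

```python
FP16_BYTES = 2  # bytes per half-precision element (assumed storage type)


def layernorm_weight_footprint(hidden_size: int) -> int:
    # Mirrors the documented estimate: hidden_size * 2 bytes
    return hidden_size * 2


def fp16_storage(hidden_size: int, has_bias: bool) -> int:
    # Hypothetical comparison: bytes for an FP16 weight vector plus an optional FP16 bias
    vectors = 2 if has_bias else 1
    return vectors * hidden_size * FP16_BYTES


for hidden_size in (2048, 4096, 8192):
    print(hidden_size,
          layernorm_weight_footprint(hidden_size),
          fp16_storage(hidden_size, has_bias=True))
```

Under that FP16 assumption, hidden_size * 2 bytes corresponds to the weight vector alone, so a layer that also carries a bias would use roughly twice the estimated amount.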

Usage Examples

Module in a Transformer Layer (Internal Usage)

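from exllamav2.layernorm import ExLlamaV2LayerNorm

# Typically created internally during model construction.
# Assumes `model` is an existing ExLlamaV2 instance and `hidden_states`
# is a tensor of shape (batch, seq_len, hidden_size) on the model's CUDA device.
layer_norm = ExLlamaV2LayerNorm(
    model=model,
    key="model.layers.0.input_layernorm",
)
layer_norm.load()  # loads weight (and bias, if present) and sets variance_epsilon

# Forward pass with the CUDA kernel (ext_c.layer_norm)
normalized = layer_norm.forward(hidden_states)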

PyTorch Fallback for Debugging

# Use the pure-PyTorch path (standard nn.LayerNorm)
normalized = layer_norm.forward_torch(hidden_states, intermediates=True)
print(normalized["hidden_states"].shape)

Inspecting Weights

from exllamav2.layernorm import ExLlamaV2LayerNorm

# Assumes `model` is an already constructed ExLlamaV2 instance
layer_norm = ExLlamaV2LayerNorm(model, "model.norm")
layer_norm.load()

weight = layer_norm.get_weight()
if isinstance(weight, tuple):
    w, b = weight
    print(f"Weight shape: {w.shape}, Bias shape: {b.shape}")
else:
    print(f"Weight shape: {weight.shape}, No bias")
