Implementation:Turboderp org Exllamav2 ExLlamaV2LayerNorm
| Knowledge Sources | |
|---|---|
| Domains | Normalization, Model_Architecture |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Standard layer normalization module that applies full-dimension LayerNorm to hidden states, with a CUDA-accelerated kernel and a pure-PyTorch fallback path.
Description
ExLlamaV2LayerNorm is a subclass of ExLlamaV2Module that implements standard LayerNorm across the full hidden dimension. It is used by transformer architectures that specify LayerNorm (as opposed to RMSNorm) for their normalization layers.
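For reference, the textbook definitions of the two normalizations are sketched below. This is the standard formulation, not something taken from the ExLlamaV2 source. The symbols are assumptions introduced here: $x \in \mathbb{R}^{d}$ is a single hidden-state vector with $d$ equal to hidden_size, $\gamma$ and $\beta$ stand for the weight and optional bias, and $\epsilon$ stands for variance_epsilon.

```latex
\mu = \frac{1}{d}\sum_{i=1}^{d} x_i, \qquad
\sigma^2 = \frac{1}{d}\sum_{i=1}^{d} (x_i - \mu)^2

\mathrm{LayerNorm}(x)_i = \gamma_i \, \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta_i

\mathrm{RMSNorm}(x)_i = \gamma_i \, \frac{x_i}{\sqrt{\frac{1}{d}\sum_{j=1}^{d} x_j^2 + \epsilon}}
```

LayerNorm subtracts the mean and may add a bias, whereas RMSNorm only rescales by the root mean square.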
Key components:
- __init__(model, key, archparams) -- Initialises the module with a default variance_epsilon of 1e-6. The weight and bias are set to None until load() is called.
- load() -- Loads the normalization weight (and optional bias) from safetensors and creates an nn.LayerNorm instance with elementwise_affine=True. Replaces the default variance_epsilon with the model-specific value from model.config.norm_eps.
- unload() -- Releases the layernorm, weight, and bias tensors.
- get_weight() -- Returns the weight tensor, or a tuple (weight, bias) if bias is present.
- weight_footprint() -- Returns hidden_size * 2 as the estimated memory footprint in bytes.
- forward(hidden_states, ...) -- The primary forward path. Reshapes the input to 2D, calls the ext_c.layer_norm() CUDA kernel with the weight, optional bias, and variance epsilon, then restores the original shape.
- forward_torch(hidden_states, ...) -- Pure PyTorch fallback that delegates to self.layernorm() (the standard nn.LayerNorm module).
Both forward paths accept an intermediates flag; when True, they return a dictionary with a hidden_states key.
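The forward steps described above (flatten to 2D, normalize each row, restore the shape, optionally wrap the result in a dictionary) can be sketched in plain PyTorch. This is a minimal illustration, not the library's kernel: `layer_norm_reference` is a hypothetical name, and the check against `torch.nn.functional.layer_norm` only confirms that the sketch matches standard LayerNorm.

```python
import torch
import torch.nn.functional as F


def layer_norm_reference(hidden_states, weight, bias=None, eps=1e-6, intermediates=False):
    """Illustrative LayerNorm over the last dimension (hypothetical helper)."""
    output_shape = hidden_states.shape
    # Flatten all leading dimensions so each row is one hidden-state vector.
    x = hidden_states.reshape(-1, output_shape[-1])
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)  # population variance
    norm = (x - mean) * torch.rsqrt(var + eps) * weight
    if bias is not None:
        norm = norm + bias
    # Restore the caller's original shape.
    norm = norm.reshape(output_shape)
    return {"hidden_states": norm} if intermediates else norm


torch.manual_seed(0)
x = torch.randn(2, 5, 16)               # (batch, seq_len, hidden_size)
w, b = torch.randn(16), torch.randn(16)
out = layer_norm_reference(x, w, b, eps=1e-6)
expected = F.layer_norm(x, (16,), weight=w, bias=b, eps=1e-6)
print(torch.allclose(out, expected, atol=1e-5))
```

Passing `bias=None` exercises the no-bias case, and `intermediates=True` returns the same tensor under the "hidden_states" key.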
Usage
Use ExLlamaV2LayerNorm as a normalization layer within transformer blocks that require standard LayerNorm. It is instantiated automatically by the model architecture loader. Users do not typically create this module directly.
Code Reference
Source Location
- Repository: Turboderp_org_Exllamav2
- File: exllamav2/layernorm.py
- Lines: L1-147
Signature
class ExLlamaV2LayerNorm(ExLlamaV2Module):
    name: str = "LayerNorm"
    layernorm: nn.LayerNorm | None
    weight: nn.Parameter | None
    bias: nn.Parameter | None
    variance_epsilon: float

    def __init__(
        self,
        model: ExLlamaV2,
        key: str,
        archparams=None
    ):
        ...

    def load(self) -> None:
        ...

    def unload(self) -> None:
        ...

    def get_weight(self) -> torch.Tensor | tuple[torch.Tensor, torch.Tensor]:
        ...

    def weight_footprint(self) -> int:
        ...

    def forward(
        self,
        hidden_states: torch.Tensor,
        cache=None,
        attn_params=None,
        past_len=None,
        intermediates: bool = False,
        loras=None,
        output_fp32=False,
        **kwargs
    ) -> torch.Tensor | dict[str, torch.Tensor]:
        ...

    def forward_torch(
        self,
        hidden_states: torch.Tensor,
        cache=None,
        attn_params=None,
        past_len=None,
        intermediates: bool = False,
        loras=None,
        output_fp32=False,
        **kwargs
    ) -> torch.Tensor | dict[str, torch.Tensor]:
        ...
Import
from exllamav2.layernorm import ExLlamaV2LayerNorm
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | ExLlamaV2 | Yes | The parent model instance |
| key | str | Yes | The tensor key path for loading weights from safetensors |
| archparams | object | No | Architecture parameters; defaults to model.config.arch.lm |
| hidden_states | torch.Tensor | Yes (forward) | Input tensor to normalize, shape (batch, seq_len, hidden_size) |
| intermediates | bool | No (default False) | If True, return a dict with "hidden_states" key instead of bare tensor |
| output_fp32 | bool | No (default False) | Reserved for future use; not yet implemented |
Outputs
| Name | Type | Description |
|---|---|---|
| hidden_states | torch.Tensor | Normalized tensor of the same shape as input |
| intermediates dict | dict[str, torch.Tensor] | When intermediates=True: dictionary containing "hidden_states" key |
| weight footprint | int | From weight_footprint(): hidden_size * 2 bytes |
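As a quick worked example of the footprint estimate, the arithmetic can be sketched as follows. The helper name `layernorm_weight_footprint` and the `bytes_per_element` parameter are hypothetical; reading the factor of 2 as two bytes per 16-bit weight element is an assumption, and the formula as stated has no separate term for the bias.

```python
def layernorm_weight_footprint(hidden_size: int, bytes_per_element: int = 2) -> int:
    """Estimate in bytes: hidden_size * 2 (hypothetical helper).

    bytes_per_element=2 is an assumption matching 16-bit weights.
    """
    return hidden_size * bytes_per_element


# Example hidden sizes chosen for illustration only.
for hidden_size in (2048, 4096, 8192):
    print(hidden_size, layernorm_weight_footprint(hidden_size))
```

For a hidden size of 4096 this gives 8192 bytes, so a single normalization layer is negligible next to the model's linear layers.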
Usage Examples
Module in a Transformer Layer (Internal Usage)
from exllamav2.layernorm import ExLlamaV2LayerNorm

# Typically created internally during model construction.
# `model` is an already-constructed ExLlamaV2 instance.
layer_norm = ExLlamaV2LayerNorm(
    model=model,
    key="model.layers.0.input_layernorm",
)
layer_norm.load()

# Forward pass with the CUDA kernel.
# `hidden_states` has shape (batch, seq_len, hidden_size).
normalized = layer_norm.forward(hidden_states)
PyTorch Fallback for Debugging
# Use the pure-PyTorch path (standard nn.LayerNorm)
normalized = layer_norm.forward_torch(hidden_states, intermediates=True)
print(normalized["hidden_states"].shape)
Inspecting Weights
from exllamav2.layernorm import ExLlamaV2LayerNorm
layer_norm = ExLlamaV2LayerNorm(model, "model.norm")
layer_norm.load()
weight = layer_norm.get_weight()
if isinstance(weight, tuple):
    w, b = weight
    print(f"Weight shape: {w.shape}, Bias shape: {b.shape}")
else:
    print(f"Weight shape: {weight.shape}, No bias")