Implementation:Zai org CogVideo LapLoss
| Knowledge Sources | |
|---|---|
| Domains | Video_Generation, Loss_Functions, Image_Processing |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
LapLoss implements a Laplacian pyramid loss that measures multi-scale image reconstruction quality by comparing L1 differences at each level of a Laplacian decomposition.
Description
The LapLoss module constructs Laplacian pyramids of both the predicted and target images and sums the L1 loss at each pyramid level. This provides a perceptually motivated training objective that penalizes errors in several frequency bands at once, so that mistakes in coarse structure and in fine detail both contribute to the loss.
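The objective described above can be written compactly as follows. The notation is introduced here for illustration only: $K$ is the number of pyramid levels, $G$ is the Gaussian kernel, $L_k(\cdot)$ is the detail band at level $k$, and $N_k$ is the number of elements in that band. The $1/N_k$ factor assumes each level's L1 loss is averaged over its elements, which is an assumption about the reduction rather than something stated in this page.

```latex
\mathcal{L}_{\mathrm{lap}}(\hat{x}, x)
  = \sum_{k=1}^{K} \frac{1}{N_k} \left\lVert L_k(\hat{x}) - L_k(x) \right\rVert_1,
\qquad
L_k(x) = g_{k-1} - \operatorname{up}(g_k), \quad
g_k = \operatorname{down}(G * g_{k-1}), \quad
g_0 = x
```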
The Laplacian pyramid is built using the following helper functions:
- gauss_kernel: Creates a fixed 5x5 Gaussian kernel from binomial coefficients [[1,4,6,4,1], [4,16,24,16,4], ...], normalized to sum to 1.0 and repeated across the specified number of channels.
- conv_gauss: Applies the Gaussian kernel to an image using reflect padding and grouped convolution (one filter per channel).
- downsample: Subsamples the image by taking every other pixel in both spatial dimensions.
- upsample: Expands the image by interleaving zeros between pixels and convolving with 4x the Gaussian kernel to fill in values.
- laplacian_pyramid: Iteratively applies Gaussian smoothing, downsampling, upsampling, and subtraction to extract detail bands at each scale level.
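As a small sanity check on the kernel described in the list above, the following standard-library sketch builds the 5x5 weights as the outer product of the binomial row [1, 4, 6, 4, 1] and normalizes by 256. The function name `binomial_kernel_5x5` is hypothetical and is not part of laplacian.py. The sketch also suggests why upsample scales the kernel by 4: after zero interleaving only one pixel in four is non-zero, and the kernel taps that land on non-zero pixels sum to 1/4 for every sampling phase.

```python
from fractions import Fraction


def binomial_kernel_5x5():
    """Hypothetical helper: 5x5 kernel from the binomial row [1, 4, 6, 4, 1]."""
    row = [1, 4, 6, 4, 1]
    total = sum(row) ** 2  # 16 * 16 = 256
    return [[Fraction(a * b, total) for b in row] for a in row]


kernel = binomial_kernel_5x5()

# The weights sum to exactly 1, so smoothing preserves mean brightness.
kernel_sum = sum(sum(r) for r in kernel)

# Zero interleaving keeps one sample in four. For each of the four
# (row parity, column parity) phases, add up the kernel taps that
# would land on non-zero samples.
phase_sums = {
    (pr, pc): sum(kernel[i][j] for i in range(pr, 5, 2) for j in range(pc, 5, 2))
    for pr in (0, 1)
    for pc in (0, 1)
}
```

Exact fractions are used instead of floats so the sums can be compared without rounding error; the real helper is described as returning a torch.Tensor.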
LapLoss defaults to 5 pyramid levels and 3-channel RGB images. The standalone laplacian_pyramid helper defaults to max_levels=3 when it is called directly.
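The helper functions listed above can be sketched in a few lines of PyTorch. This is an illustrative re-implementation under stated assumptions, not the code in laplacian.py: all `sketch_*` names are hypothetical, the example runs on CPU, each level's L1 loss uses the mean reduction that `F.l1_loss` applies by default, and the input height and width are assumed to stay even at every level.

```python
import torch
import torch.nn.functional as F


def sketch_kernel(channels=3):
    # Outer product of the binomial row, normalized to sum to 1,
    # with one copy per channel for grouped convolution.
    row = torch.tensor([1.0, 4.0, 6.0, 4.0, 1.0])
    k = torch.outer(row, row) / 256.0
    return k.repeat(channels, 1, 1, 1)  # shape (C, 1, 5, 5)


def sketch_smooth(img, kernel):
    # Reflect-pad by 2 so the 5x5 filter keeps the spatial size.
    padded = F.pad(img, (2, 2, 2, 2), mode="reflect")
    return F.conv2d(padded, kernel, groups=img.shape[1])


def sketch_upsample(x, kernel):
    # Interleave zeros, then smooth with 4x the kernel to restore the level.
    b, c, h, w = x.shape
    up = x.new_zeros(b, c, 2 * h, 2 * w)
    up[:, :, ::2, ::2] = x
    return sketch_smooth(up, 4.0 * kernel)


def sketch_pyramid(img, kernel, max_levels=5):
    levels, current = [], img
    for _ in range(max_levels):
        down = sketch_smooth(current, kernel)[:, :, ::2, ::2]
        levels.append(current - sketch_upsample(down, kernel))  # detail band
        current = down
    return levels


def sketch_lap_loss(pred, target, max_levels=5):
    # Match the kernel's dtype and device to the prediction.
    kernel = sketch_kernel(pred.shape[1]).to(pred)
    pairs = zip(sketch_pyramid(pred, kernel, max_levels),
                sketch_pyramid(target, kernel, max_levels))
    return sum(F.l1_loss(a, b) for a, b in pairs)


torch.manual_seed(0)
pred = torch.rand(2, 3, 64, 64, requires_grad=True)
target = torch.rand(2, 3, 64, 64)
loss = sketch_lap_loss(pred, target)
loss.backward()  # gradients flow back to pred
```

Each band has half the spatial size of the previous one, so with five levels a 64x64 input yields bands of 64, 32, 16, 8, and 4 pixels per side.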
Usage
Use LapLoss as a training loss function for frame interpolation or image reconstruction tasks where multi-scale perceptual quality is important. It is the primary reconstruction loss used by the RIFE Model's update() method.
Code Reference
Source Location
- Repository: Zai_org_CogVideo
- File: inference/gradio_composite_demo/rife/laplacian.py
Signature
def gauss_kernel(size=5, channels=3) -> torch.Tensor
def downsample(x: torch.Tensor) -> torch.Tensor
def upsample(x: torch.Tensor) -> torch.Tensor
def conv_gauss(img: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor
def laplacian_pyramid(img: torch.Tensor, kernel: torch.Tensor, max_levels=3) -> list[torch.Tensor]
class LapLoss(torch.nn.Module):
    def __init__(self, max_levels=5, channels=3)
    def forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor
Import
from inference.gradio_composite_demo.rife.laplacian import LapLoss, laplacian_pyramid, gauss_kernel
I/O Contract
Inputs
LapLoss.forward:
| Name | Type | Required | Description |
|---|---|---|---|
| input | torch.Tensor | Yes | Predicted image tensor of shape (B, C, H, W) |
| target | torch.Tensor | Yes | Ground truth image tensor of shape (B, C, H, W) |
LapLoss.__init__:
| Name | Type | Required | Description |
|---|---|---|---|
| max_levels | int | No | Number of pyramid levels, default 5 |
| channels | int | No | Number of image channels, default 3 |
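The operations described for the pyramid imply some constraints on H and W that the tables above do not spell out. The sketch below is a hypothetical pre-flight check (the name `pyramid_compatible` is not part of laplacian.py) derived from two assumptions: reflect padding by 2 pixels needs each side to be larger than 2, and subtracting the upsampled image from the current level needs each side to be even at every level. It has not been verified against the repository.

```python
def pyramid_compatible(height, width, max_levels=5, pad=2):
    """Hypothetical check: can an H x W image pass through max_levels levels?"""
    for side in (height, width):
        for _ in range(max_levels):
            # Reflect padding needs the side to exceed the pad width, and
            # the 2x down/up round trip needs an even side to match shapes.
            if side <= pad or side % 2 != 0:
                return False
            side //= 2
    return True


ok_default = pyramid_compatible(256, 256)       # sides 256, 128, 64, 32, 16
too_small = pyramid_compatible(32, 32)          # fifth level would be 2 x 2
odd_level = pyramid_compatible(250, 256)        # 250 -> 125, which is odd
```

Under these assumptions, the default of 5 levels needs sides that are multiples of 32 and at least 64 pixels long.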
Outputs
| Name | Type | Description |
|---|---|---|
| loss | torch.Tensor | Scalar tensor representing the sum of L1 losses across all Laplacian pyramid levels |
Usage Examples
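import torch
from inference.gradio_composite_demo.rife.laplacian import LapLoss

# Initialize with the default 5 levels for RGB images
lap_loss = LapLoss(max_levels=5, channels=3)

# Keep inputs on the device the kernel is expected to live on
# (assumed here: CUDA when available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The prediction must track gradients (or come from a model) for backward() to work
predicted = torch.randn(4, 3, 256, 256, device=device, requires_grad=True)
target = torch.randn(4, 3, 256, 256, device=device)

loss = lap_loss(predicted, target)
loss.backward()  # Populates predicted.grad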