Implementation:Zai org CogVideo LapLoss

Knowledge Sources	Zai_org_CogVideo
Domains	Video_Generation, Loss_Functions, Image_Processing
Last Updated	2026-02-10 00:00 GMT

Overview

LapLoss implements a Laplacian pyramid loss that measures multi-scale image reconstruction quality by summing the L1 differences between corresponding levels of the Laplacian decompositions of the predicted and target images.

Description

The LapLoss module constructs Laplacian pyramids of both the predicted and target images and sums the L1 loss at each pyramid level. This provides a perceptually motivated training objective that penalizes errors in multiple frequency bands, encouraging accurate reconstruction of both coarse structure and fine detail.
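
The objective described above can be written compactly as follows. This is a sketch under stated assumptions, not a transcription of the source: the symbols g (the Gaussian kernel), K (the number of levels, max_levels), and N_k (the number of elements in band k) are introduced here for illustration, and the 1/N_k normalization assumes the per-level L1 loss is averaged over elements rather than summed.

```latex
\begin{aligned}
G_0 &= I, \qquad G_{k+1} = \operatorname{down}\!\left(g * G_k\right), \\
L_k(I) &= G_k - \operatorname{up}\!\left(G_{k+1}\right), \qquad k = 0, \dots, K-1, \\
\mathcal{L}_{\text{lap}}(\hat{I}, I) &= \sum_{k=0}^{K-1} \frac{1}{N_k}
    \left\lVert L_k(\hat{I}) - L_k(I) \right\rVert_1 .
\end{aligned}
```

Under that averaging assumption, every band contributes with equal weight regardless of its resolution.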

The Laplacian pyramid is built using the following helper functions:

gauss_kernel: Creates a fixed 5x5 Gaussian kernel from binomial coefficients [[1,4,6,4,1], [4,16,24,16,4], ...] normalized to sum to 1.0, repeated across the specified number of channels.
conv_gauss: Applies the Gaussian kernel to an image using reflect padding and grouped convolution (one filter per channel).
downsample: Subsamples the image by taking every other pixel in both spatial dimensions.
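upsample: Expands the image to twice its size by interleaving zeros between pixels, then convolves with 4x the Gaussian kernel to fill in values. The factor of 4 compensates for the fact that only one in four pixels of the expanded image is non-zero, so overall brightness is preserved.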
laplacian_pyramid: Iteratively applies Gaussian smoothing, downsampling, upsampling, and subtraction to extract detail bands at each scale level.
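
The helper steps listed above can be sketched without PyTorch for a single-channel image stored as a list of rows. This is an illustrative analogue, not the repository code: it drops the channel dimension and the grouped convolution, the reflect helper and the gain argument are hypothetical names introduced here, and the kernel is assumed to be the outer product of the binomial row [1, 4, 6, 4, 1] with itself.

```python
ROW = [1, 4, 6, 4, 1]  # binomial coefficients (assumed generator of the 5x5 kernel)

def gauss_kernel():
    """5x5 kernel as the outer product of ROW with itself, normalized to sum to 1."""
    total = sum(ROW) ** 2  # 16 * 16 = 256
    return [[a * b / total for b in ROW] for a in ROW]

def reflect(i, n):
    """Mirror an out-of-range index back into [0, n) without repeating the edge."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i

def conv_gauss(img, kernel, gain=1.0):
    """Same-size 5x5 convolution with 2 pixels of reflect padding on each side."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-2, 3):
                for dx in range(-2, 3):
                    pixel = img[reflect(y + dy, h)][reflect(x + dx, w)]
                    acc += kernel[dy + 2][dx + 2] * pixel
            out[y][x] = gain * acc
    return out

def downsample(img):
    """Keep every other pixel in both spatial dimensions."""
    return [row[::2] for row in img[::2]]

def upsample(img, kernel):
    """Interleave zeros, then smooth with 4x the kernel to fill in the gaps."""
    h, w = len(img), len(img[0])
    expanded = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            expanded[2 * y][2 * x] = img[y][x]
    return conv_gauss(expanded, kernel, gain=4.0)

def laplacian_pyramid(img, kernel, max_levels=3):
    """Collect one detail band (current minus its blurred reconstruction) per level."""
    current, pyr = img, []
    for _ in range(max_levels):
        down = downsample(conv_gauss(current, kernel))
        up = upsample(down, kernel)
        pyr.append([[c - u for c, u in zip(cr, ur)] for cr, ur in zip(current, up)])
        current = down
    return pyr

# A constant image has no detail, so every band should be zero.
flat = [[5.0] * 8 for _ in range(8)]
bands = laplacian_pyramid(flat, gauss_kernel(), max_levels=2)
print(max(abs(v) for band in bands for row in band for v in row))
```

The constant-image check also illustrates why the upsampling step scales the kernel by 4: without that gain, the reconstructed image would come out at a quarter of the original brightness and the detail bands would not vanish.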

The default LapLoss configuration uses 5 pyramid levels and operates on 3-channel RGB images. Note that laplacian_pyramid itself defaults to max_levels=3 when called directly.
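
The number of levels constrains the input resolution. The sketch below lists the size of each detail band and flags inputs that would not halve cleanly. It assumes that downsample keeps indices 0, 2, 4, ... and that upsample exactly doubles each dimension, as described above, so an odd dimension at any level would leave the subtraction with mismatched shapes. It also assumes a 2-pixel reflect pad, which needs every dimension to be at least 3. The helper name band_sizes is hypothetical.

```python
def band_sizes(height, width, max_levels=5):
    """Return the (H, W) of each detail band, checking that every level halves cleanly."""
    sizes = []
    for level in range(max_levels):
        if height % 2 or width % 2:
            raise ValueError(f"level {level}: {height}x{width} does not halve evenly")
        if min(height, width) < 3:
            raise ValueError(f"level {level}: {height}x{width} is too small for a 2-pixel reflect pad")
        sizes.append((height, width))
        height, width = height // 2, width // 2
    return sizes

print(band_sizes(256, 256))
# [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
```

In practice this means choosing H and W divisible by 2**max_levels, or cropping or padding frames before computing the loss.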

Usage

Use LapLoss as a training loss function for frame interpolation or image reconstruction tasks where multi-scale perceptual quality is important. It is the primary reconstruction loss used by the RIFE Model's update() method.

Code Reference

Source Location

Repository: Zai_org_CogVideo
File: inference/gradio_composite_demo/rife/laplacian.py

Signature

def gauss_kernel(size=5, channels=3) -> torch.Tensor

def downsample(x: torch.Tensor) -> torch.Tensor

def upsample(x: torch.Tensor) -> torch.Tensor

def conv_gauss(img: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor

def laplacian_pyramid(img: torch.Tensor, kernel: torch.Tensor, max_levels=3) -> list[torch.Tensor]

class LapLoss(torch.nn.Module):
    def __init__(self, max_levels=5, channels=3)
    def forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor

Import

from inference.gradio_composite_demo.rife.laplacian import LapLoss, laplacian_pyramid, gauss_kernel

I/O Contract

Inputs

LapLoss.forward:

Name	Type	Required	Description
input	torch.Tensor	Yes	Predicted image tensor of shape (B, C, H, W)
target	torch.Tensor	Yes	Ground truth image tensor of shape (B, C, H, W)

LapLoss.__init__:

Name	Type	Required	Description
max_levels	int	No	Number of pyramid levels, default 5
channels	int	No	Number of image channels, default 3

Outputs

Name	Type	Description
loss	torch.Tensor	Scalar tensor representing the sum of L1 losses across all Laplacian pyramid levels

Usage Examples

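import torch
from inference.gradio_composite_demo.rife.laplacian import LapLoss

# Fall back to CPU when CUDA is unavailable; inputs must share the kernel's device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize with the default 5 levels for RGB images
lap_loss = LapLoss(max_levels=5, channels=3)

# Stand-ins for predicted and target frames; the prediction must track gradients
predicted = torch.randn(4, 3, 256, 256, device=device, requires_grad=True)
target = torch.randn(4, 3, 256, 256, device=device)

loss = lap_loss(predicted, target)
loss.backward()  # Populates predicted.grad; the loss is differentiable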

Related Pages

Principle:Zai_org_CogVideo_Laplacian_Pyramid_Loss
