
Implementation:Zai org CogVideo LapLoss

From Leeroopedia
Revision as of 17:08, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Zai_org_CogVideo_LapLoss.md)


Knowledge Sources
Domains: Video_Generation, Loss_Functions, Image_Processing
Last Updated: 2026-02-10 00:00 GMT

Overview

LapLoss implements a Laplacian pyramid loss: it decomposes the predicted and target images into Laplacian pyramids and sums the L1 differences between corresponding levels, giving a multi-scale measure of reconstruction quality.

Description

The LapLoss module constructs Laplacian pyramids of both the predicted and target images and sums the L1 loss at each pyramid level. Because each level holds a different frequency band, the loss penalizes errors in coarse structure and in fine detail separately, which makes it a perceptually motivated training objective.
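In symbols, the objective described above can be written as follows. The notation ($G_k$, $L_k$, $g$, $K$) is introduced here for illustration and does not come from the source file; whether the per-level L1 term is summed or averaged over elements is not stated on this page.

```latex
% G_0 is the input image; g is the 5x5 Gaussian kernel; K is max_levels.
\begin{aligned}
G_{k+1} &= \operatorname{down}\bigl(g * G_k\bigr), \\
L_k(I)  &= G_k - \operatorname{up}\bigl(G_{k+1}\bigr), \qquad G_0 = I, \\
\mathcal{L}_{\mathrm{lap}}(\hat{I}, I) &= \sum_{k=0}^{K-1} \bigl\lVert L_k(\hat{I}) - L_k(I) \bigr\rVert_1 .
\end{aligned}
```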

The Laplacian pyramid is built using the following helper functions:

  • gauss_kernel: Creates a fixed 5x5 Gaussian kernel from binomial coefficients [[1,4,6,4,1], [4,16,24,16,4], ...] normalized to sum to 1.0, repeated across the specified number of channels.
  • conv_gauss: Applies the Gaussian kernel to an image using reflect padding and grouped convolution (one filter per channel).
  • downsample: Subsamples the image by taking every other pixel in both spatial dimensions.
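  • upsample: Doubles the resolution by interleaving zeros between pixels and then convolving with 4x the Gaussian kernel; the factor of 4 compensates for the inserted zeros so that overall brightness is preserved.
  • laplacian_pyramid: At each level, smooths and downsamples the current image, upsamples the result, and subtracts it from the current image to obtain that level's detail band. The downsampled image becomes the input to the next level.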
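The helpers listed above can be sketched in a few lines of PyTorch. This is a minimal CPU-friendly illustration written from the descriptions on this page, not a copy of laplacian.py: the signatures differ from the ones in the Code Reference (for example, upsample takes the kernel as an argument here), and lap_l1 is a hypothetical stand-in for LapLoss.forward that assumes a mean-reduced L1 term per level.

```python
import torch
import torch.nn.functional as F


def gauss_kernel(channels=3):
    # 5x5 binomial kernel: outer product of [1, 4, 6, 4, 1], normalized to sum to 1.
    row = torch.tensor([1.0, 4.0, 6.0, 4.0, 1.0])
    kernel = torch.outer(row, row)
    kernel = kernel / kernel.sum()
    # Shape (channels, 1, 5, 5): one identical filter per channel.
    return kernel.repeat(channels, 1, 1, 1)


def conv_gauss(img, kernel):
    # Reflect-pad by 2 so the 5x5 filter leaves the spatial size unchanged.
    img = F.pad(img, (2, 2, 2, 2), mode="reflect")
    # groups == channels applies each filter to its own channel only.
    return F.conv2d(img, kernel, groups=img.shape[1])


def downsample(x):
    # Keep every other pixel along height and width.
    return x[:, :, ::2, ::2]


def upsample(x, kernel):
    # Put each pixel at an even position, zeros elsewhere, then blur.
    b, c, h, w = x.shape
    up = x.new_zeros(b, c, 2 * h, 2 * w)
    up[:, :, ::2, ::2] = x
    # The factor 4 compensates for three of every four samples being zero.
    return conv_gauss(up, 4 * kernel)


def laplacian_pyramid(img, kernel, max_levels=3):
    current, pyramid = img, []
    for _ in range(max_levels):
        down = downsample(conv_gauss(current, kernel))
        pyramid.append(current - upsample(down, kernel))  # detail band
        current = down
    return pyramid


def lap_l1(pred, target, max_levels=5):
    # Hypothetical helper: sum of per-level L1 losses (mean reduction assumed).
    kernel = gauss_kernel(pred.shape[1]).to(pred)
    pyr_pred = laplacian_pyramid(pred, kernel, max_levels)
    pyr_target = laplacian_pyramid(target, kernel, max_levels)
    return sum(F.l1_loss(a, b) for a, b in zip(pyr_pred, pyr_target))
```

Each pyramid level has half the spatial resolution of the previous one, so a 64x64 input with max_levels=3 yields bands of width 64, 32, and 16.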

LapLoss defaults to 5 pyramid levels and 3-channel RGB images. Note that laplacian_pyramid, when called directly, defaults to max_levels=3.
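The number of levels constrains the input size. Assuming the helpers behave as described in the Description (halving by strided slicing, doubling by zero interleaving, reflect padding of 2 pixels), each level needs an even height and width, and the coarsest filtered level needs more than 2 pixels per side. The function below is a hypothetical pre-flight check based on that reasoning; it is not part of laplacian.py.

```python
def check_lap_input_size(height, width, max_levels=5):
    """Return True if an H x W image can pass through max_levels pyramid levels."""
    factor = 2 ** max_levels
    # An odd size at any level would make the image and its down-then-upsampled
    # copy differ in shape, so the subtraction would fail.
    divisible = height % factor == 0 and width % factor == 0
    # Reflect padding by 2 requires more than 2 pixels along each spatial axis
    # at the smallest resolution that is still filtered.
    coarsest = min(height, width) // 2 ** (max_levels - 1)
    return divisible and coarsest > 2
```

With the default 5 levels this accepts 256x256 inputs and rejects 250x250 (not a multiple of 32) as well as 32x32 (too small at the coarsest level). Padding or cropping inputs to a suitable multiple before computing the loss is one way to satisfy the check.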

Usage

Use LapLoss as a training loss function for frame interpolation or image reconstruction tasks where multi-scale perceptual quality is important. It is the primary reconstruction loss used by the RIFE Model's update() method.

Code Reference

Source Location

  • Repository: Zai_org_CogVideo
  • File: inference/gradio_composite_demo/rife/laplacian.py

Signature

def gauss_kernel(size=5, channels=3) -> torch.Tensor

def downsample(x: torch.Tensor) -> torch.Tensor

def upsample(x: torch.Tensor) -> torch.Tensor

def conv_gauss(img: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor

def laplacian_pyramid(img: torch.Tensor, kernel: torch.Tensor, max_levels=3) -> list[torch.Tensor]

class LapLoss(torch.nn.Module):
    def __init__(self, max_levels=5, channels=3)
    def forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor

Import

from inference.gradio_composite_demo.rife.laplacian import LapLoss, laplacian_pyramid, gauss_kernel

I/O Contract

Inputs

LapLoss.forward:

| Name | Type | Required | Description |
|------|------|----------|-------------|
| input | torch.Tensor | Yes | Predicted image tensor of shape (B, C, H, W) |
| target | torch.Tensor | Yes | Ground truth image tensor of shape (B, C, H, W) |

LapLoss.__init__:

| Name | Type | Required | Description |
|------|------|----------|-------------|
| max_levels | int | No | Number of pyramid levels, default 5 |
| channels | int | No | Number of image channels, default 3 |

Outputs

| Name | Type | Description |
|------|------|-------------|
| loss | torch.Tensor | Scalar tensor representing the sum of L1 losses across all Laplacian pyramid levels |

Usage Examples

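import torch
from inference.gradio_composite_demo.rife.laplacian import LapLoss

# Use the GPU when available; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize with default 5 levels for RGB images
lap_loss = LapLoss(max_levels=5, channels=3)

# Compute loss between predicted and target frames
# (requires_grad=True stands in for the output of a trainable model)
predicted = torch.randn(4, 3, 256, 256, device=device, requires_grad=True)
target = torch.randn(4, 3, 256, 256, device=device)

loss = lap_loss(predicted, target)
loss.backward()  # Differentiable for training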
