Implementation:Zai org CogVideo IFNet HDv3

From Leeroopedia


Knowledge Sources
Domains: Video_Generation, Optical_Flow, Frame_Interpolation
Last Updated: 2026-02-10 00:00 GMT

Overview

IFNet HDv3 is a lightweight variant of the Intermediate Flow Network that uses symmetric bidirectional flow averaging and omits Contextnet/Unet refinement for faster HD video frame interpolation.

Description

IFNet HDv3 implements the RIFE HD v3 architecture, a streamlined flow estimation network optimized for high-definition video. Unlike the original IFNet, this variant estimates flow symmetrically: it runs each IFBlock twice per scale, once for each temporal direction, and averages the two results. This symmetric processing enforces temporal consistency without requiring a separate teacher network.

Each IFBlock uses four pairs of residual convolution blocks (instead of eight sequential blocks in the original), followed by two separate transposed-convolution heads: conv1 produces the 4-channel optical flow and conv2 produces a 1-channel blending mask. The block takes the concatenated warped frames and the accumulated flow and mask as input, resizes them with bilinear interpolation (explicitly passing recompute_scale_factor=False) for multi-scale processing, and outputs residual corrections to the flow and the mask.
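To see how the resizing and the two heads interact, the following minimal sketch traces one spatial dimension through an IFBlock using plain arithmetic. It assumes, beyond what is stated above, that the block downsamples with two stride-2 convolutions (kernel 3, padding 1) before the residual blocks and that each head upsamples with two transposed convolutions (kernel 4, stride 2, padding 1). These values and the helper names are illustrative assumptions and should be checked against IFNet_HDv3.py.

```python
import math


def conv_out(n, kernel=3, stride=2, padding=1):
    """Output length of a strided convolution (standard formula)."""
    return (n + 2 * padding - kernel) // stride + 1


def deconv_out(n, kernel=4, stride=2, padding=1):
    """Output length of a transposed convolution (no output padding)."""
    return (n - 1) * stride - 2 * padding + kernel


def ifblock_size_trace(size, scale):
    """Trace one spatial dimension through the assumed IFBlock layout."""
    # Bilinear resize by 1/scale; the output length is floor(size * factor).
    resized = math.floor(size * (1.0 / scale))
    # Two assumed stride-2 convolutions reduce the length by about 4x.
    features = conv_out(conv_out(resized))
    # Two assumed transposed convolutions in each head restore that 4x.
    head = deconv_out(deconv_out(features))
    # Final bilinear resize by scale back towards the input resolution.
    output = math.floor(head * float(scale))
    return {"resized": resized, "features": features, "head": head, "output": output}


for height in (720, 1080):
    print(height, [ifblock_size_trace(height, s)["output"] for s in (4, 2, 1)])
```

Under these assumptions a 720-pixel dimension comes back as 720 at every scale, whereas 1080 comes back as 1088 at scale 4, so such an input would need padding before it reaches the network. For the default scale_list, padding each dimension to a multiple of 16 is sufficient in this model.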

The IFNet forward pass initializes flow and mask as zero tensors, then iterates through three scales. At each scale, the block is called twice with swapped frame ordering and negated mask, and the bidirectional results are averaged:

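flow = flow + (f0 + swap(f1)) / 2

where f0 and f1 are the residual flows returned by the two calls, and swap reorders the channels of f1 so that both estimates describe the same direction before they are averaged.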
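The update above can be sketched for a single pixel in plain Python. This is an illustration under stated assumptions, not the repository code: the flow is assumed to store the field for img0 in channels 0-1 and the field for img1 in channels 2-3, swap is assumed to exchange those two halves, the mask is assumed to be updated analogously with the second result negated, and toy_block is a hypothetical stand-in for a trained IFBlock.

```python
def swap_halves(flow):
    """Exchange channels (0, 1) with channels (2, 3) of a 4-channel flow."""
    return flow[2:4] + flow[0:2]


def toy_block(img_a, img_b, mask, flow):
    """Hypothetical stand-in for an IFBlock; any deterministic function works."""
    d_flow = [
        0.1 * img_a + 0.2 * flow[0],
        0.3 * img_b - 0.1 * flow[1],
        0.5 * flow[2] + 0.05 * mask,
        img_a * img_b + 0.1 * flow[3],
    ]
    d_mask = 0.4 * img_a - 0.2 * img_b + 0.1 * mask
    return d_flow, d_mask


def symmetric_update(img0, img1, flow, mask, block=toy_block):
    """One scale of the bidirectional update for a single pixel."""
    # Forward call: frames in their original order.
    f0, m0 = block(img0, img1, mask, flow)
    # Backward call: frames swapped, mask negated, flow halves exchanged.
    f1, m1 = block(img1, img0, -mask, swap_halves(flow))
    # Bring the backward estimate into the forward channel layout, then average.
    f1 = swap_halves(f1)
    new_flow = [f + (a + b) / 2 for f, a, b in zip(flow, f0, f1)]
    new_mask = mask + (m0 - m1) / 2  # assumed mask rule, mirroring the flow rule
    return new_flow, new_mask


flow, mask = [0.3, -0.1, 0.5, 0.2], 0.4
forward = symmetric_update(0.2, 0.7, flow, mask)
backward = symmetric_update(0.7, 0.2, swap_halves(flow), -mask)
print(forward)
print(backward)
```

Because both calls are averaged, exchanging the two input frames yields the same flow with its halves exchanged and the mask with its sign flipped, whatever the block computes. In this sketch that equivariance holds by construction rather than having to be learned.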

The Contextnet and Unet refinement modules are commented out, so the output relies purely on flow-based warping and mask blending.

Usage

Use IFNet HDv3 as the flow estimation backbone for the RIFE HD v3 pipeline. This is the variant actually employed in the Gradio composite demo for video frame interpolation, offering faster inference compared to teacher-distillation variants.

Code Reference

Source Location

  • Repository: Zai_org_CogVideo
  • File: inference/gradio_composite_demo/rife/IFNet_HDv3.py

Signature

class IFBlock(nn.Module):
    def __init__(self, in_planes, c=64): ...
    def forward(self, x, flow, scale=1) -> Tuple[torch.Tensor, torch.Tensor]: ...

class IFNet(nn.Module):
    def __init__(self): ...
    def forward(self, x, scale_list=[4, 2, 1], training=False) -> Tuple[list, torch.Tensor, list]: ...

Import

from inference.gradio_composite_demo.rife.IFNet_HDv3 import IFNet, IFBlock

I/O Contract

Inputs

IFNet.forward:

Name | Type | Required | Description
x | torch.Tensor | Yes | Concatenated input frames along the channel dimension. Shape: (B, 2*C, H, W), where C is the number of image channels. The tensor is split in half to obtain img0 and img1.
scale_list | list[int] | No | Multi-scale factors for the three IFBlock stages, default [4, 2, 1].
training | bool | No | Training mode flag, default False. When False, x is automatically split into two equal halves.

IFBlock.forward:

Name | Type | Required | Description
x | torch.Tensor | Yes | Concatenated warped image features and mask.
flow | torch.Tensor | Yes | Accumulated optical flow tensor of shape (B, 4, H, W).
scale | int | No | Current scale factor for bilinear interpolation, default 1.

Outputs

IFNet.forward:

Name | Type | Description
flow_list | list[torch.Tensor] | Optical flow fields at each scale, each of shape (B, 4, H, W).
mask | torch.Tensor | Final sigmoid-activated blending mask of shape (B, 1, H, W).
merged | list[torch.Tensor] | Blended interpolated frames at each scale, each of shape (B, C, H, W).
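
How the mask and the warped frames might combine into a merged frame can be sketched per pixel. The convention below, in which the sigmoid of the mask weights the frame warped from img0 and its complement weights the frame warped from img1, is an assumption for illustration. The names blend, warped0, warped1, and mask_logit are hypothetical and not part of the module.

```python
import math


def sigmoid(z):
    """Logistic function mapping a raw mask value to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def blend(warped0, warped1, mask_logit):
    """Blend two warped pixels (lists of channel values) with one mask value."""
    w = sigmoid(mask_logit)  # assumed: weight given to the frame warped from img0
    return [w * a + (1.0 - w) * b for a, b in zip(warped0, warped1)]


# A zero mask value gives equal weight to both warped frames.
print(blend([0.2, 0.4, 0.6], [0.8, 0.6, 0.4], 0.0))
# A large positive mask value selects the first warped frame almost entirely.
print(blend([0.2, 0.4, 0.6], [0.8, 0.6, 0.4], 10.0))
```

Since the weights sum to one, each blended value stays between the two warped values, so the merged frame cannot leave the intensity range of its inputs.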

Usage Examples

import torch
from inference.gradio_composite_demo.rife.IFNet_HDv3 import IFNet

model = IFNet()
model.eval()

# Concatenate two input frames along channel dimension
img0 = torch.randn(1, 3, 720, 1280)  # HD frame
img1 = torch.randn(1, 3, 720, 1280)
x = torch.cat((img0, img1), dim=1)   # (1, 6, 720, 1280)

with torch.no_grad():
    flow_list, mask, merged = model(x, scale_list=[4, 2, 1])
    interpolated_frame = merged[2]  # Final scale result: (1, 3, 720, 1280)
