Implementation:Zai org CogVideo IFNet HDv3
| Knowledge Sources | |
|---|---|
| Domains | Video_Generation, Optical_Flow, Frame_Interpolation |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
IFNet HDv3 is a lightweight variant of the Intermediate Flow Network that uses symmetric bidirectional flow averaging and omits Contextnet/Unet refinement for faster HD video frame interpolation.
Description
IFNet HDv3 implements the RIFE HD v3 architecture, a streamlined flow estimation network optimized for high-definition video. Unlike the original IFNet, this variant processes flow estimation symmetrically by running each IFBlock twice per scale -- once for each temporal direction -- and averaging the bidirectional results. This symmetric processing enforces temporal consistency without requiring a separate teacher network.
Each IFBlock uses four residual stages, each a pair of convolutions (instead of the eight sequential blocks in the original), followed by two separate transposed-convolution heads: conv1 produces the 4-channel optical flow and conv2 produces a 1-channel blending mask. The block takes the concatenated warped frames, mask, and accumulated flow as input, resizes them by 1/scale with bilinear interpolation (explicitly passing recompute_scale_factor=False), and outputs residual flow and mask corrections resized back to the input resolution.
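The block structure described above can be sketched as follows. This is a minimal illustration rather than the repository code: the class name IFBlockSketch, the conv helper, the kernel sizes, strides, and PReLU activations are assumptions chosen to match the description (four residual pairs, two transposed-convolution heads, bilinear resizing with recompute_scale_factor=False).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv(in_planes, out_planes, stride=1):
    # 3x3 convolution followed by PReLU; kernel size and activation are assumptions.
    return nn.Sequential(
        nn.Conv2d(in_planes, out_planes, 3, stride, 1),
        nn.PReLU(out_planes),
    )


class IFBlockSketch(nn.Module):
    """Hypothetical stand-in for IFBlock, for illustration only."""

    def __init__(self, in_planes, c=64):
        super().__init__()
        # Assumed stem: two stride-2 convolutions reduce resolution by 4x.
        self.conv0 = nn.Sequential(conv(in_planes, c // 2, 2), conv(c // 2, c, 2))
        # Four pairs of convolutions, each pair wrapped in a residual connection.
        self.convblocks = nn.ModuleList(
            [nn.Sequential(conv(c, c), conv(c, c)) for _ in range(4)]
        )
        self.conv1 = self._head(c, 4)  # 4-channel optical flow head
        self.conv2 = self._head(c, 1)  # 1-channel blending mask head

    @staticmethod
    def _head(c, out_channels):
        # Two transposed convolutions restore the 4x resolution lost in the stem.
        return nn.Sequential(
            nn.ConvTranspose2d(c, c // 2, 4, 2, 1),
            nn.PReLU(c // 2),
            nn.ConvTranspose2d(c // 2, out_channels, 4, 2, 1),
        )

    @staticmethod
    def _resize(t, factor):
        return F.interpolate(
            t, scale_factor=factor, mode="bilinear",
            align_corners=False, recompute_scale_factor=False,
        )

    def forward(self, x, flow, scale=1):
        x = self._resize(x, 1.0 / scale)
        # Flow vectors are measured in pixels, so they shrink with the resolution.
        flow = self._resize(flow, 1.0 / scale) / scale
        feat = self.conv0(torch.cat((x, flow), 1))
        for block in self.convblocks:
            feat = block(feat) + feat
        flow_out = self._resize(self.conv1(feat), float(scale)) * scale
        mask_out = self._resize(self.conv2(feat), float(scale))
        return flow_out, mask_out
```

In this sketch in_planes counts the channels of x plus the four flow channels, so two RGB frames plus a mask would give 3 + 3 + 1 + 4 = 11.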
The IFNet forward pass initializes flow and mask as zero tensors, then iterates through three scales. At each scale, the block is called twice with swapped frame ordering and negated mask, and the bidirectional results are averaged:
flow = flow + (f0 + swap(f1)) / 2
The Contextnet and Unet refinement modules are commented out, so the output relies purely on flow-based warping and mask blending.
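The per-scale update can be sketched as below. The helper names swap_directions and symmetric_update are hypothetical, and block stands for any callable with the IFBlock.forward signature. The flow update follows the formula above; the mask update shown here (averaging the forward mask with the negated reverse mask) is an assumption that mirrors the flow averaging.

```python
import torch


def swap_directions(t):
    # Exchange channels 0-1 (one temporal direction) with channels 2-3 (the other).
    return torch.cat((t[:, 2:4], t[:, :2]), 1)


def symmetric_update(block, warped_img0, warped_img1, flow, mask, scale):
    # Forward pass: frames in their natural order.
    f0, m0 = block(torch.cat((warped_img0, warped_img1, mask), 1), flow, scale=scale)
    # Reverse pass: frames swapped, mask negated, flow directions exchanged.
    f1, m1 = block(
        torch.cat((warped_img1, warped_img0, -mask), 1),
        swap_directions(flow),
        scale=scale,
    )
    # Average the two estimates after mapping the reverse result back.
    flow = flow + (f0 + swap_directions(f1)) / 2
    mask = mask + (m0 - m1) / 2  # assumed mask counterpart of the flow update
    return flow, mask
```

Because the same block weights serve both directions, a block that treated the two frames asymmetrically would have its bias partly cancelled by the averaging step.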
Usage
Use IFNet HDv3 as the flow estimation backbone for the RIFE HD v3 pipeline. This is the variant actually employed in the Gradio composite demo for video frame interpolation, offering faster inference compared to teacher-distillation variants.
Code Reference
Source Location
- Repository: Zai_org_CogVideo
- File: inference/gradio_composite_demo/rife/IFNet_HDv3.py
Signature
class IFBlock(nn.Module):
    def __init__(self, in_planes, c=64)
    def forward(self, x, flow, scale=1) -> Tuple[torch.Tensor, torch.Tensor]

class IFNet(nn.Module):
    def __init__(self)
    def forward(self, x, scale_list=[4, 2, 1], training=False) -> Tuple[list, torch.Tensor, list]
Import
from inference.gradio_composite_demo.rife.IFNet_HDv3 import IFNet, IFBlock
I/O Contract
Inputs
IFNet.forward:
| Name | Type | Required | Description |
|---|---|---|---|
| x | torch.Tensor | Yes | Input frames concatenated along the channel dimension. Shape: (B, 2*C, H, W), where C is the number of image channels (3 for the RGB frames in the usage example). The tensor is split in half along the channel dimension to obtain img0 and img1 |
| scale_list | list[int] | No | Multi-scale factors for the three IFBlock stages, default [4, 2, 1] |
| training | bool | No | Training mode flag, default False. When False, automatically splits x into two equal halves |
IFBlock.forward:
| Name | Type | Required | Description |
|---|---|---|---|
| x | torch.Tensor | Yes | Warped frames and mask, concatenated along the channel dimension |
| flow | torch.Tensor | Yes | Accumulated optical flow tensor of shape (B, 4, H, W) |
| scale | int | No | Downsampling factor for the current stage: inputs are resized by 1/scale with bilinear interpolation and outputs are resized back by scale. Default 1 |
Outputs
IFNet.forward:
| Name | Type | Description |
|---|---|---|
| flow_list | list[torch.Tensor] | Accumulated optical flow after each of the three stages, each of shape (B, 4, H, W) |
| mask | torch.Tensor | Final sigmoid-activated blending mask of shape (B, 1, H, W) |
| merged | list[torch.Tensor] | Blended interpolated frames after each of the three stages, each of shape (B, C, H, W). The last entry is the final result |
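As a rough illustration of how the mask and merged outputs relate, the sketch below blends two warped frames with a sigmoid-activated mask. The function name blend is hypothetical, and the weighting convention (mask applied to the first warped frame, 1 - mask to the second) is an assumption rather than something this page states.

```python
import torch


def blend(warped_img0, warped_img1, mask_logits):
    # Squash the raw mask into (0, 1) so it can act as a per-pixel blending weight.
    mask = torch.sigmoid(mask_logits)
    # The single-channel mask broadcasts across the image channels.
    merged = warped_img0 * mask + warped_img1 * (1 - mask)
    return merged, mask
```

With a raw mask of zero everywhere, the sigmoid gives 0.5 and the result is the plain average of the two warped frames.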
Usage Examples
import torch
from inference.gradio_composite_demo.rife.IFNet_HDv3 import IFNet

model = IFNet()
model.eval()

# Concatenate two input frames along the channel dimension
img0 = torch.randn(1, 3, 720, 1280)  # HD frame
img1 = torch.randn(1, 3, 720, 1280)
x = torch.cat((img0, img1), dim=1)  # (1, 6, 720, 1280)

# Inference only, so skip gradient tracking
with torch.no_grad():
    flow_list, mask, merged = model(x, scale_list=[4, 2, 1])

interpolated_frame = merged[2]  # Final scale result: (1, 3, 720, 1280)
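The 720 x 1280 frame above happens to survive the resize, downsample, and upsample steps of every stage with its size unchanged. The sketch below checks that for other resolutions. The helper names are hypothetical, and the layer hyperparameters it models (two 3x3 stride-2 convolutions with padding 1, then two 4x4 stride-2 transposed convolutions with padding 1) are assumptions about IFBlock that this page does not state.

```python
def ifblock_round_trip(size, scale):
    # Bilinear downscale by `scale`; the output size is floor(size / scale).
    s = size // scale
    # Two stride-2 3x3 convolutions with padding 1: out = (in - 1) // 2 + 1.
    for _ in range(2):
        s = (s - 1) // 2 + 1
    # Two stride-2 4x4 transposed convolutions with padding 1 each double the size.
    s *= 4
    # Bilinear upscale back by `scale`.
    return s * scale


def is_safe_size(height, width, scale_list=(4, 2, 1)):
    # A size is safe when every stage returns flow at the original resolution.
    return all(
        ifblock_round_trip(dim, s) == dim
        for dim in (height, width)
        for s in scale_list
    )


print(is_safe_size(720, 1280))   # True
print(is_safe_size(1080, 1920))  # False
print(is_safe_size(1088, 1920))  # True
```

Under these assumptions a 1080-row frame would come back as 1088 rows at scale 4, so padding the input to a safe size before calling the model and cropping afterwards is one way to handle it.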