Principle:Zai org CogVideo Tiled Image Upscaling

Knowledge Sources	Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
Domains	Image_Super_Resolution, Video_Generation
Last Updated	2026-02-10 00:00 GMT

Overview

Tiled image upscaling divides a large input image into overlapping tiles, runs each tile independently through a super-resolution model, and reassembles the upscaled tiles with feathered blending, producing seamless output within a fixed memory budget.

Description

Super-resolution models trained on fixed-size patches (typically 64x64 to 512x512 pixels) cannot directly process arbitrarily large images due to GPU memory constraints. Tiled upscaling solves this by breaking the input into manageable tiles with controlled overlap regions, running each tile through the upscaling model independently, and then blending the results back together.

The key challenge is avoiding visible seams at tile boundaries. This is addressed through feathered blending: each tile's contribution is weighted by a mask that ramps linearly from a small positive value at the tile edges to 1 in the interior, with the ramp spanning the overlap region. Overlapping tiles are summed and then normalized by the accumulated mask weights, so the output transitions smoothly from one tile to the next without visible discontinuities.

The approach generalizes to arbitrary spatial dimensions, making it applicable to both 2D images and 3D volumetric data. The tile size, overlap size, and upscaling factor are all configurable parameters that allow trading off between processing speed (larger tiles), memory usage (smaller tiles), and blending quality (larger overlap).
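The generalization to more than two dimensions can be sketched as follows, assuming the separable linear ramp given under Theoretical Basis. The names `linear_ramp` and `feather_mask_nd` are hypothetical and are not taken from the CogVideo code; the sketch only shows that an N-dimensional mask is the outer product of one 1D ramp per axis.

```python
from functools import reduce

import numpy as np


def linear_ramp(size, overlap):
    """1D feathering weights alpha(t) for t = 0 .. size - 1 (hypothetical helper)."""
    t = np.arange(size, dtype=np.float64)
    if overlap <= 0:
        return np.ones(size)  # no overlap: uniform weight
    # np.select picks the first matching condition, mirroring the order of the cases.
    return np.select(
        [t < overlap, t >= size - overlap],
        [(t + 1) / overlap, (size - t) / overlap],
        default=1.0,
    )


def feather_mask_nd(shape, overlap):
    """Outer product of one ramp per axis; works for 2D tiles and 3D volumes alike."""
    ramps = [linear_ramp(n, overlap) for n in shape]
    return reduce(np.multiply.outer, ramps)


mask_3d = feather_mask_nd((4, 6, 8), overlap=2)
```

The same function returns a 2D mask when given a two-element shape, so one code path can serve both images and volumes.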

Usage

Use tiled upscaling whenever applying a super-resolution model to images larger than the model's native training resolution, or when GPU memory is insufficient to process the full image at once. This is standard practice in production super-resolution pipelines and is particularly important for video processing where each frame must be upscaled individually.

Theoretical Basis

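For a 2D image of size $H \times W$ with tile size $T$ and overlap $O$ (with $O < T$), tiles are placed with stride $T - O$. Along the vertical axis the tile origins are:

$p_{i} = i \cdot (T - O), \quad i = 0, 1, 2, \dots, \lceil H / (T - O) \rceil - 1$

The horizontal origins are defined in the same way with $W$ in place of $H$.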
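As a minimal sketch of the grid formula above, the origins along one axis can be enumerated directly. The function name `tile_origins` is hypothetical. Note that the last tile may extend past the image border; how that is handled (clipping or padding) is an implementation choice not specified here.

```python
import math


def tile_origins(length, tile, overlap):
    """Origins p_i = i * (tile - overlap) for i = 0 .. ceil(length / stride) - 1."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    return [i * stride for i in range(math.ceil(length / stride))]


origins = tile_origins(length=10, tile=4, overlap=1)
print(origins)  # [0, 3, 6, 9]
```

In this example the tile starting at 9 would cover positions 9 to 12, so it overruns an axis of length 10 and has to be clipped or padded.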

For each tile at position $(p_{y}, p_{x})$, the feathering mask $M$ is constructed as:

$M(y, x) = \alpha(y) \cdot \alpha(x)$

where the ramp function along each dimension is:

$\alpha(t) = \begin{cases} \frac{t + 1}{O'} & \text{if } t < O' \\ \frac{S - t}{O'} & \text{if } t \geq S - O' \\ 1 & \text{otherwise} \end{cases}$

Here $t$ is the zero-based pixel index along one dimension of the upscaled tile, $O' = O \cdot s$ is the overlap scaled by the upscale factor $s$, and $S$ is the tile's output size along that dimension. The final output at each pixel is:
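The ramp and the separable mask can be checked numerically with a short sketch. It assumes a zero-based index $t$ and treats zero overlap as a uniform mask; the names `ramp` and `feather_mask` are hypothetical.

```python
import numpy as np


def ramp(size, scaled_overlap):
    """alpha(t) for t = 0 .. size - 1, where scaled_overlap plays the role of O'."""
    t = np.arange(size, dtype=np.float64)
    if scaled_overlap <= 0:
        return np.ones(size)  # assumption: no overlap means uniform weight
    # The first matching condition wins, as in the piecewise definition.
    return np.select(
        [t < scaled_overlap, t >= size - scaled_overlap],
        [(t + 1) / scaled_overlap, (size - t) / scaled_overlap],
        default=1.0,
    )


def feather_mask(height, width, scaled_overlap):
    """M(y, x) = alpha(y) * alpha(x) as an outer product of two ramps."""
    return np.outer(ramp(height, scaled_overlap), ramp(width, scaled_overlap))


alpha = ramp(8, 2)
print(alpha)  # [0.5 1.  1.  1.  1.  1.  1.  0.5]
mask = feather_mask(8, 8, 2)
```

The smallest weight is $1 / O'$ rather than 0, so every pixel of a tile contributes a strictly positive weight.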

$I (y, x) = \frac{\sum_{k} M_{k} (y, x) \cdot T_{k} (y, x)}{\sum_{k} M_{k} (y, x)}$

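where $T_{k}$ is the upscaled output of tile $k$ (distinct from the tile size $T$) and the sums run over all tiles $k$ that cover position $(y, x)$. Because every mask value is strictly positive, the denominator never vanishes, and the weighted average shifts gradually from one tile's output to its neighbor's across each overlap region instead of switching abruptly at a boundary.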
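The full accumulate-and-normalize procedure can be sketched end to end. This is an illustration under stated assumptions, not the CogVideo implementation: it handles a single-channel 2D array, clips border tiles to the image, and uses nearest-neighbor repetition as a stand-in for a real super-resolution model. All function names are hypothetical.

```python
import numpy as np


def linear_ramp(size, scaled_overlap):
    """alpha(t) for t = 0 .. size - 1 (uniform weight when there is no overlap)."""
    t = np.arange(size, dtype=np.float64)
    if scaled_overlap <= 0:
        return np.ones(size)
    return np.select(
        [t < scaled_overlap, t >= size - scaled_overlap],
        [(t + 1) / scaled_overlap, (size - t) / scaled_overlap],
        default=1.0,
    )


def tiled_upscale(image, upscale_fn, tile, overlap, scale):
    """Blend per-tile outputs: I = sum_k(M_k * T_k) / sum_k(M_k)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    h, w = image.shape
    stride = tile - overlap
    acc = np.zeros((h * scale, w * scale), dtype=np.float64)  # numerator
    weight = np.zeros_like(acc)                               # denominator
    for py in range(0, h, stride):
        for px in range(0, w, stride):
            patch = image[py:py + tile, px:px + tile]  # slicing clips at the border
            out = upscale_fn(patch)
            oh, ow = out.shape
            mask = np.outer(
                linear_ramp(oh, overlap * scale),
                linear_ramp(ow, overlap * scale),
            )
            ys, xs = py * scale, px * scale
            acc[ys:ys + oh, xs:xs + ow] += mask * out
            weight[ys:ys + oh, xs:xs + ow] += mask
    return acc / weight  # weight > 0 everywhere because masks are strictly positive


def nearest_upscale(patch, scale=2):
    """Stand-in for a super-resolution model: repeat each pixel scale times."""
    return np.repeat(np.repeat(patch, scale, axis=0), scale, axis=1)


image = np.arange(70, dtype=np.float64).reshape(7, 10)
result = tiled_upscale(image, nearest_upscale, tile=4, overlap=1, scale=2)
```

With this stand-in upscaler every tile produces the same values as a whole-image pass, so the blended result should match `nearest_upscale(image)` up to floating-point rounding. That makes it a convenient sanity check for the tiling and normalization logic before swapping in a real model.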

Related Pages

Implementation:Zai_org_CogVideo_Upscale_Utils
