Principle:Zai org CogVideo Tiled Image Upscaling
| Knowledge Sources | |
|---|---|
| Domains | Image_Super_Resolution, Video_Generation |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Tiled image upscaling divides large input images into overlapping patches, processes each patch independently through a super-resolution model, and reassembles the upscaled patches with feathered blending to produce seamless output within a fixed memory budget.
Description
Super-resolution models trained on fixed-size patches (typically 64x64 to 512x512 pixels) cannot directly process arbitrarily large images due to GPU memory constraints. Tiled upscaling solves this by breaking the input into manageable tiles with controlled overlap regions, running each tile through the upscaling model independently, and then blending the results back together.
The key challenge is avoiding visible seams at tile boundaries. This is addressed through feathered blending: each tile's contribution is weighted by a mask that ramps linearly from near 0 at the tile edges to 1 in the interior over the overlap region. The weights stay strictly positive, so when overlapping tiles are summed and normalized by the accumulated mask weights the division is always well defined, and the result transitions smoothly from one tile to the next without visible seams.
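The feathering mask described above can be sketched in NumPy as follows. This is a minimal illustration, not the project's implementation: the names `feather_ramp` and `feather_mask` are hypothetical, and the ramp is assumed to start one step above zero (at `1/feather`) so that the later normalization never divides by zero.

```python
import numpy as np

def feather_ramp(length: int, feather: int) -> np.ndarray:
    """1D weights that rise linearly over `feather` samples at each end
    and stay at 1 in between (assumed form, see lead-in)."""
    if feather <= 0:
        return np.ones(length)
    idx = np.arange(length)
    rising = (idx + 1) / feather        # 1/feather, 2/feather, ...
    falling = (length - idx) / feather  # mirrored ramp at the far end
    return np.minimum(1.0, np.minimum(rising, falling))

def feather_mask(height: int, width: int, feather: int) -> np.ndarray:
    """2D mask built as the outer product of two 1D ramps."""
    return np.outer(feather_ramp(height, feather), feather_ramp(width, feather))

ramp = feather_ramp(8, 4)
# weights: 0.25, 0.5, 0.75, 1, 1, 0.75, 0.5, 0.25

mask = feather_mask(6, 8, 2)
# corners get the smallest weight (0.5 * 0.5), the interior gets 1
```

Because two overlapping ramps do not sum to exactly 1 in this form, the accumulated mask weights have to be kept and divided out afterwards, which is the normalization step mentioned above.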
The approach generalizes to arbitrary spatial dimensions, making it applicable to both 2D images and 3D volumetric data. The tile size, overlap size, and upscaling factor are all configurable parameters that allow trading off between processing speed (larger tiles), memory usage (smaller tiles), and blending quality (larger overlap).
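To make the dimension-independence and the parameter trade-off concrete, the sketch below enumerates tile origins for any number of spatial dimensions. The helper name `tile_origins` is hypothetical, and the stride of `tile - overlap` per axis is an assumption about how the grid is laid out.

```python
import itertools
from typing import Iterator, Sequence, Tuple

def tile_origins(shape: Sequence[int], tile: Sequence[int],
                 overlap: int) -> Iterator[Tuple[int, ...]]:
    """Return the origin of every tile, stepping by (tile - overlap) per axis."""
    if any(t <= overlap for t in tile):
        raise ValueError("each tile dimension must exceed the overlap")
    axes = [range(0, size, t - overlap) for size, t in zip(shape, tile)]
    return itertools.product(*axes)

# 2D image of 100 x 160 pixels, 64 x 64 tiles, 8 pixels of overlap
origins_2d = list(tile_origins((100, 160), (64, 64), 8))      # 2 x 3 = 6 tiles

# Same image with a larger overlap: smoother blending, but more tiles to process
origins_wide = list(tile_origins((100, 160), (64, 64), 32))   # 4 x 5 = 20 tiles

# 3D volume: the same function works unchanged
origins_3d = list(tile_origins((10, 100, 160), (8, 64, 64), 2))  # 2 x 2 x 3 = 12 tiles
```

Tiles that extend past the image border would simply be cropped to the available pixels when they are sliced out.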
Usage
Use tiled upscaling whenever applying a super-resolution model to images larger than the model's native training resolution, or when GPU memory is insufficient to process the full image at once. This is standard practice in production super-resolution pipelines and is particularly important for video processing where each frame must be upscaled individually.
Theoretical Basis
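For a 2D image of size $H \times W$ with tile size $(t_h, t_w)$ and overlap $o$, the tiling grid positions are:

$$y \in \{0,\ t_h - o,\ 2(t_h - o),\ \dots\},\ y < H \qquad x \in \{0,\ t_w - o,\ 2(t_w - o),\ \dots\},\ x < W$$

For each tile at position $(y, x)$, the feathering mask is constructed as:

$$M_{y,x}(i, j) = r_{T_h}(i) \cdot r_{T_w}(j)$$

where the ramp function along each dimension is:

$$r_T(i) = \min\left(1,\ \frac{i + 1}{f},\ \frac{T - i}{f}\right), \qquad i = 0, \dots, T - 1$$

Here $f = o \cdot s$ is the overlap scaled by the upscale factor $s$, and $T$ is the tile's output size along that dimension. The final output at each pixel $p$ is:

$$O(p) = \frac{\sum_k M_k(p)\, U_k(p)}{\sum_k M_k(p)}$$

where $U_k$ is the upscaled output of tile $k$, $M_k$ is its mask, and the sums are over all tiles that cover position $p$. This weighted averaging guarantees continuity across tile boundaries.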
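The full procedure (tile, upscale, mask, accumulate, normalize) can be sketched end to end as below. This is an illustrative NumPy sketch under stated assumptions, not the project's code: `tiled_upscale`, `ramp`, and `nearest_upscale` are hypothetical names, and nearest-neighbour repetition stands in for a real super-resolution model so the example runs without any weights.

```python
import itertools
import numpy as np

def nearest_upscale(tile: np.ndarray, scale: int) -> np.ndarray:
    """Stand-in for a super-resolution model: nearest-neighbour repetition."""
    return tile.repeat(scale, axis=0).repeat(scale, axis=1)

def ramp(length: int, feather: int) -> np.ndarray:
    """1D feathering weights, strictly positive (assumed form)."""
    if feather <= 0:
        return np.ones(length)
    idx = np.arange(length)
    return np.minimum(1.0, np.minimum((idx + 1) / feather, (length - idx) / feather))

def tiled_upscale(image, model, tile=64, overlap=8, scale=4):
    """Upscale an (H, W) or (H, W, C) array tile by tile with feathered blending."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    h, w = image.shape[:2]
    extra = (1,) * (image.ndim - 2)          # lets the mask broadcast over channels
    out = np.zeros((h * scale, w * scale) + image.shape[2:], dtype=np.float64)
    weight = np.zeros((h * scale, w * scale) + extra)
    feather = overlap * scale                # overlap measured in output pixels
    step = tile - overlap
    for y, x in itertools.product(range(0, h, step), range(0, w, step)):
        patch = image[y:y + tile, x:x + tile]         # cropped at the border
        up = model(patch, scale).astype(np.float64)
        th, tw = up.shape[:2]
        mask = np.outer(ramp(th, feather), ramp(tw, feather)).reshape((th, tw) + extra)
        ys, xs = y * scale, x * scale
        out[ys:ys + th, xs:xs + tw] += up * mask      # weighted sum of tiles
        weight[ys:ys + th, xs:xs + tw] += mask        # accumulated mask weights
    return out / weight                               # normalize per pixel

rng = np.random.default_rng(0)
image = rng.random((100, 160, 3))
result = tiled_upscale(image, nearest_upscale, tile=64, overlap=8, scale=4)
```

With a purely local stand-in model such as nearest-neighbour repetition, every tile agrees with its neighbours inside the overlap, so the blended result matches upscaling the whole image at once. A learned model would generally differ slightly between tiles, and the feathered average is what hides those differences.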