Implementation:Ollama Ollama Imagegen VAE Tiling
| Knowledge Sources | |
|---|---|
| Domains | Image Generation, VAE |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements tiled VAE decoding with overlap blending to reduce memory usage when generating large images.
Description
The tiling.go file provides DecodeTiled, which runs large latent tensors through the VAE decoder in overlapping tiles and matches the tiling implementation in the diffusers library. The algorithm has four phases: (1) extract overlapping tiles from the latent tensor and decode each one independently with the supplied decoder function; (2) blend adjacent decoded tiles by linear interpolation across the overlap, vertically (blendV) and horizontally (blendH); (3) compute the crop dimensions for the non-overlapping region of each tile; and (4) assemble the final image by copying pixel data from the cropped tiles into the output buffer. TilingConfig sets the tile size (default 64 latent pixels) and the overlap (default 16 latent pixels, 25% of the tile). The assembled result is converted from NHWC to NCHW format and clamped to [0, 1].
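The linear blend in phase 2 can be illustrated on a single row of samples. The sketch below is not the code from tiling.go: blendRow is a hypothetical helper that works on plain float32 slices rather than mlx arrays, and it assumes the diffusers convention in which the weight of the new tile rises linearly from 0 to 1 across the overlap.

```go
package main

import "fmt"

// blendRow cross-fades the trailing `extent` samples of left into the
// leading `extent` samples of right and returns the blended copy of right.
// Hypothetical helper for illustration; it mirrors the diffusers-style
// weighting, not necessarily the exact code in tiling.go.
func blendRow(left, right []float32, extent int) []float32 {
	// The blend region cannot be longer than either tile.
	if extent > len(left) {
		extent = len(left)
	}
	if extent > len(right) {
		extent = len(right)
	}
	out := make([]float32, len(right))
	copy(out, right)
	for x := 0; x < extent; x++ {
		w := float32(x) / float32(extent) // weight of the right tile
		out[x] = left[len(left)-extent+x]*(1-w) + right[x]*w
	}
	return out
}

func main() {
	left := []float32{1, 1, 1, 1}
	right := []float32{0, 0, 0, 0}
	fmt.Println(blendRow(left, right, 4)) // [1 0.75 0.5 0.25]
}
```

Applying the same cross-fade along rows and then along columns gives the two-dimensional blend; the seam disappears because each overlapping pixel is a weighted average of both tiles.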
Usage
Used by the Z-Image and FLUX.2 pipelines when the requested image is larger than a single tile (64 latent pixels, which is 512x512 image pixels at the 8x VAE scale), so that decoder memory stays bounded.
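As a rough illustration of when tiling applies and how many tiles result, the sketch below converts a pixel size to latent space and lists the tile start offsets along one axis. The names needsTiling, tileOrigins, and vaeScale are hypothetical, the 8x factor is taken from the documented output shape, and stepping by TileSize minus Overlap is an assumption based on the diffusers convention; how the pipelines actually make this decision is not shown in this page.

```go
package main

import "fmt"

// vaeScale is the assumed latent-to-pixel factor (from the H*8, W*8 output shape).
const vaeScale = 8

// needsTiling reports whether either latent dimension exceeds one tile.
// Hypothetical helper; the real pipelines may decide differently.
func needsTiling(pixelH, pixelW, tileSize int) bool {
	return pixelH/vaeScale > tileSize || pixelW/vaeScale > tileSize
}

// tileOrigins lists tile start offsets along one latent axis, assuming
// tiles advance by tileSize-overlap latent pixels.
func tileOrigins(dim, tileSize, overlap int) []int {
	stride := tileSize - overlap
	if stride <= 0 {
		return nil // invalid configuration: overlap must be smaller than the tile
	}
	var origins []int
	for o := 0; o < dim; o += stride {
		origins = append(origins, o)
	}
	return origins
}

func main() {
	fmt.Println(needsTiling(512, 512, 64))   // false
	fmt.Println(needsTiling(1024, 1024, 64)) // true
	fmt.Println(tileOrigins(128, 64, 16))    // [0 48 96]
}
```

Under these assumptions a 1024x1024 image (128x128 latents) is covered by a 3x3 grid of tiles, so the decoder runs nine times on inputs no larger than 64x64 latent pixels.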
Code Reference
Source Location
- Repository: Ollama
- File: x/imagegen/vae/tiling.go
- Lines: 1-215
Signature
type TilingConfig struct {
	TileSize int32 // Tile size in latent space (default 64)
	Overlap  int32 // Overlap in latent space (default 16 = 25%)
}
func DefaultTilingConfig() *TilingConfig
func DecodeTiled(
	latents *mlx.Array,
	cfg *TilingConfig,
	decoder func(*mlx.Array) *mlx.Array,
) *mlx.Array
Import
import "github.com/ollama/ollama/x/imagegen/vae"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| latents | *mlx.Array | Yes | Latent tensor [1, H, W, C] in NHWC format |
| cfg | *TilingConfig | Yes | Tile size and overlap configuration |
| decoder | func(*mlx.Array) *mlx.Array | Yes | Single-tile decoder function |
Outputs
| Name | Type | Description |
|---|---|---|
| (return value) | *mlx.Array | Decoded image [1, 3, H*8, W*8] in NCHW format, clamped to [0, 1] |
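The output extent of H*8 by W*8 can be checked with a little arithmetic on the cropped tiles. The sketch below assumes the diffusers-style layout, where tiles start every TileSize minus Overlap latent pixels and each decoded tile is cropped to that stride in pixel space; croppedExtents and sum are hypothetical helpers, not functions from tiling.go.

```go
package main

import "fmt"

// croppedExtents returns the pixel extent each tile contributes along one
// axis after cropping. Assumes diffusers-style tiling; illustration only.
func croppedExtents(dim, tileSize, overlap, scale int) []int {
	stride := tileSize - overlap
	if stride <= 0 {
		return nil
	}
	var extents []int
	for o := 0; o < dim; o += stride {
		tile := tileSize
		if o+tile > dim {
			tile = dim - o // edge tiles are clipped to the latent bounds
		}
		keep := tile
		if keep > stride {
			keep = stride // drop the overlap that the next tile covers
		}
		extents = append(extents, keep*scale)
	}
	return extents
}

// sum adds up the per-tile extents.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	e := croppedExtents(128, 64, 16, 8)
	fmt.Println(e, sum(e)) // [384 384 256] 1024
}
```

For a 128-pixel latent axis the three cropped tiles contribute 384, 384, and 256 pixels, which add up to 128*8 = 1024, consistent with the documented output shape.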
Usage Examples
cfg := vae.DefaultTilingConfig() // 64 tile, 16 overlap
image := vae.DecodeTiled(latents, cfg, func(tile *mlx.Array) *mlx.Array {
	return myVAE.DecodeTile(tile)
})