Principle: DeepSeek AI Janus ODE Denoising
| Knowledge Sources | |
|---|---|
| Domains | Image_Generation, Diffusion_Models |
| Last Updated | 2026-02-10 09:30 GMT |
Overview
An iterative denoising procedure that solves an ODE to transport latent noise into a clean image representation, using an LLM as the velocity predictor, with ShallowUViT encoder/decoder modules bridging between the latent space and the LLM.
Description
The ODE denoising loop is the core generation mechanism in JanusFlow. Unlike autoregressive methods that generate tokens sequentially, rectified flow generates images by iteratively refining a noisy latent through Euler ODE steps. At each step:
- Encode latent: ShallowUViTEncoder processes the current noisy latent with a timestep embedding
- Align to LLM: Linear aligner projects UViT output (768-dim) to LLM dimension (2048-dim)
- LLM forward: The language model processes the concatenated text + timestep + latent embeddings
- Align from LLM: RMSNorm + linear aligner projects LLM output (2048-dim) back to UViT dimension (768-dim)
- Decode velocity: ShallowUViTDecoder predicts the velocity field from the projected hidden states
- CFG: Conditional and unconditional velocities are combined
- Euler step: The latent is updated: z = z + dt × v
KV-caching is used to avoid recomputing prompt tokens after the first step.
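The per-step pipeline above can be sketched in simplified form. All module names, shapes, and weights below are hypothetical stand-ins (fixed random linear maps in place of ShallowUViT and the LLM), not the JanusFlow implementation; only the dimensions (768 and 2048), the RMSNorm placement, the CFG combination, and the Euler update follow the description:

```python
import numpy as np

rng = np.random.default_rng(0)
UVIT_DIM, LLM_DIM = 768, 2048

# Toy stand-ins for the real modules: plain linear maps with fixed random weights.
W_enc = rng.standard_normal((UVIT_DIM, UVIT_DIM)) * 0.01   # "ShallowUViTEncoder"
W_in  = rng.standard_normal((UVIT_DIM, LLM_DIM)) * 0.01    # aligner: 768 -> 2048
W_llm = rng.standard_normal((LLM_DIM, LLM_DIM)) * 0.01     # "LLM" forward
W_out = rng.standard_normal((LLM_DIM, UVIT_DIM)) * 0.01    # aligner: 2048 -> 768
W_dec = rng.standard_normal((UVIT_DIM, UVIT_DIM)) * 0.01   # "ShallowUViTDecoder"

def rmsnorm(x, eps=1e-6):
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def predict_velocity(z, t):
    h = (z + t / 1000.0) @ W_enc   # encode latent with a (toy) timestep embedding
    h = h @ W_in                   # align to LLM dimension
    h = h @ W_llm                  # LLM forward pass (stand-in)
    h = rmsnorm(h) @ W_out         # RMSNorm + align back to UViT dimension
    return h @ W_dec               # decode the velocity field

def generate(z, num_inference_steps=30, cfg_scale=5.0):
    dt = 1.0 / num_inference_steps
    for step in range(num_inference_steps):
        t = step / num_inference_steps * 1000       # normalized timestep
        v_cond = predict_velocity(z, t)
        v_uncond = predict_velocity(z, t)           # real code would drop the prompt here
        v = v_uncond + cfg_scale * (v_cond - v_uncond)  # CFG combination
        z = z + dt * v                              # Euler step
    return z

z_final = generate(rng.standard_normal((16, UVIT_DIM)))
```

In this toy version the conditional and unconditional branches see identical inputs, so CFG is a no-op; in the real pipeline the unconditional branch uses an empty (or null) text prompt, and KV-caching lets the prompt tokens be reused across all steps.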
Usage
Use this principle after noise initialization to denoise the latent over num_inference_steps (default 30) iterations.
Theoretical Basis
The rectified flow ODE:

dz/dt = v_θ(z_t, t, c)

Solved with the forward Euler method:

z_{t+dt} = z_t + dt × v_θ(z_t, t, c)

where v_θ is the velocity field predicted by the combined ShallowUViT-LLM pipeline, and c is the text conditioning.
CFG for velocity:

v = v_uncond + w × (v_cond − v_uncond)

where w is the guidance scale.
The timestep is normalized: t = step / num_steps × 1000.