Principle:Deepseek ai Janus ODE Denoising

From Leeroopedia


Knowledge Sources
Domains Image_Generation, Diffusion_Models
Last Updated 2026-02-10 09:30 GMT

Overview

An iterative denoising procedure that solves an ODE to transport latent noise into a clean image representation, using an LLM as the velocity predictor with ShallowUViT encoder/decoder for latent-to-LLM bridging.

Description

The ODE denoising loop is the core generation mechanism in JanusFlow. Unlike autoregressive methods that generate tokens sequentially, rectified flow generates images by iteratively refining a noisy latent through Euler ODE steps. At each step:

  1. Encode latent: ShallowUViTEncoder processes the current noisy latent with a timestep embedding
  2. Align to LLM: Linear aligner projects UViT output (768-dim) to LLM dimension (2048-dim)
  3. LLM forward: The language model processes the concatenated text + timestep + latent embeddings
  4. Align from LLM: RMSNorm + linear aligner projects LLM output (2048-dim) back to UViT dimension (768-dim)
  5. Decode velocity: ShallowUViTDecoder predicts the velocity field from the projected hidden states
  6. CFG: Conditional and unconditional velocities are combined via classifier-free guidance
  7. Euler step: The latent is updated: z = z + dt × v
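The seven steps above can be sketched as a runnable control-flow skeleton. This is a minimal illustration rather than the JanusFlow implementation: every function below is a hypothetical stand-in for a learned module (the toy bodies are identities or simple arithmetic), `cfg_weight=2.0` is an assumed value, and the step size `dt = 1 / num_inference_steps` is an assumption. A real implementation may also batch the conditional and unconditional branches into one forward pass instead of calling the model twice.

```python
from typing import List

Vec = List[float]

# Hypothetical stand-ins for the learned modules. Each is a trivial function
# so the control flow runs without model weights.
def encode_latent(z: Vec, t: float) -> Vec:
    """Step 1: ShallowUViTEncoder (toy: identity, ignores t)."""
    return list(z)

def align_to_llm(h: Vec) -> Vec:
    """Step 2: projection to the LLM width (toy: identity)."""
    return list(h)

def llm_forward(h: Vec, conditioned: bool) -> Vec:
    """Step 3: LLM pass (toy: pull toward 1.0 with a prompt, 0.0 without)."""
    target = 1.0 if conditioned else 0.0
    return [target - x for x in h]

def align_from_llm(h: Vec) -> Vec:
    """Step 4: norm + projection back to the UViT width (toy: identity)."""
    return list(h)

def decode_velocity(h: Vec) -> Vec:
    """Step 5: ShallowUViTDecoder (toy: identity)."""
    return list(h)

def predict_velocity(z: Vec, t: float, conditioned: bool) -> Vec:
    """Chain steps 1-5 into a single velocity prediction."""
    h = align_to_llm(encode_latent(z, t))
    h = llm_forward(h, conditioned)
    return decode_velocity(align_from_llm(h))

def denoise(z: Vec, num_inference_steps: int = 30, cfg_weight: float = 2.0) -> Vec:
    dt = 1.0 / num_inference_steps  # assumed step size over the unit interval
    for step in range(num_inference_steps):
        t = step / num_inference_steps * 1000.0  # timestep value fed to the encoder
        v_cond = predict_velocity(z, t, conditioned=True)
        v_uncond = predict_velocity(z, t, conditioned=False)
        # Step 6: classifier-free guidance on the two velocities.
        v = [cfg_weight * c - (cfg_weight - 1.0) * u for c, u in zip(v_cond, v_uncond)]
        # Step 7: Euler update.
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z

z_final = denoise([0.0, 0.5, -1.0])
```

Only `denoise` reflects the structure described in the list; swapping the toy functions for real modules would leave the loop body unchanged.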

KV-caching is used to avoid recomputing prompt tokens after the first step.
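To get a feel for what the cache saves, the following sketch counts how many token positions the LLM has to process over the whole loop, with and without a prompt cache. The function name and all the numbers (a 20-token prompt, 576 latent tokens, a single timestep token) are illustrative assumptions, not values taken from this page.

```python
def llm_positions_processed(prompt_len: int, latent_len: int,
                            num_steps: int, use_cache: bool) -> int:
    """Count token positions the LLM processes across all denoising steps."""
    # Assumption: each step feeds the latent tokens plus one timestep token.
    new_per_step = latent_len + 1
    if use_cache:
        # The prompt is processed once (first step) and its keys/values reused.
        return prompt_len + num_steps * new_per_step
    # Without a cache, the prompt is reprocessed at every step.
    return num_steps * (prompt_len + new_per_step)

cached = llm_positions_processed(prompt_len=20, latent_len=576, num_steps=30, use_cache=True)
uncached = llm_positions_processed(prompt_len=20, latent_len=576, num_steps=30, use_cache=False)
saved = uncached - cached  # equals (num_steps - 1) * prompt_len
```

Under these assumptions the saving is proportional to the prompt length and to the number of steps after the first, so longer prompts benefit more.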

Usage

Use this principle after noise initialization to denoise the latent over num_inference_steps (default 30) iterations.

Theoretical Basis

The rectified flow ODE:

$$\frac{dz_t}{dt} = v_\theta(z_t, t, c)$$

Solved with the Euler method:

$$z_{t+dt} = z_t + dt \, v_\theta(z_t, t, c)$$

Where v_θ is the velocity field predicted by the combined ShallowUViT-LLM pipeline, and c is the text conditioning.
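As a small numerical check of the Euler update, the sketch below integrates a toy one-dimensional flow. The velocity function is a hypothetical stand-in for the learned predictor: it is the ideal velocity of a straight path that ends at a chosen target `x1` at t = 1. The step size `dt = 1 / num_steps` and the convention that t runs from 0 (noise) to 1 (clean) are assumptions.

```python
from typing import Callable

def euler_integrate(z0: float, velocity: Callable[[float, float], float],
                    num_steps: int) -> float:
    """Integrate dz/dt = velocity(z, t) from t = 0 to t = 1 with Euler steps."""
    z = z0
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # t never reaches 1, so (1 - t) below stays positive
        z = z + dt * velocity(z, t)
    return z

x1 = 3.0  # toy "clean" target

def straight_line_velocity(z: float, t: float) -> float:
    """Ideal velocity for a straight path that arrives at x1 when t = 1."""
    return (x1 - z) / (1.0 - t)

z_final = euler_integrate(-1.5, straight_line_velocity, num_steps=30)
```

Because the path is a straight line, every Euler step stays on it and `z_final` lands on `x1` up to floating-point error. A learned velocity field is only approximately straight, which is why the number of steps affects quality in practice.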

CFG for velocity, with guidance weight w:

$$v = w \, v_{\text{cond}} - (w - 1) \, v_{\text{uncond}}$$
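The guidance combination can be written as a small function. This sketch assumes the form v = w * v_cond - (w - 1) * v_uncond, which is algebraically the same as extrapolating from the unconditional velocity toward the conditional one, v_uncond + w * (v_cond - v_uncond). The function name and the sample values are illustrative.

```python
from typing import List

def cfg_velocity(v_cond: List[float], v_uncond: List[float], w: float) -> List[float]:
    """Combine conditional and unconditional velocities with guidance weight w."""
    return [w * c - (w - 1.0) * u for c, u in zip(v_cond, v_uncond)]

v_cond = [1.0, 2.0]
v_uncond = [0.5, 0.0]

guided = cfg_velocity(v_cond, v_uncond, w=2.0)          # pushes past v_cond
no_guidance = cfg_velocity(v_cond, v_uncond, w=1.0)     # reduces to v_cond
uncond_only = cfg_velocity(v_cond, v_uncond, w=0.0)     # reduces to v_uncond
```

With w = 1 the unconditional term vanishes, and values above 1 amplify the difference between the two predictions.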

The timestep passed to the model is scaled to the range [0, 1000): t = step / num_steps × 1000, where step runs from 0 to num_steps − 1.
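For concreteness, the timestep values for the default of 30 steps can be listed as follows. The helper name is hypothetical, and the step size `dt = 1 / num_steps` is an assumption that matches integrating over the unit interval.

```python
from typing import List, Tuple

def timestep_schedule(num_steps: int = 30) -> Tuple[float, List[float]]:
    """Return the Euler step size and the scaled timestep for each step."""
    dt = 1.0 / num_steps  # assumed step size
    timesteps = [step / num_steps * 1000.0 for step in range(num_steps)]
    return dt, timesteps

dt, timesteps = timestep_schedule(30)
first, middle, last = timesteps[0], timesteps[15], timesteps[-1]
```

The schedule starts at 0 and stops one step short of 1000, since the final Euler update carries the latent across the last interval.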
