Implementation:Ggml org Llama cpp Diffusion CLI

From Leeroopedia
Knowledge Sources
Domains Diffusion, Text_Generation
Last Updated 2026-02-15 00:00 GMT

Overview

CLI tool for text generation using Diffusion Language Models (DLLMs) with iterative denoising, supporting multiple diffusion algorithms and scheduling methods.

Description

This program implements diffusion-based text generation with five algorithms (CONFIDENCE_BASED, ENTROPY_BASED, MARGIN_BASED, RANDOM, ORIGIN) and two scheduling methods (TIMESTEP_BASED for Dream-style, BLOCK_BASED for LLaDA-style). It starts with masked tokens and iteratively unmasks them over configurable diffusion steps. The diffusion_params struct controls parameters including steps, temperature, top-k/top-p sampling, Gumbel noise injection, and classifier-free guidance. A calculate_confidence function computes per-token confidence scores to determine unmasking order.
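The mask-then-unmask loop described above can be illustrated with a small, self-contained sketch. It is not the CLI's actual code: `token_t`, `prediction`, `predict_fn`, and `denoise` are hypothetical names introduced here, the model is replaced by a caller-supplied function, and the even spread of unmasking across steps is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using token_t = int32_t;  // hypothetical stand-in for llama_token

struct prediction {
    token_t token;       // token proposed for a masked position
    float   confidence;  // score used to rank masked positions
};

// Hypothetical model interface: propose a token and a confidence for position i.
using predict_fn = std::function<prediction(const std::vector<token_t> &, size_t)>;

// Iteratively replaces mask tokens, committing the most confident predictions
// first. Returns the number of steps that actually did work.
inline int32_t denoise(std::vector<token_t> & tokens, token_t mask_id,
                       int32_t steps, const predict_fn & predict) {
    assert(steps > 0);
    int32_t done = 0;
    for (int32_t step = 0; step < steps; ++step) {
        // Collect a prediction for every position that is still masked.
        std::vector<std::pair<size_t, prediction>> candidates;
        for (size_t i = 0; i < tokens.size(); ++i) {
            if (tokens[i] == mask_id) {
                candidates.push_back({i, predict(tokens, i)});
            }
        }
        if (candidates.empty()) {
            break;  // nothing left to unmask
        }

        // Assumption: spread the remaining masks evenly over the remaining steps.
        const size_t steps_left = static_cast<size_t>(steps - step);
        const size_t n_unmask   = (candidates.size() + steps_left - 1) / steps_left;

        // Highest confidence first.
        std::sort(candidates.begin(), candidates.end(),
                  [](const auto & a, const auto & b) {
                      return a.second.confidence > b.second.confidence;
                  });
        for (size_t k = 0; k < n_unmask; ++k) {
            tokens[candidates[k].first] = candidates[k].second.token;
        }
        ++done;
    }
    return done;
}
```

Prompt tokens are simply positions that never hold `mask_id`, so they stay fixed while the masked tail is filled in.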

Usage

Use this CLI tool to run non-autoregressive diffusion model architectures (Dream, LLaDA, RND1) with llama.cpp, expanding beyond traditional left-to-right token generation. It supports optional live visualization of the generation process.

Code Reference

Source Location

Signature

enum diffusion_algorithm {
    ORIGIN = 0, ENTROPY_BASED = 1, MARGIN_BASED = 2,
    RANDOM = 3, CONFIDENCE_BASED = 4
};

enum transfer_schedule {
    TIMESTEP_BASED = 0,  // Dream-style
    BLOCK_BASED    = 1,  // LLaDA-style
};

typedef bool (*diffusion_step_callback_t)(
    int32_t step, int32_t total_steps,
    const llama_token * tokens, int32_t n_tokens, void * user_data);

struct diffusion_params {
    int32_t steps;
    float temperature;
    llama_token mask_token_id;
    diffusion_algorithm algorithm;
    transfer_schedule schedule;
    float top_p;
    int32_t top_k;
    float cfg_scale;
    bool add_gumbel_noise;
    int32_t max_length;
};

static float calculate_confidence(
    const llama_token_data_array & cur_p,
    diffusion_algorithm algorithm,
    std::mt19937 & rng);
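To make the role of calculate_confidence more concrete, here is a hedged sketch of how each strategy could score one position. It does not use llama_token_data_array; instead it assumes a plain vector of probabilities sorted in descending order. The names `confidence_kind` and `confidence_score` are hypothetical, and the exact formulas in the real function may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical scoring modes, loosely following the algorithm names above.
enum class confidence_kind { selected_prob, neg_entropy, margin, random };

// probs:    token probabilities for one position, sorted in descending order
// selected: index of the token that was sampled for this position
inline float confidence_score(const std::vector<float> & probs, size_t selected,
                              confidence_kind kind, std::mt19937 & rng) {
    assert(!probs.empty() && selected < probs.size());
    switch (kind) {
        case confidence_kind::selected_prob:
            // Probability of the sampled token.
            return probs[selected];
        case confidence_kind::neg_entropy: {
            // Sum of p * log(p), i.e. negative entropy: closer to zero
            // means a more peaked (more confident) distribution.
            float sum = 0.0f;
            for (float p : probs) {
                sum += p * std::log(p + 1e-10f);
            }
            return sum;
        }
        case confidence_kind::margin:
            // Gap between the two most likely tokens.
            return probs.size() > 1 ? probs[0] - probs[1] : probs[0];
        case confidence_kind::random: {
            // Random order: the score ignores the distribution entirely.
            std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
            return uniform(rng);
        }
    }
    return 0.0f;
}
```

In every mode a higher score means the position is unmasked earlier, which keeps the ranking logic independent of the chosen strategy.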

Import

#include "arg.h"
#include "chat.h"
#include "common.h"
#include "llama.h"
#include "log.h"
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

I/O Contract

Inputs

Name | Type | Required | Description
model | GGUF model file | Yes | Diffusion language model (Dream, LLaDA, or RND1 architecture)
prompt | std::string | Yes | Input text prompt to condition the diffusion process
steps | int32_t | No | Number of diffusion denoising steps (default: model-dependent)
temperature | float | No | Sampling temperature for token selection
algorithm | diffusion_algorithm | No | Unmasking strategy (default: CONFIDENCE_BASED)
schedule | transfer_schedule | No | Scheduling method (default: TIMESTEP_BASED)
top_p / top_k | float / int32_t | No | Nucleus and top-k sampling parameters
visual_mode | bool | No | Enable live visualization of the generation process
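The steps and schedule inputs together decide how many masked tokens are committed at each denoising step. As a minimal sketch of one plausible policy for a block-based schedule, the helper below splits the masked tokens of a block as evenly as possible across the steps. The name `transfer_counts` is hypothetical, and the even-split rule is an assumption rather than a statement about the tool's exact behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits mask_count tokens across `steps` so that per-step counts differ
// by at most one; earlier steps receive the extra tokens.
inline std::vector<int32_t> transfer_counts(int32_t mask_count, int32_t steps) {
    assert(mask_count >= 0 && steps > 0);
    const int32_t base      = mask_count / steps;
    const int32_t remainder = mask_count % steps;

    std::vector<int32_t> counts(static_cast<size_t>(steps), base);
    for (int32_t i = 0; i < remainder; ++i) {
        counts[static_cast<size_t>(i)] += 1;
    }
    return counts;
}
```

For example, 10 masked tokens over 4 steps gives per-step counts of 3, 3, 2, 2, so every token is unmasked by the final step.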

Outputs

Name | Type | Description
generated_text | std::string | Text generated through iterative diffusion denoising
step_callback | callback | Optional per-step callback for monitoring generation progress
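A per-step callback matching the diffusion_step_callback_t typedef from the Signature section might look like the sketch below. It records how many positions are still masked after each step. `progress_log` and `record_progress` are hypothetical names, llama_token is aliased locally so the snippet compiles without llama.h, and treating a `true` return value as "keep going" is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using llama_token = int32_t;  // local alias; llama.h defines llama_token as int32_t

typedef bool (*diffusion_step_callback_t)(
    int32_t step, int32_t total_steps,
    const llama_token * tokens, int32_t n_tokens, void * user_data);

// Hypothetical user state passed through the void * user_data pointer.
struct progress_log {
    llama_token          mask_id;          // token id that marks a masked position
    std::vector<int32_t> masked_per_step;  // remaining masks after each step
};

// Counts the positions that are still masked and appends the count to the log.
static bool record_progress(int32_t step, int32_t total_steps,
                            const llama_token * tokens, int32_t n_tokens,
                            void * user_data) {
    assert(user_data != nullptr);
    auto * log = static_cast<progress_log *>(user_data);

    int32_t masked = 0;
    for (int32_t i = 0; i < n_tokens; ++i) {
        if (tokens[i] == log->mask_id) {
            ++masked;
        }
    }
    log->masked_per_step.push_back(masked);

    (void) step;         // unused in this sketch
    (void) total_steps;  // unused in this sketch
    return true;         // assumption: true means "continue generating"
}
```

Passing state through `user_data` keeps the callback a plain function pointer, which is what the typedef requires.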

Usage Examples

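// Command-line usage (flag names follow upstream llama.cpp, where the binary
// is llama-diffusion-cli and --diffusion-algorithm takes the numeric enum
// value, 4 = CONFIDENCE_BASED; run with --help to confirm for your build):
// ./llama-diffusion-cli -m dream-model.gguf -p "Once upon a time" \
//     --diffusion-steps 64 --temp 0.7 --diffusion-algorithm 4

// With visual mode to see token unmasking in real time
// (setting a block length selects the LLaDA-style block-based schedule):
// ./llama-diffusion-cli -m llada-model.gguf -p "The quick brown fox" \
//     --diffusion-visual --diffusion-block-length 32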
