Implementation:Ggml org Llama cpp Mtmd Bitmap Init

From Leeroopedia
Revision as of 12:41, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Mtmd_Bitmap_Init.md)
Aspect | Detail
Implementation Name | Mtmd Bitmap Init
Doc Type | API Doc
Domain | Multimodal Inference
Purpose | Constructing bitmap objects from image files, raw data buffers, and audio files
Related Workflow | Multimodal_Inference

Overview

Description

This implementation documents the family of functions for creating mtmd_bitmap objects, which are the standardized internal representation of multimodal inputs in llama.cpp. Three primary construction paths are provided:

  • mtmd_helper_bitmap_init_from_file(): Load from a file path (auto-detects image vs. audio)
  • mtmd_bitmap_init(): Construct directly from raw RGB pixel data
  • mtmd_bitmap_init_from_audio(): Construct directly from raw PCM float32 audio samples

Additionally, mtmd_helper_bitmap_init_from_buf() provides a buffer-based initialization that auto-detects format from magic bytes.
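
To make "auto-detects format from magic bytes" concrete, here is a minimal, self-contained sketch of how such a check could look. The names media_kind and sniff_media_kind are hypothetical and are not part of the mtmd API. The signatures tested (RIFF/WAVE, ID3 or MPEG frame sync, fLaC) are the standard markers for WAV, MP3, and FLAC; the library's own check may differ in detail.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of the mtmd API: guesses whether a buffer
// holds WAV, MP3, or FLAC data by inspecting its leading bytes.
enum class media_kind { audio, other };

inline media_kind sniff_media_kind(const unsigned char * buf, std::size_t len) {
    if (buf == nullptr || len < 12) {
        return media_kind::other; // too short to hold a RIFF/WAVE header
    }
    // WAV: "RIFF" at offset 0 and "WAVE" at offset 8
    const bool is_wav = std::memcmp(buf, "RIFF", 4) == 0 &&
                        std::memcmp(buf + 8, "WAVE", 4) == 0;
    // MP3: an ID3v2 tag, or an MPEG frame sync (11 leading set bits)
    const bool is_mp3 = std::memcmp(buf, "ID3", 3) == 0 ||
                        (buf[0] == 0xFF && (buf[1] & 0xE0) == 0xE0);
    // FLAC: the "fLaC" stream marker
    const bool is_flac = std::memcmp(buf, "fLaC", 4) == 0;
    return (is_wav || is_mp3 || is_flac) ? media_kind::audio : media_kind::other;
}
```

Anything that is not recognized as audio falls through to the image decoder, which matches the "otherwise, assume it's an image" branch in the buffer-based source quoted later on this page.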

Usage

These functions are called after the multimodal context has been initialized and before tokenization. The resulting mtmd_bitmap * pointers are collected into an array and passed to mtmd_tokenize(). Each bitmap must be freed when no longer needed using mtmd_bitmap_free(), or managed via the C++ RAII wrapper mtmd::bitmap_ptr.
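
The ownership rule above (collect pointers, pass them on, free each one) can be sketched with a std::unique_ptr and a custom deleter. This sketch uses stand-in types (demo_bitmap, demo_bitmap_init, demo_bitmap_free, all hypothetical) so that it compiles without llama.cpp. It assumes mtmd::bitmap_ptr follows a similar unique_ptr-with-deleter pattern, which should be confirmed against mtmd.h.

```cpp
#include <memory>
#include <vector>

// Stand-ins so the sketch compiles without llama.cpp. In real code the type
// and functions would be mtmd_bitmap, a bitmap constructor, and mtmd_bitmap_free().
struct demo_bitmap { int id; };

// Counts bitmaps that have been created but not yet freed.
inline int & live_count() { static int n = 0; return n; }

inline demo_bitmap * demo_bitmap_init(int id) {
    ++live_count();
    return new demo_bitmap{id};
}

inline void demo_bitmap_free(demo_bitmap * b) {
    if (b != nullptr) {
        --live_count();
        delete b;
    }
}

// Deleter that routes unique_ptr destruction through the C-style free function.
struct demo_bitmap_deleter {
    void operator()(demo_bitmap * b) const { demo_bitmap_free(b); }
};
using demo_bitmap_ptr = std::unique_ptr<demo_bitmap, demo_bitmap_deleter>;

// Build the non-owning pointer array that a C API taking (pointers, count) expects.
inline std::vector<const demo_bitmap *> as_raw_array(const std::vector<demo_bitmap_ptr> & owned) {
    std::vector<const demo_bitmap *> raw;
    raw.reserve(owned.size());
    for (const auto & p : owned) {
        raw.push_back(p.get());
    }
    return raw;
}
```

The owning vector must outlive any call that receives the raw array, since the raw pointers do not keep the bitmaps alive.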

Code Reference

Aspect | Detail
Header (core) | tools/mtmd/mtmd.h:140-141
Header (helpers) | tools/mtmd/mtmd-helper.h:32-41
Source (helpers) | tools/mtmd/mtmd-helper.cpp:470-521
Import | #include "mtmd.h" and #include "mtmd-helper.h"

Core bitmap constructors (from mtmd.h):

// Image bitmap: data must be nx * ny * 3 bytes in RGBRGBRGB format
MTMD_API mtmd_bitmap * mtmd_bitmap_init(uint32_t nx, uint32_t ny, const unsigned char * data);

// Audio bitmap: data must be n_samples floats in PCM F32 format
MTMD_API mtmd_bitmap * mtmd_bitmap_init_from_audio(size_t n_samples, const float * data);
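
Because mtmd_bitmap_init() takes no length argument, the caller is responsible for supplying exactly nx * ny * 3 bytes. The sketch below shows one way to compute that size and to repack an RGBA buffer into the RGBRGBRGB layout. Both rgb_byte_count and rgba_to_rgb are hypothetical helper names, not library functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Expected byte length of a tightly packed RGB image (3 bytes per pixel).
inline std::size_t rgb_byte_count(std::uint32_t nx, std::uint32_t ny) {
    const std::uint64_t pixels = static_cast<std::uint64_t>(nx) * ny;
    if (pixels > std::numeric_limits<std::size_t>::max() / 3) {
        throw std::length_error("image too large");
    }
    return static_cast<std::size_t>(pixels * 3);
}

// Drops the alpha channel from an RGBA buffer, producing RGBRGBRGB... bytes.
inline std::vector<unsigned char> rgba_to_rgb(const std::vector<unsigned char> & rgba,
                                              std::uint32_t nx, std::uint32_t ny) {
    const std::size_t n_pixels = rgb_byte_count(nx, ny) / 3;
    if (rgba.size() % 4 != 0 || rgba.size() / 4 != n_pixels) {
        throw std::invalid_argument("RGBA buffer size does not match nx * ny * 4");
    }
    std::vector<unsigned char> rgb;
    rgb.reserve(n_pixels * 3);
    for (std::size_t i = 0; i < n_pixels; ++i) {
        rgb.push_back(rgba[i * 4 + 0]); // R
        rgb.push_back(rgba[i * 4 + 1]); // G
        rgb.push_back(rgba[i * 4 + 2]); // B
    }
    return rgb;
}
```

The resulting vector's data() pointer, together with nx and ny, is the shape of input that mtmd_bitmap_init() expects.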

Helper constructors (from mtmd-helper.h):

// Load from file path, auto-detects image vs. audio
// Returns nullptr on failure. Thread-safe.
MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_file(mtmd_context * ctx, const char * fname);

// Load from memory buffer, auto-detects format via magic bytes
// Supported: image formats (stb_image), audio formats (WAV, MP3, FLAC via miniaudio)
// Returns nullptr on failure. Thread-safe.
MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_buf(mtmd_context * ctx,
                                                         const unsigned char * buf, size_t len);

File-based initialization source (mtmd-helper.cpp:500-521):

mtmd_bitmap * mtmd_helper_bitmap_init_from_file(mtmd_context * ctx, const char * fname) {
    std::vector<unsigned char> buf;
    FILE * f = fopen(fname, "rb");
    if (!f) {
        LOG_ERR("Unable to open file %s: %s\n", fname, strerror(errno));
        return nullptr;
    }
    fseek(f, 0, SEEK_END);
    long file_size = ftell(f);
    fseek(f, 0, SEEK_SET);
    buf.resize(file_size);
    size_t n_read = fread(buf.data(), 1, file_size, f);
    fclose(f);
    if (n_read != (size_t)file_size) {
        LOG_ERR("Failed to read entire file %s", fname);
        return nullptr;
    }
    return mtmd_helper_bitmap_init_from_buf(ctx, buf.data(), buf.size());
}
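
The listing above passes the result of ftell() directly to resize() and does not check the fseek() calls. ftell() returns -1 on failure, so a caller that wants stricter error handling could read the file itself and hand the bytes to mtmd_helper_bitmap_init_from_buf(). The following is a sketch of such a caller-side reader; read_file_bytes is a hypothetical name and is not part of the library.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads an entire file into `out`. Returns false (and leaves `out` empty)
// if the file cannot be opened, sized, or read completely.
inline bool read_file_bytes(const char * fname, std::vector<unsigned char> & out) {
    out.clear();
    std::FILE * f = std::fopen(fname, "rb");
    if (f == nullptr) {
        return false;
    }
    bool ok = false;
    if (std::fseek(f, 0, SEEK_END) == 0) {
        const long size = std::ftell(f); // -1 on failure
        if (size >= 0 && std::fseek(f, 0, SEEK_SET) == 0) {
            out.resize(static_cast<std::size_t>(size));
            // An empty file is a successful read of zero bytes.
            ok = out.empty() ||
                 std::fread(out.data(), 1, out.size(), f) == out.size();
        }
    }
    std::fclose(f);
    if (!ok) {
        out.clear();
    }
    return ok;
}
```

On success, out.data() and out.size() map onto the buf and len parameters of mtmd_helper_bitmap_init_from_buf().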

Buffer-based initialization source (mtmd-helper.cpp:470-498):

mtmd_bitmap * mtmd_helper_bitmap_init_from_buf(mtmd_context * ctx,
                                                const unsigned char * buf, size_t len) {
    if (audio_helpers::is_audio_file((const char *)buf, len)) {
        std::vector<float> pcmf32;
        int bitrate = mtmd_get_audio_bitrate(ctx);
        if (bitrate < 0) {
            LOG_ERR("This model does not support audio input\n");
            return nullptr;
        }
        if (!audio_helpers::decode_audio_from_buf(buf, len, bitrate, pcmf32)) {
            LOG_ERR("Unable to read WAV audio file from buffer\n");
            return nullptr;
        }
        return mtmd_bitmap_init_from_audio(pcmf32.size(), pcmf32.data());
    }
    // otherwise, assume it's an image
    int nx, ny, nc;
    auto * data = stbi_load_from_memory(buf, len, &nx, &ny, &nc, 3);
    if (!data) {
        LOG_ERR("%s: failed to decode image bytes\n", __func__);
        return nullptr;
    }
    mtmd_bitmap * result = mtmd_bitmap_init(nx, ny, data);
    stbi_image_free(data);
    return result;
}

I/O Contract

mtmd_helper_bitmap_init_from_file():

Direction | Name | Type | Description
Input | ctx | mtmd_context * | Multimodal context (needed for audio bitrate query)
Input | fname | const char * | Path to image or audio file
Output | (return) | mtmd_bitmap * | Bitmap object, or nullptr on failure

mtmd_bitmap_init():

Direction | Name | Type | Description
Input | nx | uint32_t | Image width in pixels
Input | ny | uint32_t | Image height in pixels
Input | data | const unsigned char * | Raw RGB pixel data (length must be nx * ny * 3)
Output | (return) | mtmd_bitmap * | Bitmap object

mtmd_bitmap_init_from_audio():

Direction | Name | Type | Description
Input | n_samples | size_t | Number of audio samples
Input | data | const float * | Raw PCM float32 audio data
Output | (return) | mtmd_bitmap * | Bitmap object
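
Audio captured elsewhere is often 16-bit integer PCM, whereas mtmd_bitmap_init_from_audio() takes float32 samples. A minimal conversion sketch follows. pcm16_to_f32 is a hypothetical helper, and the sketch assumes the model expects mono samples normalized to the range [-1, 1] at the sample rate reported by mtmd_get_audio_bitrate(); this page does not state those requirements, so verify them before relying on it.

```cpp
#include <cstdint>
#include <vector>

// Converts signed 16-bit PCM samples to float32 in [-1, 1).
// Assumption: the consumer expects normalized mono samples.
inline std::vector<float> pcm16_to_f32(const std::vector<std::int16_t> & pcm) {
    std::vector<float> out;
    out.reserve(pcm.size());
    for (std::int16_t s : pcm) {
        out.push_back(static_cast<float>(s) / 32768.0f);
    }
    return out;
}
```

The returned vector's size() and data() correspond to the n_samples and data parameters in the table above. Resampling, if the source rate differs from what the model expects, is outside the scope of this sketch.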

Usage Examples

Example 1: Load image from file

#include "mtmd-helper.h"

// Load an image file (JPEG, PNG, BMP, GIF, etc.)
mtmd_bitmap * bmp = mtmd_helper_bitmap_init_from_file(mtmd_ctx, "photo.jpg");
if (bmp == nullptr) {
    fprintf(stderr, "Failed to load image\n");
    return 1;
}

// Optionally set an ID for KV cache tracking
mtmd_bitmap_set_id(bmp, "user_photo_1");

// Use the bitmap in tokenization...
// mtmd_tokenize(mtmd_ctx, chunks, &text, &bmp, 1);

// Clean up when done
mtmd_bitmap_free(bmp);

Example 2: Construct image from raw RGB data

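#include <cstdio>
#include "mtmd.h"

// Assume we have raw RGB pixel data from some source.
// get_rgb_pixels() is a placeholder that must return width * height * 3 bytes.
uint32_t width = 640, height = 480;
unsigned char * rgb_data = get_rgb_pixels(); // RGBRGBRGB...

mtmd_bitmap * bmp = mtmd_bitmap_init(width, height, rgb_data);
if (bmp == nullptr) {
    fprintf(stderr, "Failed to create bitmap\n");
    return 1;
}

// Query bitmap properties
printf("Image: %u x %u, %zu bytes\n",
    mtmd_bitmap_get_nx(bmp),
    mtmd_bitmap_get_ny(bmp),
    mtmd_bitmap_get_n_bytes(bmp));

// Clean up when done
mtmd_bitmap_free(bmp);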

Example 3: Load audio file

#include "mtmd-helper.h"

// Audio files are auto-detected by magic bytes (WAV, MP3, FLAC)
mtmd_bitmap * audio_bmp = mtmd_helper_bitmap_init_from_file(mtmd_ctx, "recording.wav");
if (audio_bmp == nullptr) {
    fprintf(stderr, "Failed to load audio (model may not support audio)\n");
    return 1;
}

// Verify it is indeed audio
bool is_audio = mtmd_bitmap_is_audio(audio_bmp);
printf("Is audio: %s\n", is_audio ? "yes" : "no");

mtmd_bitmap_free(audio_bmp);

Example 4: Using C++ RAII wrapper

#include "mtmd.h"
#include "mtmd-helper.h" // declares mtmd_helper_bitmap_init_from_file()

// Use the mtmd::bitmap wrapper for automatic memory management
mtmd::bitmap bmp(mtmd_helper_bitmap_init_from_file(mtmd_ctx, "image.png"));
if (!bmp.ptr) {
    fprintf(stderr, "Failed to load bitmap\n");
    return 1;
}
printf("Loaded %u x %u image (%zu bytes)\n", bmp.nx(), bmp.ny(), bmp.n_bytes());
// Automatically freed when bmp goes out of scope
