Implementation:Ggml org Llama cpp Mtmd Bitmap Init

From Leeroopedia
Aspect | Detail
Implementation Name | Mtmd Bitmap Init
Doc Type | API Doc
Domain | Multimodal Inference
Purpose | Constructing bitmap objects from image files, raw data buffers, and audio files
Related Workflow | Multimodal_Inference

Overview

Description

This page documents the functions that create mtmd_bitmap objects, the standardized internal representation of multimodal inputs in llama.cpp. Despite the name, a bitmap can hold either image pixels or audio samples. There are three primary construction paths:

  • mtmd_helper_bitmap_init_from_file(): Load from a file path (auto-detects image vs. audio)
  • mtmd_bitmap_init(): Construct directly from raw RGB pixel data
  • mtmd_bitmap_init_from_audio(): Construct directly from raw PCM float32 audio samples

Additionally, mtmd_helper_bitmap_init_from_buf() provides a buffer-based initialization that auto-detects format from magic bytes.
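
The magic-byte auto-detection described above can be pictured with a small standalone sketch. This is not the llama.cpp implementation (that lives in audio_helpers::is_audio_file()); the names media_kind and sniff_media_kind are hypothetical, and the checks use the commonly documented leading bytes for WAV, FLAC, and MP3.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical classification result, for illustration only.
enum class media_kind { audio_wav, audio_flac, audio_mp3, other };

// Hypothetical helper: guess the container format from the first bytes.
static media_kind sniff_media_kind(const unsigned char * buf, size_t len) {
    // WAV: "RIFF" at offset 0 and "WAVE" at offset 8
    if (len >= 12 && std::memcmp(buf, "RIFF", 4) == 0 &&
        std::memcmp(buf + 8, "WAVE", 4) == 0) {
        return media_kind::audio_wav;
    }
    // FLAC: stream marker "fLaC"
    if (len >= 4 && std::memcmp(buf, "fLaC", 4) == 0) {
        return media_kind::audio_flac;
    }
    // MP3: an ID3v2 tag, or an MPEG frame sync (11 set bits)
    if (len >= 3 && std::memcmp(buf, "ID3", 3) == 0) {
        return media_kind::audio_mp3;
    }
    if (len >= 2 && buf[0] == 0xFF && (buf[1] & 0xE0) == 0xE0) {
        return media_kind::audio_mp3;
    }
    // Anything else would be handed to the image decoder.
    return media_kind::other;
}
```

A JPEG header (0xFF 0xD8) does not match the frame-sync test because 0xD8 & 0xE0 is 0xC0, so it falls through to the image path, which mirrors the "otherwise, assume it's an image" branch of the helper.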

Usage

These functions are called after the multimodal context has been initialized and before tokenization. The resulting mtmd_bitmap * pointers are collected into an array and passed to mtmd_tokenize(). Each bitmap must be freed when no longer needed using mtmd_bitmap_free(), or managed via the C++ RAII wrapper mtmd::bitmap_ptr.
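
To illustrate the RAII option, here is a minimal standalone sketch of the pattern a wrapper such as mtmd::bitmap_ptr typically follows: a std::unique_ptr with a custom deleter that calls the C free function. The fake_bitmap type and the fake_* functions are stand-ins invented for this example so it compiles without llama.cpp; the real type and free function are mtmd_bitmap and mtmd_bitmap_free().

```cpp
#include <memory>

// Stand-ins for illustration only (not part of mtmd.h).
struct fake_bitmap { int id; };
static int g_free_count = 0;

static fake_bitmap * fake_bitmap_init(int id) { return new fake_bitmap{id}; }
static void fake_bitmap_free(fake_bitmap * b) { ++g_free_count; delete b; }

// Deleter that forwards to the C-style free function.
struct fake_bitmap_deleter {
    void operator()(fake_bitmap * b) const { fake_bitmap_free(b); }
};
using fake_bitmap_ptr = std::unique_ptr<fake_bitmap, fake_bitmap_deleter>;

// The bitmap is released when bmp goes out of scope, on every return path.
static int load_and_release() {
    fake_bitmap_ptr bmp(fake_bitmap_init(1));
    return bmp ? bmp->id : -1;
}
```

Because std::unique_ptr never invokes its deleter on a null pointer, wrapping a failed (nullptr) result this way is harmless, although the caller still needs to check for failure before using the bitmap.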

Code Reference

Aspect | Detail
Header (core) | tools/mtmd/mtmd.h:140-141
Header (helpers) | tools/mtmd/mtmd-helper.h:32-41
Source (helpers) | tools/mtmd/mtmd-helper.cpp:470-521
Import | #include "mtmd.h" and #include "mtmd-helper.h"

Core bitmap constructors (from mtmd.h):

// Image bitmap: data must be nx * ny * 3 bytes in RGBRGBRGB format
MTMD_API mtmd_bitmap * mtmd_bitmap_init(uint32_t nx, uint32_t ny, const unsigned char * data);

// Audio bitmap: data must be n_samples floats in PCM F32 format
MTMD_API mtmd_bitmap * mtmd_bitmap_init_from_audio(size_t n_samples, const float * data);

Helper constructors (from mtmd-helper.h):

// Load from file path, auto-detects image vs. audio
// Returns nullptr on failure. Thread-safe.
MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_file(mtmd_context * ctx, const char * fname);

// Load from memory buffer, auto-detects format via magic bytes
// Supported: image formats (stb_image), audio formats (WAV, MP3, FLAC via miniaudio)
// Returns nullptr on failure. Thread-safe.
MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_buf(mtmd_context * ctx,
                                                         const unsigned char * buf, size_t len);

File-based initialization source (mtmd-helper.cpp:500-521):

mtmd_bitmap * mtmd_helper_bitmap_init_from_file(mtmd_context * ctx, const char * fname) {
    std::vector<unsigned char> buf;
    FILE * f = fopen(fname, "rb");
    if (!f) {
        LOG_ERR("Unable to open file %s: %s\n", fname, strerror(errno));
        return nullptr;
    }
    fseek(f, 0, SEEK_END);
    long file_size = ftell(f);
    fseek(f, 0, SEEK_SET);
    buf.resize(file_size);
    size_t n_read = fread(buf.data(), 1, file_size, f);
    fclose(f);
    if (n_read != (size_t)file_size) {
        LOG_ERR("Failed to read entire file %s", fname);
        return nullptr;
    }
    return mtmd_helper_bitmap_init_from_buf(ctx, buf.data(), buf.size());
}
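
Callers who already manage their own I/O can read the file themselves and pass the bytes to mtmd_helper_bitmap_init_from_buf(). The sketch below is one possible way to do that, not code from llama.cpp: read_file_bytes is a hypothetical helper, and unlike the listing above it also checks the results of fseek() and ftell() before sizing the buffer.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: read a whole file into memory.
// Returns false if the file cannot be opened, sized, or fully read.
static bool read_file_bytes(const char * fname, std::vector<unsigned char> & out) {
    FILE * f = std::fopen(fname, "rb");
    if (!f) {
        return false;
    }
    bool ok = false;
    if (std::fseek(f, 0, SEEK_END) == 0) {
        const long size = std::ftell(f); // -1 on failure
        if (size >= 0 && std::fseek(f, 0, SEEK_SET) == 0) {
            out.resize(static_cast<size_t>(size));
            ok = std::fread(out.data(), 1, out.size(), f) == out.size();
        }
    }
    std::fclose(f);
    return ok;
}
```

On success, the buffer could then be passed along as mtmd_helper_bitmap_init_from_buf(ctx, out.data(), out.size()), matching the signature shown earlier.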

Buffer-based initialization source (mtmd-helper.cpp:470-498):

mtmd_bitmap * mtmd_helper_bitmap_init_from_buf(mtmd_context * ctx,
                                                const unsigned char * buf, size_t len) {
    if (audio_helpers::is_audio_file((const char *)buf, len)) {
        std::vector<float> pcmf32;
        int bitrate = mtmd_get_audio_bitrate(ctx);
        if (bitrate < 0) {
            LOG_ERR("This model does not support audio input\n");
            return nullptr;
        }
        if (!audio_helpers::decode_audio_from_buf(buf, len, bitrate, pcmf32)) {
            LOG_ERR("Unable to read WAV audio file from buffer\n");
            return nullptr;
        }
        return mtmd_bitmap_init_from_audio(pcmf32.size(), pcmf32.data());
    }
    // otherwise, assume it's an image
    int nx, ny, nc;
    auto * data = stbi_load_from_memory(buf, len, &nx, &ny, &nc, 3);
    if (!data) {
        LOG_ERR("%s: failed to decode image bytes\n", __func__);
        return nullptr;
    }
    mtmd_bitmap * result = mtmd_bitmap_init(nx, ny, data);
    stbi_image_free(data);
    return result;
}

I/O Contract

mtmd_helper_bitmap_init_from_file():

Direction | Name | Type | Description
Input | ctx | mtmd_context * | Multimodal context (needed for audio bitrate query)
Input | fname | const char * | Path to image or audio file
Output | (return) | mtmd_bitmap * | Bitmap object, or nullptr on failure

mtmd_bitmap_init():

Direction | Name | Type | Description
Input | nx | uint32_t | Image width in pixels
Input | ny | uint32_t | Image height in pixels
Input | data | const unsigned char * | Raw RGB pixel data (length must be nx * ny * 3)
Output | (return) | mtmd_bitmap * | Bitmap object
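
The nx * ny * 3 length requirement follows from the interleaved RGBRGBRGB layout. As a rough sketch, assuming row-major order with no row padding (which is what stb_image produces in the helper path shown earlier), the buffer can be built and indexed as follows. The names rgb_offset and make_solid_rgb are hypothetical helpers for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offset of the red channel of pixel (x, y) in a tightly packed,
// row-major RGBRGBRGB... buffer (layout assumed, see lead-in).
static size_t rgb_offset(uint32_t nx, uint32_t x, uint32_t y) {
    return (static_cast<size_t>(y) * nx + x) * 3;
}

// Hypothetical helper: build an nx * ny image filled with one color.
static std::vector<unsigned char> make_solid_rgb(uint32_t nx, uint32_t ny,
                                                 unsigned char r,
                                                 unsigned char g,
                                                 unsigned char b) {
    std::vector<unsigned char> data(static_cast<size_t>(nx) * ny * 3);
    for (uint32_t y = 0; y < ny; ++y) {
        for (uint32_t x = 0; x < nx; ++x) {
            const size_t i = rgb_offset(nx, x, y);
            data[i + 0] = r;
            data[i + 1] = g;
            data[i + 2] = b;
        }
    }
    return data;
}
```

A buffer built this way would be passed as mtmd_bitmap_init(nx, ny, data.data()). Images with an alpha channel or padded rows need to be repacked to three bytes per pixel first.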

mtmd_bitmap_init_from_audio():

Direction | Name | Type | Description
Input | n_samples | size_t | Number of audio samples
Input | data | const float * | Raw PCM float32 audio data
Output | (return) | mtmd_bitmap * | Bitmap object
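
Audio captured as 16-bit integer PCM has to be converted to float32 before it can be passed to mtmd_bitmap_init_from_audio(). The sketch below shows the usual normalization to the range [-1, 1). The name pcm16_to_f32 is a hypothetical helper, and the example assumes the samples are already mono and at the rate the model expects (the helper path obtains that rate from mtmd_get_audio_bitrate()).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert signed 16-bit PCM samples to float32
// by dividing by 32768, so that -32768 maps to -1.0f.
static std::vector<float> pcm16_to_f32(const int16_t * samples, size_t n) {
    std::vector<float> out(n);
    for (size_t i = 0; i < n; ++i) {
        out[i] = static_cast<float>(samples[i]) / 32768.0f;
    }
    return out;
}
```

The result would then be passed as mtmd_bitmap_init_from_audio(pcm.size(), pcm.data()), where n_samples counts floats rather than bytes.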

Usage Examples

Example 1: Load image from file

#include "mtmd-helper.h"

// Load an image file (JPEG, PNG, BMP, GIF, etc.)
mtmd_bitmap * bmp = mtmd_helper_bitmap_init_from_file(mtmd_ctx, "photo.jpg");
if (bmp == nullptr) {
    fprintf(stderr, "Failed to load image\n");
    return 1;
}

// Optionally set an ID for KV cache tracking
mtmd_bitmap_set_id(bmp, "user_photo_1");

// Use the bitmap in tokenization...
// mtmd_tokenize(mtmd_ctx, chunks, &text, &bmp, 1);

// Clean up when done
mtmd_bitmap_free(bmp);

Example 2: Construct image from raw RGB data

#include <cstdio>
#include "mtmd.h"

// Assume we have raw RGB pixel data from some source.
// get_rgb_pixels() is a placeholder for your own code; it must return
// width * height * 3 bytes laid out as RGBRGBRGB...
uint32_t width = 640, height = 480;
unsigned char * rgb_data = get_rgb_pixels();

mtmd_bitmap * bmp = mtmd_bitmap_init(width, height, rgb_data);

// Query bitmap properties
printf("Image: %u x %u, %zu bytes\n",
    mtmd_bitmap_get_nx(bmp),
    mtmd_bitmap_get_ny(bmp),
    mtmd_bitmap_get_n_bytes(bmp));

// Clean up when done
mtmd_bitmap_free(bmp);

Example 3: Load audio file

#include "mtmd-helper.h"

// Audio files are auto-detected by magic bytes (WAV, MP3, FLAC)
mtmd_bitmap * audio_bmp = mtmd_helper_bitmap_init_from_file(mtmd_ctx, "recording.wav");
if (audio_bmp == nullptr) {
    fprintf(stderr, "Failed to load audio (model may not support audio)\n");
    return 1;
}

// Verify it is indeed audio
bool is_audio = mtmd_bitmap_is_audio(audio_bmp);
printf("Is audio: %s\n", is_audio ? "yes" : "no");

mtmd_bitmap_free(audio_bmp);

Example 4: Using C++ RAII wrapper

#include "mtmd.h"
#include "mtmd-helper.h" // declares mtmd_helper_bitmap_init_from_file()

// Use the mtmd::bitmap wrapper for automatic memory management
mtmd::bitmap bmp(mtmd_helper_bitmap_init_from_file(mtmd_ctx, "image.png"));
if (!bmp.ptr) {
    fprintf(stderr, "Failed to load bitmap\n");
    return 1;
}
printf("Loaded %u x %u image (%zu bytes)\n", bmp.nx(), bmp.ny(), bmp.n_bytes());
// Automatically freed when bmp goes out of scope
