Implementation:Ggml org Llama cpp Mtmd Bitmap Init
| Aspect | Detail |
|---|---|
| Implementation Name | Mtmd Bitmap Init |
| Doc Type | API Doc |
| Domain | Multimodal Inference |
| Purpose | Constructing bitmap objects from image files, raw data buffers, and audio files |
| Related Workflow | Multimodal_Inference |
Overview
Description
This implementation documents the family of functions for creating mtmd_bitmap objects, which are the standardized internal representation of multimodal inputs in llama.cpp. Three primary construction paths are provided:
- mtmd_helper_bitmap_init_from_file(): Load from a file path (auto-detects image vs. audio)
- mtmd_bitmap_init(): Construct directly from raw RGB pixel data
- mtmd_bitmap_init_from_audio(): Construct directly from raw PCM float32 audio samples
Additionally, mtmd_helper_bitmap_init_from_buf() provides a buffer-based initialization that auto-detects format from magic bytes.
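To make "auto-detects format from magic bytes" concrete, here is a minimal sketch of signature sniffing for the three audio containers named in this document (WAV, MP3, FLAC). It is an illustration under stated assumptions, not the library's detection code: the names media_kind and sniff_media_kind are hypothetical, and the MP3 check is a simplified frame-sync test.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical result type for this sketch; not part of the mtmd API.
enum class media_kind { audio, image_or_unknown };

// Sniff the leading bytes of a buffer for common audio signatures.
// Anything not recognized as audio is left for the image decoder to try.
static media_kind sniff_media_kind(const unsigned char * buf, size_t len) {
    if (buf == nullptr || len < 12) {
        return media_kind::image_or_unknown;
    }
    // WAV: "RIFF" <4-byte chunk size> "WAVE"
    const bool is_wav = std::memcmp(buf, "RIFF", 4) == 0 &&
                        std::memcmp(buf + 8, "WAVE", 4) == 0;
    // MP3: an ID3v2 tag, or an MPEG frame sync (first 11 bits set)
    const bool is_mp3 = std::memcmp(buf, "ID3", 3) == 0 ||
                        (buf[0] == 0xFF && (buf[1] & 0xE0) == 0xE0);
    // FLAC: "fLaC" stream marker
    const bool is_flac = std::memcmp(buf, "fLaC", 4) == 0;

    return (is_wav || is_mp3 || is_flac) ? media_kind::audio
                                         : media_kind::image_or_unknown;
}
```

A JPEG header (0xFF 0xD8) does not pass the simplified MP3 test because its second byte has only the top two bits set, so it falls through to the image path.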
Usage
These functions are called after the multimodal context has been initialized and before tokenization. The resulting mtmd_bitmap * pointers are collected into an array and passed to mtmd_tokenize(). Each bitmap must be freed when no longer needed using mtmd_bitmap_free(), or managed via the C++ RAII wrapper mtmd::bitmap_ptr.
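The ownership rule above (every bitmap is freed exactly once, either by hand or through an RAII wrapper) can be sketched with std::unique_ptr and a custom deleter. The types below are stand-ins so the example compiles on its own: demo_bitmap, demo_bitmap_init, demo_bitmap_free, and demo_bitmap_ptr are hypothetical names, and how mtmd::bitmap_ptr is actually defined should be checked in mtmd.h.

```cpp
#include <memory>

// Stand-ins for an opaque C type and its init/free pair (hypothetical, illustration only).
struct demo_bitmap { int dummy; };
static int g_live_bitmaps = 0;

static demo_bitmap * demo_bitmap_init() {
    ++g_live_bitmaps;
    return new demo_bitmap{0};
}

static void demo_bitmap_free(demo_bitmap * b) {
    if (b) {
        --g_live_bitmaps;
        delete b;
    }
}

// Deleter that forwards to the C-style free function.
struct demo_bitmap_deleter {
    void operator()(demo_bitmap * b) const { demo_bitmap_free(b); }
};
using demo_bitmap_ptr = std::unique_ptr<demo_bitmap, demo_bitmap_deleter>;

// Returns the number of live bitmaps observed while the owner is in scope.
static int live_count_inside_scope() {
    demo_bitmap_ptr bmp(demo_bitmap_init());
    return g_live_bitmaps; // bmp is released when the function returns
}
```

With this pattern an early return or exception cannot leak the bitmap, which is the main reason to prefer the wrapper over paired init/free calls.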
Code Reference
| Aspect | Detail |
|---|---|
| Header (core) | tools/mtmd/mtmd.h:140-141 |
| Header (helpers) | tools/mtmd/mtmd-helper.h:32-41 |
| Source (helpers) | tools/mtmd/mtmd-helper.cpp:470-520 |
| Import | #include "mtmd.h" and #include "mtmd-helper.h" |
Core bitmap constructors (from mtmd.h):
// Image bitmap: data must be nx * ny * 3 bytes in RGBRGBRGB format
MTMD_API mtmd_bitmap * mtmd_bitmap_init(uint32_t nx, uint32_t ny, const unsigned char * data);
// Audio bitmap: data must be n_samples floats in PCM F32 format
MTMD_API mtmd_bitmap * mtmd_bitmap_init_from_audio(size_t n_samples, const float * data);
Helper constructors (from mtmd-helper.h):
// Load from file path, auto-detects image vs. audio
// Returns nullptr on failure. Thread-safe.
MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_file(mtmd_context * ctx, const char * fname);
// Load from memory buffer, auto-detects format via magic bytes
// Supported: image formats (stb_image), audio formats (WAV, MP3, FLAC via miniaudio)
// Returns nullptr on failure. Thread-safe.
MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_buf(mtmd_context * ctx,
const unsigned char * buf, size_t len);
File-based initialization source (mtmd-helper.cpp:500-521):
mtmd_bitmap * mtmd_helper_bitmap_init_from_file(mtmd_context * ctx, const char * fname) {
    std::vector<unsigned char> buf;
    FILE * f = fopen(fname, "rb");
    if (!f) {
        LOG_ERR("Unable to open file %s: %s\n", fname, strerror(errno));
        return nullptr;
    }
    fseek(f, 0, SEEK_END);
    long file_size = ftell(f);
    fseek(f, 0, SEEK_SET);
    buf.resize(file_size);
    size_t n_read = fread(buf.data(), 1, file_size, f);
    fclose(f);
    if (n_read != (size_t)file_size) {
        LOG_ERR("Failed to read entire file %s", fname);
        return nullptr;
    }
    return mtmd_helper_bitmap_init_from_buf(ctx, buf.data(), buf.size());
}
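The listing above reads the whole file into a vector before handing it to the buffer-based helper, and it uses the ftell() result as a size without checking for -1. Callers who want to do the read themselves and then call mtmd_helper_bitmap_init_from_buf() could use something like the following sketch. read_file_bytes is a hypothetical helper name, and this is an alternative written with std::ifstream, not the library's code.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read an entire file into memory. Returns false if the file cannot be opened,
// its size cannot be determined, or fewer bytes than expected are read.
static bool read_file_bytes(const std::string & path, std::vector<unsigned char> & out) {
    // Open at the end so tellg() reports the file size.
    std::ifstream f(path, std::ios::binary | std::ios::ate);
    if (!f) {
        return false;
    }
    const std::streamoff size = f.tellg();
    if (size < 0) {
        return false; // size query failed
    }
    out.resize((size_t) size);
    f.seekg(0, std::ios::beg);
    if (size > 0 && !f.read(reinterpret_cast<char *>(out.data()), size)) {
        return false; // short read
    }
    return true;
}
```

The resulting bytes would then be passed as buf.data() and buf.size(), mirroring the last line of the listing.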
Buffer-based initialization source (mtmd-helper.cpp:470-498):
mtmd_bitmap * mtmd_helper_bitmap_init_from_buf(mtmd_context * ctx,
                                                const unsigned char * buf, size_t len) {
    if (audio_helpers::is_audio_file((const char *)buf, len)) {
        std::vector<float> pcmf32;
        int bitrate = mtmd_get_audio_bitrate(ctx);
        if (bitrate < 0) {
            LOG_ERR("This model does not support audio input\n");
            return nullptr;
        }
        if (!audio_helpers::decode_audio_from_buf(buf, len, bitrate, pcmf32)) {
            LOG_ERR("Unable to read WAV audio file from buffer\n");
            return nullptr;
        }
        return mtmd_bitmap_init_from_audio(pcmf32.size(), pcmf32.data());
    }
    // otherwise, assume it's an image
    int nx, ny, nc;
    auto * data = stbi_load_from_memory(buf, len, &nx, &ny, &nc, 3);
    if (!data) {
        LOG_ERR("%s: failed to decode image bytes\n", __func__);
        return nullptr;
    }
    mtmd_bitmap * result = mtmd_bitmap_init(nx, ny, data);
    stbi_image_free(data);
    return result;
}
I/O Contract
mtmd_helper_bitmap_init_from_file():
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | ctx | mtmd_context * | Multimodal context (needed for audio bitrate query) |
| Input | fname | const char * | Path to image or audio file |
| Output | (return) | mtmd_bitmap * | Bitmap object, or nullptr on failure |
mtmd_bitmap_init():
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | nx | uint32_t | Image width in pixels |
| Input | ny | uint32_t | Image height in pixels |
| Input | data | const unsigned char * | Raw RGB pixel data (length must be nx * ny * 3) |
| Output | (return) | mtmd_bitmap * | Bitmap object |
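Because data must be exactly nx * ny * 3 bytes of interleaved RGB, pixel sources that carry an alpha channel need repacking first. The sketch below shows one way to do that and to validate the size up front. rgba_to_rgb is a hypothetical helper, and it assumes 8-bit interleaved RGBA input with no row padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved RGBA pixels to tightly packed RGB (R,G,B,R,G,B,...).
// Returns an empty vector if the input size does not match nx * ny * 4.
static std::vector<unsigned char> rgba_to_rgb(const std::vector<unsigned char> & rgba,
                                              uint32_t nx, uint32_t ny) {
    const size_t n_pixels = (size_t) nx * ny;
    if (rgba.size() != n_pixels * 4) {
        return {};
    }
    std::vector<unsigned char> rgb(n_pixels * 3);
    for (size_t i = 0; i < n_pixels; ++i) {
        rgb[i * 3 + 0] = rgba[i * 4 + 0]; // R
        rgb[i * 3 + 1] = rgba[i * 4 + 1]; // G
        rgb[i * 3 + 2] = rgba[i * 4 + 2]; // B (alpha is dropped)
    }
    return rgb;
}
```

The result could then be passed as mtmd_bitmap_init(nx, ny, rgb.data()). The buffer-based helper shown earlier frees its decoded pixels immediately after calling mtmd_bitmap_init(), which suggests the bitmap keeps its own copy of the data.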
mtmd_bitmap_init_from_audio():
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | n_samples | size_t | Number of audio samples |
| Input | data | const float * | Raw PCM float32 audio data |
| Output | (return) | mtmd_bitmap * | Bitmap object |
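Audio captured as 16-bit integer PCM has to be converted to float32 before it can be passed to mtmd_bitmap_init_from_audio(). A minimal sketch of that conversion follows. pcm16_to_f32 is a hypothetical helper, and it assumes mono samples already at the rate the model expects, scaled to the conventional [-1.0, 1.0) float range; neither assumption is confirmed by the table above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert signed 16-bit PCM samples to float32 by dividing by 32768,
// so that -32768 maps to -1.0f and 32767 maps to just under 1.0f.
static std::vector<float> pcm16_to_f32(const int16_t * samples, size_t n_samples) {
    std::vector<float> out(n_samples);
    for (size_t i = 0; i < n_samples; ++i) {
        out[i] = (float) samples[i] / 32768.0f;
    }
    return out;
}
```

The converted vector would be passed as mtmd_bitmap_init_from_audio(out.size(), out.data()). Resampling is out of scope here; the buffer-based helper handles that by querying mtmd_get_audio_bitrate(ctx) before decoding.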
Usage Examples
Example 1: Load image from file
#include "mtmd-helper.h"
// Load an image file (JPEG, PNG, BMP, GIF, etc.)
mtmd_bitmap * bmp = mtmd_helper_bitmap_init_from_file(mtmd_ctx, "photo.jpg");
if (bmp == nullptr) {
    fprintf(stderr, "Failed to load image\n");
    return 1;
}
// Optionally set an ID for KV cache tracking
mtmd_bitmap_set_id(bmp, "user_photo_1");
// Use the bitmap in tokenization...
// mtmd_tokenize(mtmd_ctx, chunks, &text, &bmp, 1);
// Clean up when done
mtmd_bitmap_free(bmp);
Example 2: Construct image from raw RGB data
#include "mtmd.h"
// Assume we have raw RGB pixel data from some source
// (get_rgb_pixels() is a placeholder for your own code)
uint32_t width = 640, height = 480;
unsigned char * rgb_data = get_rgb_pixels(); // RGBRGBRGB..., width * height * 3 bytes
mtmd_bitmap * bmp = mtmd_bitmap_init(width, height, rgb_data);
// Query bitmap properties
printf("Image: %u x %u, %zu bytes\n",
       mtmd_bitmap_get_nx(bmp),
       mtmd_bitmap_get_ny(bmp),
       mtmd_bitmap_get_n_bytes(bmp));
// Clean up when done
mtmd_bitmap_free(bmp);
Example 3: Load audio file
#include "mtmd-helper.h"
// Audio files are auto-detected by magic bytes (WAV, MP3, FLAC)
mtmd_bitmap * audio_bmp = mtmd_helper_bitmap_init_from_file(mtmd_ctx, "recording.wav");
if (audio_bmp == nullptr) {
    fprintf(stderr, "Failed to load audio (model may not support audio)\n");
    return 1;
}
// Verify it is indeed audio
bool is_audio = mtmd_bitmap_is_audio(audio_bmp);
printf("Is audio: %s\n", is_audio ? "yes" : "no");
mtmd_bitmap_free(audio_bmp);
Example 4: Using C++ RAII wrapper
#include "mtmd.h"
#include "mtmd-helper.h" // declares mtmd_helper_bitmap_init_from_file()
// Use the mtmd::bitmap wrapper for automatic memory management
mtmd::bitmap bmp(mtmd_helper_bitmap_init_from_file(mtmd_ctx, "image.png"));
if (!bmp.ptr) {
    fprintf(stderr, "Failed to load bitmap\n");
    return 1;
}
printf("Loaded %u x %u image (%zu bytes)\n", bmp.nx(), bmp.ny(), bmp.n_bytes());
// Automatically freed when bmp goes out of scope