Implementation:Ggml org Llama cpp Mtmd Helper

From Leeroopedia
Knowledge Sources
Domains: Multimodal, Utilities
Last Updated: 2026-02-15 00:00 GMT

Overview

Public helper library providing convenience functions for loading media files and evaluating multimodal input chunks through the llama.cpp inference pipeline.

Description

The helper embeds `stb_image.h` (image loading) and `miniaudio.h` (audio decoding) as single-header implementations. `mtmd_helper_bitmap_init_from_file` and `mtmd_helper_bitmap_init_from_buf` auto-detect the media type and load either an image (via stb_image) or audio (via miniaudio, resampled to the model's sample rate). `mtmd_helper_eval_chunks` walks the text, image, and audio chunks in order. Text chunks are decoded with `llama_decode`. Media chunks are first encoded with `mtmd_encode`, and the resulting embeddings are then decoded. Batching and position tracking are handled across all chunks. Models that require non-causal attention while decoding image embeddings get special handling.
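Audio resampling itself is delegated to miniaudio. As a rough illustration of what converting a clip to the model's sample rate involves, the sketch below uses plain linear interpolation on mono float samples. This is a swapped-in technique, not miniaudio's resampler; `sketch_resample_linear` is a hypothetical name, and the mono float format is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of llama.cpp): resample mono float PCM
 * from in_rate to out_rate by linear interpolation between neighbouring
 * input samples. Writes at most out_cap samples and returns the count. */
size_t sketch_resample_linear(const float *in, size_t n_in, int in_rate,
                              float *out, size_t out_cap, int out_rate) {
    assert(in_rate > 0 && out_rate > 0);
    if (n_in == 0) {
        return 0;
    }
    /* Output length scales with the ratio of the two rates. */
    size_t n_out = (size_t)((double)n_in * out_rate / in_rate);
    if (n_out > out_cap) {
        n_out = out_cap;
    }
    /* Distance in input samples between consecutive output samples. */
    double step = (double)in_rate / out_rate;
    for (size_t i = 0; i < n_out; i++) {
        double src = i * step;          /* fractional source index */
        size_t i0 = (size_t)src;        /* left neighbour */
        size_t i1 = (i0 + 1 < n_in) ? i0 + 1 : n_in - 1; /* clamp at the end */
        double frac = src - (double)i0; /* weight of the right neighbour */
        out[i] = (float)((1.0 - frac) * in[i0] + frac * in[i1]);
    }
    return n_out;
}
```

For example, four samples at 2 Hz become eight samples at 4 Hz, with interpolated midpoints between the originals. Linear interpolation without a low-pass filter can alias when downsampling, which is one reason a dedicated audio library is preferable in practice.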

Usage

Use this library when building applications that load media files and feed multimodal input to llama.cpp models. It takes care of encoding and decoding interleaved text, image, and audio sequences in the right order, so callers do not have to orchestrate those steps by hand.
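The orchestration can be pictured as a single loop over the chunks. The self-contained C model below is a hypothetical simplification: counters stand in for `llama_decode` and `mtmd_encode`, batching is omitted, each chunk is assumed to advance the position by its token count, and the `image_needs_non_causal` flag stands in for the model-specific check that switches to non-causal attention while decoding image embeddings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the chunk-evaluation loop. No real decoding
 * happens; each step only updates counters so the control flow can be
 * inspected. */
typedef enum { CHUNK_TEXT, CHUNK_IMAGE, CHUNK_AUDIO } chunk_type;

typedef struct {
    chunk_type type;
    int n_tokens; /* text tokens, or media embedding positions */
} sketch_chunk;

typedef struct {
    int n_encodes;        /* stands in for mtmd_encode calls */
    int n_decodes;        /* stands in for llama_decode calls */
    int non_causal_spans; /* image decodes run with causal attention off */
    bool causal;          /* current attention mode */
} sketch_state;

/* Evaluate chunks in order starting at pos0; returns the next free position. */
int sketch_eval_chunks(const sketch_chunk *chunks, size_t n_chunks, int pos0,
                       bool image_needs_non_causal, sketch_state *st) {
    int pos = pos0;
    st->causal = true;
    for (size_t i = 0; i < n_chunks; i++) {
        const sketch_chunk *c = &chunks[i];
        if (c->type == CHUNK_TEXT) {
            st->n_decodes++;        /* text tokens are decoded directly */
        } else {
            st->n_encodes++;        /* media is encoded into embeddings first */
            bool toggle = image_needs_non_causal && c->type == CHUNK_IMAGE;
            if (toggle) {
                st->causal = false; /* attend bidirectionally within the image */
                st->non_causal_spans++;
            }
            st->n_decodes++;        /* then the embeddings are decoded */
            if (toggle) {
                st->causal = true;  /* restore causal attention for what follows */
            }
        }
        pos += c->n_tokens;         /* positions continue across chunk boundaries */
    }
    return pos;
}
```

With a 5-token text chunk, a 256-embedding image chunk, and a 3-token text chunk starting at position 10, the loop returns 274 after one encode and three decodes, and leaves causal attention enabled.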

Code Reference

Source Location

Signature

// Media file loading (returns NULL on failure)
mtmd_bitmap * mtmd_helper_bitmap_init_from_file(mtmd_context * ctx,
    const char * path);
mtmd_bitmap * mtmd_helper_bitmap_init_from_buf(mtmd_context * ctx,
    const unsigned char * buf, size_t len);

// Multimodal chunk evaluation (returns 0 on success)
int32_t mtmd_helper_eval_chunks(mtmd_context * ctx_mtmd,
    llama_context * ctx_llama,
    const mtmd_input_chunks * chunks,
    llama_pos pos0,
    llama_seq_id seq_id,
    int32_t n_batch,
    bool logits_last,
    llama_pos * new_n_past);
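`mtmd_helper_bitmap_init_from_buf` has to decide whether a buffer holds an image or audio before choosing a decoder. A minimal sketch of signature-based detection follows. The checks use well-known magic bytes (RIFF/WAVE, fLaC, an ID3 tag or an MPEG frame sync), but `sketch_detect_media` and `has_prefix` are hypothetical names, and the exact rules in llama.cpp may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classifier based on common file signatures. */
typedef enum { MEDIA_IMAGE, MEDIA_AUDIO } media_kind;

/* True if buf holds sig (sig_len bytes) starting at offset. */
static bool has_prefix(const unsigned char *buf, size_t len,
                       const char *sig, size_t sig_len, size_t offset) {
    return len >= offset + sig_len && memcmp(buf + offset, sig, sig_len) == 0;
}

media_kind sketch_detect_media(const unsigned char *buf, size_t len) {
    /* WAV: "RIFF" at byte 0 and "WAVE" at byte 8 */
    if (has_prefix(buf, len, "RIFF", 4, 0) && has_prefix(buf, len, "WAVE", 4, 8)) {
        return MEDIA_AUDIO;
    }
    /* FLAC stream marker */
    if (has_prefix(buf, len, "fLaC", 4, 0)) {
        return MEDIA_AUDIO;
    }
    /* MP3: ID3v2 tag, or an MPEG audio frame sync (11 leading set bits) */
    if (has_prefix(buf, len, "ID3", 3, 0) ||
        (len >= 2 && buf[0] == 0xFF && (buf[1] & 0xE0) == 0xE0)) {
        return MEDIA_AUDIO;
    }
    /* Anything else goes to the image decoder (e.g. stb_image). */
    return MEDIA_IMAGE;
}
```

In this sketch anything unrecognized falls through to the image path, where an image decoder would then report an error if the data is not a supported image either. Note that a JPEG header (0xFF 0xD8) does not match the MPEG sync test, because 0xD8 lacks the top three bits.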

Import

#include "mtmd.h"
#include "mtmd-helper.h"
#include "llama.h"

// Bundled single-header libraries compiled into the helper itself
#include "stb/stb_image.h"
#include "miniaudio/miniaudio.h"

I/O Contract

Inputs

Name | Type | Required | Description
path | const char * | Yes (file loading) | Path to an image or audio file to load
buf / len | const unsigned char * / size_t | Yes (buffer loading) | Raw file contents and their length in bytes
ctx_mtmd | mtmd_context * | Yes | Initialized multimodal context (also passed as `ctx` to the loading functions)
ctx_llama | llama_context * | Yes | Initialized llama context used for decoding
chunks | const mtmd_input_chunks * | Yes | Tokenized multimodal input chunks to evaluate
pos0 | llama_pos | Yes | Starting position in the KV cache
seq_id | llama_seq_id | Yes | Sequence ID in the KV cache
n_batch | int32_t | Yes | Maximum number of tokens per decode call
logits_last | bool | Yes | If true, compute logits for the last token of the final chunk
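To make the interaction of `pos0`, `n_batch`, and `logits_last` concrete, the sketch below plans how a run of text tokens could be split into decode calls: token i lands at position `pos0 + i`, each call carries at most `n_batch` tokens, and only the last token of the last call requests logits when `logits_last` is set. The `sketch_batch` type and `sketch_plan_batches` function are hypothetical; they model the bookkeeping, not the library's internal code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one decode call. */
typedef struct {
    int first_pos; /* position of the first token in this call */
    int n_tokens;  /* tokens in this call (at most n_batch) */
    bool logits;   /* whether the call's last token requests logits */
} sketch_batch;

/* Split n_tokens starting at pos0 into calls of at most n_batch tokens.
 * Fills up to max_batches entries and returns the number of calls needed. */
int sketch_plan_batches(int n_tokens, int pos0, int n_batch, bool logits_last,
                        sketch_batch *out, int max_batches) {
    assert(n_batch > 0 && n_tokens >= 0);
    int n_batches = (n_tokens + n_batch - 1) / n_batch; /* ceiling division */
    for (int b = 0; b < n_batches && b < max_batches; b++) {
        int start = b * n_batch;          /* index of this call's first token */
        int remaining = n_tokens - start; /* tokens not yet scheduled */
        out[b].first_pos = pos0 + start;
        out[b].n_tokens = remaining < n_batch ? remaining : n_batch;
        out[b].logits = logits_last && (b == n_batches - 1);
    }
    return n_batches;
}
```

For 1200 tokens starting at position 10 with `n_batch = 512`, this yields three calls starting at positions 10, 522, and 1034, the last one holding 176 tokens and requesting logits.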

Outputs

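Name | Type | Description
loader return value | mtmd_bitmap * | Loaded image or audio data ready for multimodal tokenization, or NULL on failure
new_n_past | llama_pos * (out) | Receives the next free KV-cache position after all chunks are evaluated
mtmd_helper_eval_chunks return value | int32_t | 0 on success; otherwise the non-zero error code forwarded from the failing encode or decode call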

Usage Examples

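#include <string>
#include "mtmd.h"
#include "mtmd-helper.h"

// Load an image (or audio) file; returns NULL on failure
mtmd_bitmap * bmp = mtmd_helper_bitmap_init_from_file(ctx_mtmd, "photo.jpg");

// Tokenize text containing the media marker; the bitmap fills the marker's place
std::string prompt = std::string("Describe this image: ") + mtmd_default_marker();
mtmd_input_text text = { prompt.c_str(), /* add_special */ true, /* parse_special */ true };
mtmd_input_chunks * chunks = mtmd_input_chunks_init();
const mtmd_bitmap * bitmaps[] = { bmp };
int32_t res = bmp ? mtmd_tokenize(ctx_mtmd, chunks, &text, bitmaps, 1) : -1;

// Evaluate all chunks; on success n_past holds the next free KV-cache position
llama_pos n_past = 0;
if (res == 0) {
    res = mtmd_helper_eval_chunks(ctx_mtmd, ctx_llama, chunks,
        /* pos0 */ 0, /* seq_id */ 0, /* n_batch */ 512,
        /* logits_last */ true, &n_past);
}

// Release the chunks and the bitmap
mtmd_input_chunks_free(chunks);
mtmd_bitmap_free(bmp);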
