Implementation:Ggml org Llama cpp Mtmd Helper Header
| Knowledge Sources | |
|---|---|
| Domains | Multimodal, Utilities |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Public C API header for mtmd helper functions that simplify multimodal usage by providing high-level convenience wrappers.
Description
This header declares the following helper functions for the multimodal module:
- `mtmd_helper_bitmap_init_from_file` and `mtmd_helper_bitmap_init_from_buf`: load images (JPG, PNG, BMP, GIF via stb_image) or audio (WAV, MP3, FLAC via miniaudio) from a file path or an in-memory buffer.
- `mtmd_helper_get_n_tokens` and `mtmd_helper_get_n_pos`: count tokens and positions across a list of chunks.
- `mtmd_helper_eval_chunks`: automatically run decoding over mixed text/image/audio chunks.
- `mtmd_helper_eval_chunk_single`: evaluate a single chunk.
- `mtmd_helper_decode_image_chunk`: decode pre-encoded image embeddings with proper batching.
- `mtmd_helper_log_set`: configure logging.
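The chunk-by-chunk flow that `mtmd_helper_eval_chunks` automates can be sketched as a loop that sends text chunks straight to the decoder, runs media chunks through the encoder first, and stops at the first error. This is a self-contained model, not the library's implementation: `demo_chunk`, `demo_encode`, `demo_decode`, and `demo_eval_chunks` are hypothetical stand-ins, and the real helper additionally handles batching via `n_batch` and model-specific position handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for mtmd chunk types; not the real mtmd API. */
typedef enum { DEMO_CHUNK_TEXT, DEMO_CHUNK_IMAGE, DEMO_CHUNK_AUDIO } demo_chunk_type;

typedef struct {
    demo_chunk_type type;
    int32_t n_tokens; /* text tokens or media embedding rows */
    int32_t n_pos;    /* positions consumed by this chunk */
} demo_chunk;

/* Hypothetical encoder/decoder stubs: 0 on success, non-zero on failure. */
static int32_t demo_encode(const demo_chunk * c) { return c->n_tokens > 0 ? 0 : 1; }
static int32_t demo_decode(const demo_chunk * c) { return c->n_tokens > 0 ? 0 : 2; }

/* Evaluates chunks in order: media chunks are encoded before being decoded,
 * text chunks are decoded directly. Stops at the first non-zero status and
 * returns it; on success stores the advanced position in *new_n_past. */
static int32_t demo_eval_chunks(const demo_chunk * chunks, size_t n_chunks,
                                int32_t n_past, int32_t * new_n_past) {
    assert(new_n_past != NULL);
    for (size_t i = 0; i < n_chunks; i++) {
        const demo_chunk * c = &chunks[i];
        int32_t ret = 0;
        if (c->type != DEMO_CHUNK_TEXT) {
            ret = demo_encode(c); /* media: compute embeddings first */
        }
        if (ret == 0) {
            ret = demo_decode(c); /* then feed the chunk to the decoder */
        }
        if (ret != 0) {
            return ret;           /* forward the first error */
        }
        n_past += c->n_pos;
    }
    *new_n_past = n_past;
    return 0;
}
```

Evaluating a text, image, text sequence starting at `n_past = 10` advances the position by the sum of each chunk's `n_pos`, mirroring how `new_n_past` is reported.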
Usage
Use this header when integrating multimodal capabilities into an application. It provides the high-level convenience functions most applications need, so callers do not have to manage low-level encoding and decoding details themselves. Note that these helpers are not guaranteed to be stable; breaking changes are expected.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: tools/mtmd/mtmd-helper.h
- Lines: 1-96
Signature
```c
MTMD_API void mtmd_helper_log_set(ggml_log_callback log_callback, void * user_data);
MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_file(mtmd_context * ctx, const char * fname);
MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_buf(mtmd_context * ctx, const unsigned char * buf, size_t len);
MTMD_API size_t mtmd_helper_get_n_tokens(const mtmd_input_chunks * chunks);
MTMD_API llama_pos mtmd_helper_get_n_pos(const mtmd_input_chunks * chunks);

MTMD_API int32_t mtmd_helper_eval_chunks(
    mtmd_context * ctx, struct llama_context * lctx,
    const mtmd_input_chunks * chunks,
    llama_pos n_past, llama_seq_id seq_id,
    int32_t n_batch, bool logits_last,
    llama_pos * new_n_past);

MTMD_API int32_t mtmd_helper_eval_chunk_single(
    mtmd_context * ctx, struct llama_context * lctx,
    const mtmd_input_chunk * chunk,
    llama_pos n_past, llama_seq_id seq_id,
    int32_t n_batch, bool logits_last,
    llama_pos * new_n_past);

MTMD_API int32_t mtmd_helper_decode_image_chunk(
    mtmd_context * ctx, struct llama_context * lctx,
    const mtmd_input_chunk * chunk, float * encoded_embd,
    llama_pos n_past, llama_seq_id seq_id,
    int32_t n_batch, llama_pos * new_n_past);
```
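`mtmd_helper_log_set` takes a `ggml_log_callback` plus an opaque `user_data` pointer that is handed back to the callback. The sketch below shows a level-filtering callback of that shape. To stay self-contained it uses a stand-in enum and struct (`demo_log_level`, `demo_log_filter`, and `demo_log_cb` are hypothetical, and the enum values are assumed to follow ggml's ordering); in a real build you would include `ggml.h`, use `enum ggml_log_level`, and pass the callback and a pointer to the filter to `mtmd_helper_log_set`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for ggml's log levels (ordering assumed to match). */
enum demo_log_level {
    DEMO_LOG_LEVEL_NONE  = 0,
    DEMO_LOG_LEVEL_DEBUG = 1,
    DEMO_LOG_LEVEL_INFO  = 2,
    DEMO_LOG_LEVEL_WARN  = 3,
    DEMO_LOG_LEVEL_ERROR = 4,
};

/* Hypothetical user_data: minimum level to forward plus a counter. */
typedef struct {
    enum demo_log_level min_level;
    int n_forwarded;
} demo_log_filter;

/* Same shape as ggml_log_callback (level, text, user_data): drops messages
 * below min_level and writes the rest to stderr. */
static void demo_log_cb(enum demo_log_level level, const char * text, void * user_data) {
    demo_log_filter * f = (demo_log_filter *) user_data;
    assert(f != NULL && text != NULL);
    if (level < f->min_level) {
        return;
    }
    fputs(text, stderr);
    f->n_forwarded++;
}
```

ggml also defines a continuation level (`GGML_LOG_LEVEL_CONT`) for multi-part messages; a real filter should treat such lines according to the message they continue, which this sketch does not track.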
Import
```c
#include "mtmd-helper.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ctx | mtmd_context * | Yes | Multimodal context for encoding and processing |
| lctx | struct llama_context * | Yes | LLaMA context for decoding |
| fname | const char * | Yes | File path for bitmap loading (image or audio file) |
| buf | const unsigned char * | Yes | Buffer containing file data for bitmap loading |
| len | size_t | Yes | Length of the buffer |
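| chunks | const mtmd_input_chunks * | Yes | List of mixed text/image/audio chunks to evaluate |
| chunk | const mtmd_input_chunk * | Yes | Single chunk to evaluate (`mtmd_helper_eval_chunk_single`) or image chunk to decode (`mtmd_helper_decode_image_chunk`) |
| encoded_embd | float * | Yes | Pre-computed embeddings for the image chunk (`mtmd_helper_decode_image_chunk` only) |
| n_past | llama_pos | Yes | Current position in the KV cache |
| seq_id | llama_seq_id | Yes | Sequence ID for the evaluation |
| n_batch | int32_t | Yes | Maximum number of tokens per decode batch |
| logits_last | bool | Yes | Whether to request logits for the last token of the final chunk |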
Outputs
| Name | Type | Description |
|---|---|---|
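| mtmd_helper_bitmap_init_from_file | mtmd_bitmap * | Loaded bitmap, or nullptr on failure; the function is thread-safe |
| mtmd_helper_bitmap_init_from_buf | mtmd_bitmap * | Bitmap decoded from the buffer, or nullptr on failure |
| mtmd_helper_get_n_tokens | size_t | Total number of tokens across all chunks (useful for tracking KV cache usage) |
| mtmd_helper_get_n_pos | llama_pos | Total number of positions across all chunks; normally equal to the token count, but different for M-RoPE models |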
| mtmd_helper_eval_chunks | int32_t | 0 on success, non-zero on encoding or decoding failure |
| new_n_past | llama_pos * | Updated position after evaluation (output parameter) |
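The distinction between `mtmd_helper_get_n_tokens` and `mtmd_helper_get_n_pos` matters for bookkeeping: the token total tracks KV cache usage, while the position total is what `n_past` advances by. The sketch below models both sums over hypothetical per-chunk counts (`demo_chunk_counts` and the numbers in the usage note are illustrative, not taken from a real model); for an M-RoPE model an image chunk is assumed to consume fewer positions than tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk counts; in mtmd these come from the chunk objects. */
typedef struct {
    size_t  n_tokens; /* entries the chunk occupies in the KV cache */
    int32_t n_pos;    /* positions it advances; assumed smaller for M-RoPE images */
} demo_chunk_counts;

/* Analogue of mtmd_helper_get_n_tokens: sums tokens over all chunks. */
static size_t demo_total_tokens(const demo_chunk_counts * c, size_t n) {
    assert(c != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += c[i].n_tokens;
    }
    return total;
}

/* Analogue of mtmd_helper_get_n_pos: sums positions over all chunks. */
static int32_t demo_total_pos(const demo_chunk_counts * c, size_t n) {
    assert(c != NULL || n == 0);
    int32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += c[i].n_pos;
    }
    return total;
}
```

With chunks of (10, 10), (256, 16), and (5, 5) tokens and positions, the token total is 271 while the position total is 31, so `n_past` would advance by 31.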
Usage Examples
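```c
// Assumes ctx, lctx, chunks and n_past were initialized earlier.
// Load an image from file (returns NULL on failure)
mtmd_bitmap * bmp = mtmd_helper_bitmap_init_from_file(ctx, "image.png");
if (bmp == NULL) { /* handle the load error */ }
// Count tokens for KV cache tracking
size_t n_tokens = mtmd_helper_get_n_tokens(chunks);
// Evaluate all chunks (text + images) on sequence 0, n_batch = 512,
// requesting logits for the last token
llama_pos new_n_past;
int32_t result = mtmd_helper_eval_chunks(
    ctx, lctx, chunks,
    n_past, 0, 512, true, &new_n_past);
if (result == 0) {
    n_past = new_n_past; // continue from the updated position
}
```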
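When the image or audio already lives in memory (for example, received over a network), `mtmd_helper_bitmap_init_from_buf` takes the raw file bytes instead of a path. The sketch below shows one way to produce that `buf`/`len` pair from disk in plain C; `demo_read_file` is a hypothetical helper, not part of mtmd.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a malloc'd buffer.
 * Returns NULL on failure; on success stores the size in *out_len.
 * The caller releases the buffer with free(). */
static unsigned char * demo_read_file(const char * path, size_t * out_len) {
    assert(path != NULL && out_len != NULL);
    FILE * f = fopen(path, "rb");
    if (f == NULL) {
        return NULL;
    }
    unsigned char * buf = NULL;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) {
        size = ftell(f);
    }
    if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc(size > 0 ? (size_t) size : 1); /* avoid malloc(0) */
        if (buf != NULL && fread(buf, 1, (size_t) size, f) != (size_t) size) {
            free(buf); /* short read: treat as failure */
            buf = NULL;
        }
    }
    fclose(f);
    if (buf != NULL) {
        *out_len = (size_t) size;
    }
    return buf;
}
```

Pass the resulting `buf` and `len` to `mtmd_helper_bitmap_init_from_buf(ctx, buf, len)`, check the returned bitmap for nullptr, and release `buf` with `free()` when it is no longer needed.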