Implementation:Ggml org Llama cpp Mtmd Helper Header

From Leeroopedia
Knowledge Sources
Domains Multimodal, Utilities
Last Updated 2026-02-15 00:00 GMT

Overview

Public C API header for mtmd helper functions that simplify multimodal usage by providing high-level convenience wrappers.

Description

This header declares helper functions for the multimodal module:

- `mtmd_helper_bitmap_init_from_file` and `mtmd_helper_bitmap_init_from_buf` load images (JPG, PNG, BMP, GIF via stb_image) and audio (WAV, MP3, FLAC via miniaudio) from files or in-memory buffers.
- `mtmd_helper_get_n_tokens` and `mtmd_helper_get_n_pos` count tokens and positions across chunks.
- `mtmd_helper_eval_chunks` automatically runs decode on mixed text/image/audio chunks.
- `mtmd_helper_eval_chunk_single` evaluates a single chunk.
- `mtmd_helper_decode_image_chunk` decodes pre-encoded image embeddings with proper batching.
- `mtmd_helper_log_set` configures logging.
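The relationship between `mtmd_helper_eval_chunks` and `mtmd_helper_eval_chunk_single` can be pictured as a loop that evaluates chunks in order, threads the position through each call, and requests logits only for the final chunk. The following is a minimal sketch of that control flow, not the library's implementation: `Chunk`, `EvalOne`, `eval_chunks_sketch`, and `advance_only` are hypothetical stand-ins introduced for illustration, and positions are plain `int32_t` values here.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a chunk; the real mtmd_input_chunk is an opaque type.
enum class ChunkKind { Text, Image, Audio };

struct Chunk {
    ChunkKind kind;
    int32_t   n_pos; // number of positions this chunk occupies
};

// Hypothetical per-chunk evaluator: returns 0 on success and writes the
// advanced position to *new_n_past.
using EvalOne = std::function<int32_t(const Chunk &, int32_t n_past,
                                      bool logits_last, int32_t * new_n_past)>;

// Evaluate chunks in order, request logits only for the final chunk,
// and stop at the first failure.
int32_t eval_chunks_sketch(const std::vector<Chunk> & chunks, int32_t n_past,
                           bool logits_last, const EvalOne & eval_one,
                           int32_t * new_n_past) {
    *new_n_past = n_past; // defined even when there are no chunks
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        const bool is_last = (i + 1 == chunks.size());
        const int32_t rc = eval_one(chunks[i], n_past, logits_last && is_last, &n_past);
        if (rc != 0) {
            return rc; // propagate the first error to the caller
        }
        *new_n_past = n_past;
    }
    return 0;
}

// Trivial evaluator for illustration: it only advances the position.
int32_t advance_only(const Chunk & c, int32_t n_past, bool /*logits_last*/,
                     int32_t * new_n_past) {
    *new_n_past = n_past + c.n_pos;
    return 0;
}
```

With chunks occupying 5, 16, and 3 positions and a starting position of 10, this sketch reports a new position of 34. A real application would call the helpers declared in the header instead of these stand-ins.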

Usage

Use this header when integrating multimodal capabilities into an application. It provides the high-level convenience functions most applications need, so callers do not have to manage low-level encoding and decoding details. These helpers are not guaranteed to be stable; breaking changes are expected.

Code Reference

Source Location

Signature

MTMD_API void mtmd_helper_log_set(ggml_log_callback log_callback, void * user_data);

MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_file(mtmd_context * ctx, const char * fname);
MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_buf(mtmd_context * ctx, const unsigned char * buf, size_t len);

MTMD_API size_t mtmd_helper_get_n_tokens(const mtmd_input_chunks * chunks);
MTMD_API llama_pos mtmd_helper_get_n_pos(const mtmd_input_chunks * chunks);

MTMD_API int32_t mtmd_helper_eval_chunks(
    mtmd_context * ctx, struct llama_context * lctx,
    const mtmd_input_chunks * chunks,
    llama_pos n_past, llama_seq_id seq_id,
    int32_t n_batch, bool logits_last,
    llama_pos * new_n_past);

MTMD_API int32_t mtmd_helper_eval_chunk_single(
    mtmd_context * ctx, struct llama_context * lctx,
    const mtmd_input_chunk * chunk,
    llama_pos n_past, llama_seq_id seq_id,
    int32_t n_batch, bool logits_last,
    llama_pos * new_n_past);

MTMD_API int32_t mtmd_helper_decode_image_chunk(
    mtmd_context * ctx, struct llama_context * lctx,
    const mtmd_input_chunk * chunk, float * encoded_embd,
    llama_pos n_past, llama_seq_id seq_id,
    int32_t n_batch, llama_pos * new_n_past);

Import

#include "mtmd-helper.h"

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| ctx | mtmd_context * | Yes | Multimodal context for encoding and processing |
| lctx | struct llama_context * | Yes | llama.cpp context used for decoding |
| fname | const char * | Yes | File path for bitmap loading (image or audio file) |
| buf | const unsigned char * | Yes | Buffer containing file data for bitmap loading |
| len | size_t | Yes | Length of the buffer in bytes |
| chunks | const mtmd_input_chunks * | Yes | List of mixed text/image/audio chunks to evaluate |
| chunk | const mtmd_input_chunk * | Yes | Single chunk to evaluate or decode |
| encoded_embd | float * | Yes | Pre-encoded image embeddings to decode |
| n_past | llama_pos | Yes | Current position in the KV cache |
| seq_id | llama_seq_id | Yes | Sequence ID for the evaluation |
| n_batch | int32_t | Yes | Batch size for decoding |
| logits_last | bool | Yes | Whether to compute logits only for the last token |
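The `n_batch` parameter bounds how much is decoded at once, so a chunk with more embedding rows than `n_batch` has to be decoded in several passes. The sketch below shows only the arithmetic of that split, as a rough illustration; `split_into_batches` is a hypothetical helper and is not part of the header.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Split n_tokens embedding rows into (offset, count) batches of at most
// n_batch rows each. Hypothetical helper, for illustration only.
std::vector<std::pair<int32_t, int32_t>> split_into_batches(int32_t n_tokens, int32_t n_batch) {
    std::vector<std::pair<int32_t, int32_t>> batches;
    if (n_tokens <= 0 || n_batch <= 0) {
        return batches; // nothing to decode, or an invalid batch size
    }
    for (int32_t offset = 0; offset < n_tokens; offset += n_batch) {
        // The final batch may be shorter than n_batch.
        batches.emplace_back(offset, std::min(n_batch, n_tokens - offset));
    }
    return batches;
}
```

For example, 1100 rows with `n_batch` set to 512 yield three batches of 512, 512, and 76 rows.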

Outputs

| Name | Type | Description |
|---|---|---|
| mtmd_helper_bitmap_init_from_file | mtmd_bitmap * | Loaded bitmap, or nullptr on failure (thread-safe) |
| mtmd_helper_get_n_tokens | size_t | Total number of tokens across all chunks |
| mtmd_helper_get_n_pos | llama_pos | Total position count (differs from n_tokens for M-RoPE) |
| mtmd_helper_eval_chunks | int32_t | 0 on success, non-zero on encoding or decoding failure |
| new_n_past | llama_pos * | Updated position after evaluation (output parameter) |
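The token count and the position count can diverge under M-RoPE because an image chunk may advance the position by less than the number of tokens it contributes. The sketch below illustrates the distinction under a stated assumption: an image laid out as an `nx` by `ny` token grid advances the position by `max(nx, ny)` when M-RoPE is in use, and by its full token count otherwise. The exact rule is an assumption here and should be checked against the library source. `ChunkInfo`, `count_tokens`, and `count_positions` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of a chunk, for illustration only.
struct ChunkInfo {
    bool    is_image;
    int32_t nx;     // image token grid width  (ignored for text)
    int32_t ny;     // image token grid height (ignored for text)
    int32_t n_text; // number of text tokens   (ignored for images)
};

// Total tokens: every image grid cell and every text token counts once.
std::size_t count_tokens(const std::vector<ChunkInfo> & chunks) {
    std::size_t total = 0;
    for (const ChunkInfo & c : chunks) {
        total += c.is_image ? static_cast<std::size_t>(c.nx) * c.ny
                            : static_cast<std::size_t>(c.n_text);
    }
    return total;
}

// Total positions. ASSUMPTION: with M-RoPE an image advances the position
// by max(nx, ny); without it, by its token count.
int32_t count_positions(const std::vector<ChunkInfo> & chunks, bool use_mrope) {
    int32_t total = 0;
    for (const ChunkInfo & c : chunks) {
        if (!c.is_image) {
            total += c.n_text;
        } else if (use_mrope) {
            total += std::max(c.nx, c.ny);
        } else {
            total += c.nx * c.ny;
        }
    }
    return total;
}
```

Under this assumption, 4 text tokens, a 6 by 4 image grid, and 2 more text tokens give 30 tokens but only 12 positions with M-RoPE. This is why a caller tracking the KV cache position should rely on `new_n_past` or `mtmd_helper_get_n_pos` rather than on the token count.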

Usage Examples

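// Load an image from file; the helper returns a null pointer on failure
mtmd_bitmap * bmp = mtmd_helper_bitmap_init_from_file(ctx, "image.png");
if (!bmp) {
    // handle the load failure before using bmp
}

// Count tokens for KV cache tracking
size_t n_tokens = mtmd_helper_get_n_tokens(chunks);

// Evaluate all chunks (text + images) on sequence 0 with a batch size of 512,
// computing logits only for the last token
llama_pos new_n_past = 0;
int32_t result = mtmd_helper_eval_chunks(
    ctx, lctx, chunks,
    n_past, 0, 512, true, &new_n_past);
if (result != 0) {
    // encoding or decoding failed; new_n_past should not be relied on
}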
