Implementation:Ggml org Llama cpp Mtmd Helper Header

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Multimodal, Utilities
Last Updated	2026-02-15 00:00 GMT

Overview

Public C API header for mtmd helper functions that simplify multimodal usage by providing high-level convenience wrappers.

Description

This header declares helper functions for the multimodal (mtmd) module. `mtmd_helper_bitmap_init_from_file` and `mtmd_helper_bitmap_init_from_buf` load images (JPG, PNG, BMP, GIF via stb_image) and audio (WAV, MP3, FLAC via miniaudio) from a file or from an in-memory buffer. `mtmd_helper_get_n_tokens` and `mtmd_helper_get_n_pos` count tokens and positions across a list of chunks. `mtmd_helper_eval_chunks` automatically runs decode over a list of mixed text/image/audio chunks, and `mtmd_helper_eval_chunk_single` does the same for one chunk. `mtmd_helper_decode_image_chunk` decodes pre-encoded image embeddings with proper batching. `mtmd_helper_log_set` configures logging.

Usage

Use this header when integrating multimodal capabilities into an application. It provides the high-level convenience functions most applications need, so callers do not have to manage low-level encoding and decoding details themselves. These helpers are not guaranteed to be stable; expect breaking changes.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: tools/mtmd/mtmd-helper.h
Lines: 1-96

Signature

MTMD_API void mtmd_helper_log_set(ggml_log_callback log_callback, void * user_data);

MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_file(mtmd_context * ctx, const char * fname);
MTMD_API mtmd_bitmap * mtmd_helper_bitmap_init_from_buf(mtmd_context * ctx, const unsigned char * buf, size_t len);

MTMD_API size_t mtmd_helper_get_n_tokens(const mtmd_input_chunks * chunks);
MTMD_API llama_pos mtmd_helper_get_n_pos(const mtmd_input_chunks * chunks);

MTMD_API int32_t mtmd_helper_eval_chunks(
    mtmd_context * ctx, struct llama_context * lctx,
    const mtmd_input_chunks * chunks,
    llama_pos n_past, llama_seq_id seq_id,
    int32_t n_batch, bool logits_last,
    llama_pos * new_n_past);

MTMD_API int32_t mtmd_helper_eval_chunk_single(
    mtmd_context * ctx, struct llama_context * lctx,
    const mtmd_input_chunk * chunk,
    llama_pos n_past, llama_seq_id seq_id,
    int32_t n_batch, bool logits_last,
    llama_pos * new_n_past);

MTMD_API int32_t mtmd_helper_decode_image_chunk(
    mtmd_context * ctx, struct llama_context * lctx,
    const mtmd_input_chunk * chunk, float * encoded_embd,
    llama_pos n_past, llama_seq_id seq_id,
    int32_t n_batch, llama_pos * new_n_past);

Import

#include "mtmd-helper.h"

I/O Contract

Inputs

Name	Type	Required	Description
ctx	mtmd_context *	Yes	Multimodal context for encoding and processing
lctx	struct llama_context *	Yes	LLaMA context for decoding
fname	const char *	Yes	File path for bitmap loading (image or audio file)
buf	const unsigned char *	Yes	Buffer containing file data for bitmap loading
len	size_t	Yes	Length of the buffer
chunks	const mtmd_input_chunks *	Yes	List of mixed text/image/audio chunks to evaluate
chunk	const mtmd_input_chunk *	Yes	Single chunk to evaluate (mtmd_helper_eval_chunk_single) or decode (mtmd_helper_decode_image_chunk)
encoded_embd	float *	Yes	Pre-encoded image embeddings (mtmd_helper_decode_image_chunk only)
n_past	llama_pos	Yes	Current position in the KV cache
seq_id	llama_seq_id	Yes	Sequence ID for the evaluation
n_batch	int32_t	Yes	Batch size for decoding
logits_last	bool	Yes	Whether to compute logits only for the last token
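
The `buf` and `len` inputs describe the raw bytes of an image or audio file held in memory. As a minimal sketch of how such a buffer might be prepared, the snippet below reads a file into a byte vector using only the C++ standard library. The name `read_file_bytes` is a hypothetical helper introduced for illustration; it is not part of mtmd.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not part of mtmd): read a whole file into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_file_bytes(const std::string & path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return {};
    }
    // Copy every byte of the stream into the vector.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

With the bytes in hand, a caller would presumably pass `bytes.data()` and `bytes.size()` as `buf` and `len` to `mtmd_helper_bitmap_init_from_buf`, after checking that the vector is not empty.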

Outputs

Name	Type	Description
mtmd_helper_bitmap_init_from_file	mtmd_bitmap *	Loaded bitmap, or nullptr on failure (thread-safe)
mtmd_helper_get_n_tokens	size_t	Total number of tokens across all chunks
mtmd_helper_get_n_pos	llama_pos	Total position count (differs from n_tokens for M-RoPE)
mtmd_helper_eval_chunks	int32_t	0 on success, non-zero on encoding or decoding failure
new_n_past	llama_pos *	Updated position after evaluation (output parameter)
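
The table lists separate token and position counts because the two can diverge for M-RoPE models. The sketch below illustrates that bookkeeping with a stand-in struct. `chunk_counts`, `total_tokens`, and `total_pos` are hypothetical names, and the per-chunk numbers in the usage note are made up for illustration; the real API exposes these counts through opaque chunk handles.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for per-chunk counts (not an mtmd type).
struct chunk_counts {
    std::size_t  n_tokens; // tokens the chunk contributes
    std::int32_t n_pos;    // positions the chunk advances the sequence by
};

// Sum of tokens over all chunks (the role of mtmd_helper_get_n_tokens).
std::size_t total_tokens(const std::vector<chunk_counts> & chunks) {
    std::size_t n = 0;
    for (const chunk_counts & c : chunks) {
        n += c.n_tokens;
    }
    return n;
}

// Sum of positions over all chunks (the role of mtmd_helper_get_n_pos).
std::int32_t total_pos(const std::vector<chunk_counts> & chunks) {
    std::int32_t n = 0;
    for (const chunk_counts & c : chunks) {
        n += c.n_pos;
    }
    return n;
}
```

For example, with assumed counts `{5, 5}` for a text chunk, `{256, 16}` for an image chunk, and `{3, 3}` for another text chunk, the totals are 264 tokens but only 24 positions, so a caller tracking n_past would need the position total rather than the token total.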

Usage Examples

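// Load an image from file; the helper returns nullptr on failure
mtmd_bitmap * bmp = mtmd_helper_bitmap_init_from_file(ctx, "image.png");
if (bmp == nullptr) {
    // handle the load error before continuing
}

// Count tokens for KV cache tracking
size_t n_tokens = mtmd_helper_get_n_tokens(chunks);

// Evaluate all chunks (text + images) on sequence 0 with a batch size of 512
llama_pos new_n_past = n_past;
int32_t result = mtmd_helper_eval_chunks(
    ctx, lctx, chunks,
    n_past, 0, 512, true, &new_n_past);
if (result != 0) {
    // non-zero means encoding or decoding failed
}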
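
The `n_batch` argument (512 above) is the batch size used for decoding. The following sketch shows only the splitting arithmetic such batching implies: a run of tokens is cut into consecutive ranges of at most `n_batch` entries. It is not the library's implementation, and `batch_range` and `split_into_batches` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type: one contiguous slice of a token run.
struct batch_range {
    std::int32_t offset; // index of the first token in the slice
    std::int32_t size;   // number of tokens in the slice (<= n_batch)
};

// Split n_tokens into consecutive ranges of at most n_batch tokens each.
std::vector<batch_range> split_into_batches(std::int32_t n_tokens, std::int32_t n_batch) {
    assert(n_tokens >= 0 && n_batch > 0);
    std::vector<batch_range> out;
    for (std::int32_t off = 0; off < n_tokens; off += n_batch) {
        out.push_back({off, std::min(n_batch, n_tokens - off)});
    }
    return out;
}
```

Under these assumptions, 1000 tokens with a batch size of 512 yield two ranges: 512 tokens starting at offset 0, then 488 tokens starting at offset 512.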

Related Pages

Principle:Ggml_org_Llama_cpp_Multimodal
