Implementation:Ggml org Llama cpp Mtmd Header

From Leeroopedia
Revision as of 12:41, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Mtmd_Header.md)
Knowledge Sources
Domains: Multimodal, API
Last Updated: 2026-02-15 00:00 GMT

Overview

Public C/C++ API header for the libmtmd multimodal library, defining the contract between libmtmd and all consuming code.

Description

Defines the `MTMD_API` export macro and opaque types (`mtmd_context`, `mtmd_bitmap`, `mtmd_image_tokens`, `mtmd_input_chunk`, `mtmd_input_chunks`). Declares the C API for context creation/destruction with `mtmd_context_params`, bitmap management (init/free/get properties), tokenization of interleaved text+media input, vision/audio encoding, output embedding retrieval, and chunk inspection. Provides C++ wrappers with RAII smart pointer types (`mtmd_context_deleter`, `mtmd_bitmap_deleter`, etc.) and a convenience namespace `mtmd` with `bitmap`, `bitmaps`, and `input_chunks` types.
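
The deleter-based wrappers mentioned above follow a common C++ idiom for owning C handles. The sketch below illustrates that idiom only; `demo_handle`, `demo_handle_init`, `demo_handle_free`, `demo_handle_deleter`, and `demo_handle_ptr` are hypothetical stand-ins invented for this example and are not part of libmtmd.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an opaque C handle; NOT part of libmtmd.
struct demo_handle { int id; };

int g_live_handles = 0; // number of handles created but not yet freed

// C-style create/free pair, mirroring the init/free shape of a C API.
demo_handle * demo_handle_init(int id) {
    ++g_live_handles;
    return new demo_handle{id};
}

void demo_handle_free(demo_handle * h) {
    if (h) {
        --g_live_handles;
        delete h;
    }
}

// Deleter functor: lets std::unique_ptr release the handle via the C free function.
struct demo_handle_deleter {
    void operator()(demo_handle * h) const { demo_handle_free(h); }
};

using demo_handle_ptr = std::unique_ptr<demo_handle, demo_handle_deleter>;

// Returns the live-handle count observed while one handle is owned in scope.
int live_handles_inside_scope() {
    demo_handle_ptr h(demo_handle_init(7));
    return g_live_handles; // the handle is freed automatically after the value is read
}
```

With this shape, the free function runs on every exit path from the scope, which is the main reason to prefer such a wrapper over calling the C free function by hand.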

Usage

Include this header in any application, tool, or library that needs to interact with the multimodal subsystem. It is the primary public interface for all multimodal operations in llama.cpp, used by CLI tools, the server, and external applications.

Code Reference

Source Location

Signature

// Opaque types
typedef struct mtmd_context      mtmd_context;
typedef struct mtmd_bitmap       mtmd_bitmap;
typedef struct mtmd_image_tokens mtmd_image_tokens;
typedef struct mtmd_input_chunk  mtmd_input_chunk;
typedef struct mtmd_input_chunks mtmd_input_chunks;

// Input text structure
struct mtmd_input_text {
    const char * text;
    bool add_special;
    bool parse_special;
};

// Chunk types
enum mtmd_input_chunk_type {
    MTMD_INPUT_CHUNK_TYPE_TEXT,
    MTMD_INPUT_CHUNK_TYPE_IMAGE,
    MTMD_INPUT_CHUNK_TYPE_AUDIO,
};

// Context creation / destruction
MTMD_API mtmd_context * mtmd_init_from_file(const char * mmproj_path,
    const struct llama_model * text_model, const struct mtmd_context_params params);
MTMD_API void mtmd_free(mtmd_context * ctx);

// Tokenization and encoding
MTMD_API int32_t mtmd_tokenize(mtmd_context * ctx,
    mtmd_input_chunks * output, const mtmd_input_text * text,
    const mtmd_bitmap ** bitmaps, size_t n_bitmaps);
MTMD_API int32_t mtmd_encode(mtmd_context * ctx,
    const mtmd_input_chunk * chunk);

Import

#include "ggml.h"
#include "llama.h"
// C standard headers
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| mmproj_path | const char* | Yes | Path to the multimodal projector GGUF file |
| text_model | const llama_model* | Yes | Pointer to the loaded text model |
| params | mtmd_context_params | Yes | Context configuration (n_threads, verbosity, image marker, etc.) |
| text | const mtmd_input_text* | Yes | Input text with media markers for tokenization |
| bitmaps | const mtmd_bitmap** | No | Array of loaded image/audio bitmaps corresponding to markers |
| n_bitmaps | size_t | Yes | Number of entries in the bitmaps array |
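
Because the bitmaps correspond to the media markers in the input text, a caller may want to confirm that the two counts agree before tokenizing. The following is a minimal sketch of such a pre-check, not libmtmd code; `count_markers` and `markers_match_bitmaps` are hypothetical helper names, and the marker string is passed in as a parameter rather than assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts non-overlapping occurrences of `marker` in `text`.
std::size_t count_markers(const std::string & text, const std::string & marker) {
    if (marker.empty()) {
        return 0; // an empty marker can never match meaningfully
    }
    std::size_t count = 0;
    std::size_t pos   = 0;
    while ((pos = text.find(marker, pos)) != std::string::npos) {
        ++count;
        pos += marker.size(); // continue searching after this occurrence
    }
    return count;
}

// True when the prompt contains exactly one marker per bitmap.
bool markers_match_bitmaps(const std::string & text,
                           const std::string & marker,
                           std::size_t         n_bitmaps) {
    return count_markers(text, marker) == n_bitmaps;
}
```

A caller could run `markers_match_bitmaps(prompt, marker, n_bitmaps)` first and report a clear error when it returns false, instead of relying only on the tokenizer's return code.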

Outputs

| Name | Type | Description |
|------|------|-------------|
| mtmd_context* | pointer | Initialized multimodal context ready for encoding operations |
| mtmd_input_chunks* | pointer | Tokenized input split into text and media chunks |
| embeddings | float* | Encoded media embeddings after mtmd_encode |
| return code | int32_t | 0 on success, non-zero on error |

Usage Examples

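#include "mtmd.h"

// Assumes `text_model` (const struct llama_model *) is already loaded and
// `bmp` (const mtmd_bitmap *) already holds the decoded image; both are
// created elsewhere and are not shown here.

// Initialize multimodal context (check `ctx` for NULL before using it)
struct mtmd_context_params params = mtmd_context_default_params();
params.n_threads = 4;
mtmd_context * ctx = mtmd_init_from_file("mmproj.gguf", text_model, params);

// Tokenize interleaved text + image input.
// The marker in the text must match the marker configured in `params`.
mtmd_input_text text = { "What is in this image? <__image__>", true, true };
mtmd_input_chunks * chunks = mtmd_input_chunks_init();

if (mtmd_tokenize(ctx, chunks, &text, &bmp, 1) == 0) {
    // Encode media chunks
    for (size_t i = 0; i < mtmd_input_chunks_size(chunks); i++) {
        const mtmd_input_chunk * chunk = mtmd_input_chunks_get(chunks, i);
        if (mtmd_input_chunk_get_type(chunk) == MTMD_INPUT_CHUNK_TYPE_IMAGE) {
            if (mtmd_encode(ctx, chunk) != 0) {
                break; // encoding failed; stop processing further chunks
            }
        }
    }
}

// Release the chunk list and the context
mtmd_input_chunks_free(chunks);
mtmd_free(ctx);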
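
To make the chunk loop above more concrete, the sketch below models how a prompt with media markers could be split into an ordered list of text and media chunks. It is a simplified illustration under stated assumptions, not how libmtmd is implemented: the real chunks are opaque and carry tokens rather than strings, and `demo_chunk_type`, `demo_chunk`, and `split_on_marker` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real chunk types are opaque.
enum demo_chunk_type { DEMO_CHUNK_TEXT, DEMO_CHUNK_MEDIA };

struct demo_chunk {
    demo_chunk_type type;
    std::string     text; // left empty for media chunks
};

// Splits `prompt` at each `marker`, emitting text and media chunks in order.
std::vector<demo_chunk> split_on_marker(const std::string & prompt,
                                        const std::string & marker) {
    std::vector<demo_chunk> chunks;
    if (marker.empty()) {
        // No marker to split on: the whole prompt is one text chunk.
        if (!prompt.empty()) {
            chunks.push_back({DEMO_CHUNK_TEXT, prompt});
        }
        return chunks;
    }
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = prompt.find(marker, start);
        const std::size_t end = (pos == std::string::npos) ? prompt.size() : pos;
        if (end > start) {
            // Text between the previous marker (or the start) and this one.
            chunks.push_back({DEMO_CHUNK_TEXT, prompt.substr(start, end - start)});
        }
        if (pos == std::string::npos) {
            break; // no more markers
        }
        chunks.push_back({DEMO_CHUNK_MEDIA, ""});
        start = pos + marker.size();
    }
    return chunks;
}
```

In this model, the prompt from the example yields a text chunk followed by a media chunk, which matches the order in which the loop visits chunks and encodes only the media ones.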
