

Implementation:Ollama Ollama Mtmd API

From Leeroopedia
Revision as of 13:27, 16 February 2026 by Admin (auto-imported from implementations/Ollama_Ollama_Mtmd_API.md)
Knowledge Sources
Domains: Multimodal, API
Last Updated: 2025-02-15 00:00 GMT

Overview

Public header for libmtmd, the multimodal support library for llama.cpp, declaring the C API and C++ convenience wrappers for vision and audio processing.

Description

Defines the complete public interface for multimodal functionality. It declares the opaque types mtmd_context, mtmd_bitmap, mtmd_image_tokens, mtmd_input_chunk and mtmd_input_chunks; the plain mtmd_input_text struct (text, add_special, parse_special); the mtmd_context_params configuration struct; and the mtmd_input_chunk_type enum (text/image/audio). The C API covers context lifecycle (mtmd_init_from_file, mtmd_free), bitmap management (create, set ID, get data), tokenization (mtmd_tokenize), encoding (mtmd_encode, mtmd_encode_chunk), output retrieval (mtmd_get_output_embd), and model capability queries (vision/audio support, M-RoPE usage, audio bitrate). The C++ section provides RAII smart-pointer wrappers and an mtmd namespace with bitmap, bitmaps, and input_chunks helper types.
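The opaque-handle style the header uses (a forward-declared struct plus create, accessor and free functions) can be sketched in plain C as follows. Everything named demo_* is hypothetical and only mirrors the shape of mtmd_bitmap_init, mtmd_bitmap_set_id and mtmd_bitmap_free; it is not libmtmd's implementation, and the choice to have the handle keep its own copy of the pixels belongs to this sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque handle modeled on mtmd_bitmap: callers would only
 * see this typedef and the functions below, never the struct layout. */
typedef struct demo_bitmap demo_bitmap;

/* In a real library this definition would live in the .c file. */
struct demo_bitmap {
    uint32_t nx, ny;
    unsigned char * data;   /* packed RGB, nx * ny * 3 bytes */
    char * id;              /* optional caller-chosen identifier */
};

demo_bitmap * demo_bitmap_init(uint32_t nx, uint32_t ny, const unsigned char * rgb) {
    demo_bitmap * b = calloc(1, sizeof(*b));
    if (b == NULL) return NULL;
    size_t n = (size_t) nx * ny * 3;
    b->data = malloc(n);
    if (b->data == NULL) { free(b); return NULL; }
    memcpy(b->data, rgb, n);   /* this sketch stores a private copy */
    b->nx = nx;
    b->ny = ny;
    return b;
}

void demo_bitmap_set_id(demo_bitmap * b, const char * id) {
    char * copy = malloc(strlen(id) + 1);
    if (copy == NULL) return;
    strcpy(copy, id);
    free(b->id);               /* replace any previous ID */
    b->id = copy;
}

size_t demo_bitmap_get_n_bytes(const demo_bitmap * b) { return (size_t) b->nx * b->ny * 3; }
const char * demo_bitmap_get_id(const demo_bitmap * b) { return b->id; }

void demo_bitmap_free(demo_bitmap * b) {
    if (b == NULL) return;
    free(b->data);
    free(b->id);
    free(b);
}
```

Keeping the struct layout out of the public header lets a library change its internals without breaking callers that only ever hold a pointer.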

Usage

Included by any code that needs to process images or audio alongside text, including mtmd-helper.cpp, mtmd-cli.cpp, and Ollama's CGo bridge.

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/tools/mtmd/mtmd.h
  • Lines: 1-315

Signature

enum mtmd_input_chunk_type {
    MTMD_INPUT_CHUNK_TYPE_TEXT,
    MTMD_INPUT_CHUNK_TYPE_IMAGE,
    MTMD_INPUT_CHUNK_TYPE_AUDIO,
};

struct mtmd_context_params {
    bool use_gpu;
    bool print_timings;
    int n_threads;
    const char * media_marker;
    enum llama_flash_attn_type flash_attn_type;
    bool warmup;
    int image_min_tokens;
    int image_max_tokens;
};

MTMD_API mtmd_context * mtmd_init_from_file(const char * mmproj_fname,
    const struct llama_model * text_model,
    const struct mtmd_context_params ctx_params);
MTMD_API void mtmd_free(mtmd_context * ctx);

MTMD_API int32_t mtmd_tokenize(mtmd_context * ctx, mtmd_input_chunks * chunks,
    const mtmd_input_text * text, const mtmd_bitmap ** bitmaps, size_t n_bitmaps);
MTMD_API int32_t mtmd_encode(mtmd_context * ctx, const mtmd_image_tokens * image_tokens);
MTMD_API int32_t mtmd_encode_chunk(mtmd_context * ctx, const mtmd_input_chunk * chunk);
MTMD_API float * mtmd_get_output_embd(mtmd_context * ctx);

MTMD_API bool mtmd_support_vision(mtmd_context * ctx);
MTMD_API bool mtmd_support_audio(mtmd_context * ctx);
MTMD_API bool mtmd_decode_use_mrope(mtmd_context * ctx);

Import

#include "mtmd.h"

I/O Contract

Inputs

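Name          Type                         Required  Description
mmproj_fname  const char *                 Yes       Path to the multimodal projector (mmproj) GGUF file
text_model    const struct llama_model *   Yes       Loaded text model the projector is paired with; used for tokenization
ctx_params    struct mtmd_context_params   Yes       GPU, thread, media-marker and related settings
text          const mtmd_input_text *      Yes       Text prompt containing one media marker per bitmap
bitmaps       const mtmd_bitmap **         Yes       Array of image/audio bitmaps, in marker order
n_bitmaps     size_t                       Yes       Number of entries in bitmaps; must equal the number of markers in text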
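How the prompt and the bitmaps pair up in mtmd_tokenize can be illustrated with a simplified splitter: the prompt is cut at each media marker, each marker becomes a media slot filled by the next bitmap, and a marker/bitmap count mismatch is reported as 1, the value mtmd_tokenize returns in that case. This is a hypothetical sketch (all demo_* names are invented); the real function also tokenizes the text spans, may add model-specific begin/end tokens around media, and preprocesses the bitmaps.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk kinds for this sketch (not the mtmd enum). */
typedef enum { DEMO_CHUNK_TEXT, DEMO_CHUNK_MEDIA } demo_chunk_kind;

typedef struct {
    demo_chunk_kind kind;
    const char * start;   /* points into the prompt (text chunks only) */
    size_t len;           /* byte length of the text span */
    size_t media_index;   /* which bitmap fills this slot (media chunks only) */
} demo_chunk;

/* Count non-overlapping occurrences of a non-empty marker in text. */
size_t demo_count_markers(const char * text, const char * marker) {
    size_t n = 0, mlen = strlen(marker);
    for (const char * p = strstr(text, marker); p != NULL; p = strstr(p + mlen, marker)) {
        n++;
    }
    return n;
}

/* Split text at each marker into alternating text/media chunks.
 * Returns 0 on success, 1 if the marker count differs from n_bitmaps,
 * or -1 if out[] (capacity cap) is too small. */
int demo_split(const char * text, const char * marker, size_t n_bitmaps,
               demo_chunk * out, size_t cap, size_t * n_out) {
    if (demo_count_markers(text, marker) != n_bitmaps) return 1;
    size_t mlen = strlen(marker), n = 0, media = 0;
    const char * p = text;
    for (;;) {
        const char * m = strstr(p, marker);
        size_t len = m ? (size_t) (m - p) : strlen(p);
        if (len > 0) {                       /* this sketch skips empty text spans */
            if (n == cap) return -1;
            out[n++] = (demo_chunk){ DEMO_CHUNK_TEXT, p, len, 0 };
        }
        if (m == NULL) break;
        if (n == cap) return -1;
        out[n++] = (demo_chunk){ DEMO_CHUNK_MEDIA, NULL, 0, media++ };
        p = m + mlen;                        /* continue after the marker */
    }
    *n_out = n;
    return 0;
}
```

For "Describe: <__media__> in detail" with one bitmap, the sketch yields three chunks: the text "Describe: ", a media slot for bitmap 0, and the text " in detail".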

Outputs

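Name    Type                 Description
ctx     mtmd_context *       Context returned by mtmd_init_from_file (NULL on failure)
chunks  mtmd_input_chunks *  Sequence of text and media chunks filled in by mtmd_tokenize
status  int32_t              mtmd_tokenize result: 0 on success, 1 if the bitmap count does not match the marker count, 2 on image preprocessing error
embd    float *              Returned by mtmd_get_output_embd: embeddings from the last encode pass, n_embd floats per token of the encoded chunk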
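The size of the buffer behind mtmd_get_output_embd follows from the text model's embedding width and the number of tokens in the encoded chunk (available through llama_model_n_embd() and mtmd_input_chunk_get_n_tokens() in the real API). The sketch below assumes the caller wants to keep the embeddings beyond the next encode pass, which may replace them; the demo_* functions are hypothetical, and src stands in for the pointer returned by mtmd_get_output_embd.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bytes occupied by the embeddings of one encoded media chunk:
 * one row of n_embd floats per media token. */
size_t demo_embd_bytes(int32_t n_embd, size_t n_tokens) {
    return (size_t) n_embd * n_tokens * sizeof(float);
}

/* Copy encoder output into caller-owned memory.
 * src: stand-in for the pointer from mtmd_get_output_embd().
 * Returns NULL on allocation failure; the caller must free() the result. */
float * demo_copy_embd(const float * src, int32_t n_embd, size_t n_tokens) {
    size_t bytes = demo_embd_bytes(n_embd, n_tokens);
    float * dst = malloc(bytes);
    if (dst != NULL && bytes > 0) {
        memcpy(dst, src, bytes);
    }
    return dst;
}
```

For example, a 4096-wide text model and a 256-token image give 4096 * 256 * sizeof(float) bytes, i.e. 4 MiB with 4-byte floats.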

Usage Examples

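// C API usage
// Assumes `model` is a loaded llama_model and `rgb_data` points to
// 224 * 224 * 3 bytes of packed RGB pixels.
struct mtmd_context_params params = mtmd_context_params_default();
mtmd_context * ctx = mtmd_init_from_file("mmproj.gguf", model, params);
if (ctx == NULL) { /* handle projector load failure */ }

mtmd_bitmap * bmp = mtmd_bitmap_init(224, 224, rgb_data);
mtmd_bitmap_set_id(bmp, "img_001");  // optional identifier for this media item

// mtmd_input_text is a plain struct; each marker is replaced by one bitmap
mtmd_input_text text;
text.text          = "Describe: <__media__>";  // default marker, see mtmd_default_marker()
text.add_special   = true;
text.parse_special = true;

const mtmd_bitmap * bitmaps[] = { bmp };
mtmd_input_chunks * chunks = mtmd_input_chunks_init();
int32_t res = mtmd_tokenize(ctx, chunks, &text, bitmaps, 1);  // 0 on success

for (size_t i = 0; res == 0 && i < mtmd_input_chunks_size(chunks); i++) {
    const mtmd_input_chunk * chunk = mtmd_input_chunks_get(chunks, i);
    if (mtmd_input_chunk_get_type(chunk) == MTMD_INPUT_CHUNK_TYPE_IMAGE) {
        if (mtmd_encode_chunk(ctx, chunk) != 0) break;  // 0 on success
        float * embd = mtmd_get_output_embd(ctx);
        // embd holds n_embd * mtmd_input_chunk_get_n_tokens(chunk) floats,
        // to be passed to llama_decode() as an embedding batch
    }
}

mtmd_input_chunks_free(chunks);
mtmd_bitmap_free(bmp);
mtmd_free(ctx);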
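The example stops after encoding. A complete evaluation loop (mtmd-helper.cpp provides helpers for this in llama.cpp) also has to decode text tokens and media embeddings into the llama context in chunk order while advancing the position counter. The sketch below shows only that control flow: the chunk struct and both decode functions are hypothetical stubs, and it assumes each chunk's position count is supplied by the caller. For M-RoPE models (see mtmd_decode_use_mrope) an image may consume fewer positions than it has tokens, which the real API reports through mtmd_input_chunk_get_n_pos.

```c
#include <stddef.h>

/* Hypothetical mirror of the three chunk types, so this compiles without mtmd.h. */
typedef enum { DEMO_TEXT, DEMO_IMAGE, DEMO_AUDIO } demo_media_type;

typedef struct {
    demo_media_type type;
    size_t n_tokens;   /* token IDs (text) or embedding rows (media) */
    int    n_pos;      /* positions this chunk advances n_past by */
} demo_eval_chunk;

/* Hypothetical stubs: real code would call llama_decode() on token IDs for
 * text, or encode the chunk and call llama_decode() on its embeddings. */
static int demo_decode_tokens(const demo_eval_chunk * c) { (void) c; return 0; }
static int demo_encode_and_decode_embd(const demo_eval_chunk * c) { (void) c; return 0; }

/* Evaluate chunks in order; returns the new n_past, or -1 on the first error. */
int demo_eval_chunks(const demo_eval_chunk * chunks, size_t n, int n_past) {
    for (size_t i = 0; i < n; i++) {
        const demo_eval_chunk * c = &chunks[i];
        int rc = (c->type == DEMO_TEXT) ? demo_decode_tokens(c)
                                        : demo_encode_and_decode_embd(c);
        if (rc != 0) return -1;
        n_past += c->n_pos;   /* every chunk, text or media, advances positions */
    }
    return n_past;
}
```

Processing chunks strictly in the order mtmd_tokenize produced them keeps media embeddings at the positions their markers occupied in the prompt.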
