Implementation:Ollama Ollama Mtmd Helper

From Leeroopedia
Knowledge Sources
Domains: Multimodal, InferenceHelper
Last Updated: 2025-02-15 00:00 GMT

Overview

High-level helper library that simplifies multimodal input handling for applications using the mtmd API, including bitmap loading, token counting, and chunk evaluation.

Description

Provides convenience functions that wrap the lower-level mtmd and llama APIs. Bitmaps can be loaded from files or from in-memory buffers: images are decoded with stb_image and audio with miniaudio, with WAV, MP3, and FLAC detected automatically from their magic bytes. The key function, mtmd_helper_eval_chunks, processes a sequence of text and image/audio chunks. Text chunks go directly to llama_decode(), while media chunks are first encoded with mtmd_encode() and the resulting embeddings are then passed to llama_decode(), with M-RoPE positions tracked along the way. The decode_embd_batch helper struct builds the embedding batch and supports three position layouts: normal, M-RoPE 2D (images), and M-RoPE 1D (audio).
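The magic-byte detection mentioned above can be sketched as follows. This is a minimal illustration, not the helper's actual code: the function name sniff_audio_kind, the AudioKind enum, and the 12-byte minimum length are assumptions made for the example. The byte signatures themselves (RIFF/WAVE, fLaC, ID3, and the MPEG frame sync bits) are the standard ones for these formats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical result type for this sketch (not part of mtmd).
enum class AudioKind { None, Wav, Mp3, Flac };

// Classify a buffer by its leading bytes.
// Assumption: at least 12 bytes are required, since the WAVE tag sits at offset 8.
AudioKind sniff_audio_kind(const unsigned char * buf, size_t len) {
    if (buf == nullptr || len < 12) {
        return AudioKind::None;
    }
    // WAV: "RIFF" container with a "WAVE" form type at offset 8
    if (std::memcmp(buf, "RIFF", 4) == 0 && std::memcmp(buf + 8, "WAVE", 4) == 0) {
        return AudioKind::Wav;
    }
    // FLAC: stream marker "fLaC"
    if (std::memcmp(buf, "fLaC", 4) == 0) {
        return AudioKind::Flac;
    }
    // MP3: either an ID3v2 tag or an MPEG frame sync (11 set bits)
    if (std::memcmp(buf, "ID3", 3) == 0) {
        return AudioKind::Mp3;
    }
    if (buf[0] == 0xFF && (buf[1] & 0xE0) == 0xE0) {
        return AudioKind::Mp3;
    }
    return AudioKind::None;
}
```

A buffer that is not recognized as audio would then fall through to the image decoding path. Note that a JPEG header (0xFF 0xD8) does not match the MPEG sync test, because the second byte lacks the top three bits.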

Usage

Used by application code to load media files and evaluate mixed text/media sequences in a single call, abstracting away the complexity of interleaving text and media encoding/decoding.
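To make the interleaving concrete, here is a rough, self-contained model of the dispatch that a single evaluation call hides. All types and names (Chunk, Backend, eval_chunks_sketch) are hypothetical stand-ins; the callbacks only mark where mtmd_encode() and llama_decode() would be invoked, and the stop-on-first-error behavior is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical chunk description: token count and positional extent.
enum class ChunkKind { Text, Image, Audio };

struct Chunk {
    ChunkKind kind;
    int n_tokens;
    int n_pos;   // may differ from n_tokens for media chunks
};

// Hypothetical backend: callbacks stand in for mtmd_encode() and llama_decode().
struct Backend {
    std::function<int(const Chunk &)> encode_media;
    std::function<int(const Chunk &, int)> decode;
};

// Walk the chunks in order. Media chunks are encoded first, then every chunk
// is decoded at the current position. Stops at the first non-zero status.
int eval_chunks_sketch(const std::vector<Chunk> & chunks, const Backend & be,
                       int n_past, int * new_n_past) {
    assert(new_n_past != nullptr);
    for (const Chunk & c : chunks) {
        if (c.kind != ChunkKind::Text) {
            const int rc = be.encode_media(c);
            if (rc != 0) {
                return rc;
            }
        }
        const int rc = be.decode(c, n_past);
        if (rc != 0) {
            return rc;
        }
        n_past += c.n_pos;   // advance by positions, not by tokens
    }
    *new_n_past = n_past;
    return 0;
}
```

The caller supplies the starting position and reads the updated position back, so that generation can continue right after the last evaluated chunk.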

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/tools/mtmd/mtmd-helper.cpp
  • Lines: 1-521

Signature

size_t mtmd_helper_get_n_tokens(const mtmd_input_chunks * chunks);
llama_pos mtmd_helper_get_n_pos(const mtmd_input_chunks * chunks);
void mtmd_helper_log_set(ggml_log_callback log_callback, void * user_data);

struct decode_embd_batch {
    llama_batch batch;
    decode_embd_batch(float * embd, int32_t n_tokens, int n_pos_per_embd, int n_mmproj_embd);
    void set_position_normal(llama_pos pos_0, llama_seq_id seq_id);
    void set_position_mrope_2d(llama_pos pos_0, int nx, int ny, llama_seq_id seq_id);
    void set_position_mrope_1d(llama_pos pos_0, llama_seq_id seq_id);
    llama_batch get_view(int offset, int n_tokens);
};
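As an illustration of what a 2D M-RoPE position layout can look like, the sketch below fills positions for an nx-by-ny grid of image embeddings. It is a standalone model, not the struct's code: the function name, the pos_t alias, and the assumed layout of four contiguous planes (temporal, row, column, unused) are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llama_pos so the sketch compiles on its own.
using pos_t = int32_t;

// Build positions for nx * ny embeddings.
// Assumed layout: 4 planes of n_tokens entries each, in the order
// [temporal | row (y) | column (x) | unused].
std::vector<pos_t> mrope_2d_positions(pos_t pos_0, int nx, int ny) {
    assert(nx > 0 && ny > 0);
    const int n_tokens = nx * ny;
    std::vector<pos_t> pos(static_cast<size_t>(n_tokens) * 4, 0);
    for (int y = 0; y < ny; ++y) {
        for (int x = 0; x < nx; ++x) {
            const int i = y * nx + x;            // row-major index of the patch
            pos[i]                = pos_0;       // temporal: same for the whole image
            pos[i + n_tokens]     = pos_0 + y;   // row offset
            pos[i + n_tokens * 2] = pos_0 + x;   // column offset
            // plane 3 is left at 0 (unused in this sketch)
        }
    }
    return pos;
}
```

Under these assumptions, every patch of one image shares the same temporal position, while its row and column positions encode where the patch sits in the grid.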

Import

#include "mtmd-helper.h"
#include "mtmd.h"
#include "llama.h"

I/O Contract

Inputs

Name     Type                  Required   Description
chunks   mtmd_input_chunks *   Yes        Tokenized input chunks from mtmd_tokenize
lctx     llama_context *       Yes        llama context used to decode text tokens and media embeddings
pos0     llama_pos             Yes        Starting position for M-RoPE tracking
seq_id   llama_seq_id          Yes        Sequence ID for batched inference

Outputs

Name          Type        Description
n_tokens      size_t      Total number of tokens across all chunks (from mtmd_helper_get_n_tokens)
n_pos         llama_pos   Total positional extent of all chunks (from mtmd_helper_get_n_pos)
return code   int         0 on success, non-zero on error
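The two totals are reported separately because a chunk's token count and its positional extent need not match. A minimal sketch of the two sums, using a hypothetical ChunkInfo record in place of the real chunk type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-chunk summary (the real chunk type is opaque).
struct ChunkInfo {
    size_t  n_tokens;  // number of tokens or embeddings in the chunk
    int32_t n_pos;     // number of positions the chunk occupies
};

// Sum of token counts over all chunks.
size_t total_tokens(const std::vector<ChunkInfo> & chunks) {
    size_t sum = 0;
    for (const ChunkInfo & c : chunks) {
        sum += c.n_tokens;
    }
    return sum;
}

// Sum of positional extents over all chunks.
int32_t total_pos(const std::vector<ChunkInfo> & chunks) {
    int32_t sum = 0;
    for (const ChunkInfo & c : chunks) {
        sum += c.n_pos;
    }
    return sum;
}
```

The token total is the figure to compare against the context size, and the position total is the amount by which the running position advances. The example values in any caller of this sketch are illustrative only.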

Usage Examples

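// Load an image (returns nullptr on failure)
mtmd_bitmap * bmp = mtmd_helper_bitmap_init_from_file(ctx, "photo.jpg");
const mtmd_bitmap * bitmaps[] = { bmp };

// Tokenize; text_input points to an mtmd_input_text whose prompt contains one media marker
mtmd_input_chunks * chunks = mtmd_input_chunks_init();
mtmd_tokenize(ctx, chunks, text_input, bitmaps, 1);

// Evaluate all chunks (text + image) in one call; new_n_past receives the updated position
llama_pos new_n_past = 0;
int32_t result = mtmd_helper_eval_chunks(ctx, lctx, chunks, n_past, seq_id, n_batch,
                                         /* logits_last */ true, &new_n_past);

// Free the chunks and the bitmap once evaluation is done
mtmd_input_chunks_free(chunks);
mtmd_bitmap_free(bmp);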
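The n_batch argument caps how many tokens or embeddings go into one decode call. One plausible way to split a long media chunk is to walk it in n_batch-sized windows, each described by an (offset, n_tokens) pair like the arguments of decode_embd_batch::get_view. The helper name batch_windows below is hypothetical, and the windowing scheme is an assumption of this sketch rather than a statement about the library's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Split n_tokens into consecutive windows of at most n_batch entries.
// Each pair is (offset, count), i.e. what a get_view-style call would receive.
std::vector<std::pair<int, int>> batch_windows(int n_tokens, int n_batch) {
    assert(n_batch > 0);
    std::vector<std::pair<int, int>> windows;
    for (int offset = 0; offset < n_tokens; offset += n_batch) {
        const int count = std::min(n_batch, n_tokens - offset);
        windows.emplace_back(offset, count);
    }
    return windows;
}
```

For example, 10 embeddings with n_batch set to 4 yield three windows, the last of which holds the 2 remaining embeddings.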
