Implementation:Ggml org Llama cpp Llama Model Load For Multimodal
| Aspect | Detail |
|---|---|
| Implementation Name | Llama Model Load For Multimodal |
| Doc Type | Pattern Doc |
| Domain | Multimodal Inference |
| Purpose | Loading the text GGUF model as the foundation for the multimodal pipeline |
| Related Workflow | Multimodal_Inference |
Overview
Description
This pattern documents loading the text/language GGUF model with llama_model_load_from_file(), the standard entry point for model loading in llama.cpp. In the multimodal pipeline this is the first step: the resulting llama_model * pointer is then passed, together with the path of the multimodal projector (mmproj) GGUF file, to mtmd_init_from_file() to create the multimodal context.
Usage
The text model must be loaded before any multimodal context is created. The returned llama_model * serves as the language backbone for the entire multimodal session. It is passed as a const pointer to the multimodal initialization, so the mtmd layer reads model metadata (embedding dimensions, vocabulary) but does not modify the model. Because the mtmd context keeps this pointer, the model must stay loaded until the multimodal context has been freed.
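The embedding-dimension metadata mentioned above is what lets the mtmd layer detect a mismatched projector. Assuming, as we believe current mtmd code does, that initialization fails when the projector's output width differs from the text model's embedding width (exposed by llama_model_n_embd()), the check reduces to an integer comparison. The sketch below models only that comparison; projector_matches_text_model() and its message text are hypothetical and are not part of llama.h or mtmd.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not a llama.cpp API): returns true when the
 * projector output width equals the text model's embedding width.
 * On mismatch, writes a diagnostic into err (if a buffer is given). */
static bool projector_matches_text_model(int n_embd_text, int n_embd_proj,
                                         char *err, size_t err_len) {
    assert(n_embd_text > 0 && n_embd_proj > 0);
    if (n_embd_text == n_embd_proj) {
        return true; /* projector embeddings can be fed to the text model */
    }
    if (err != NULL && err_len > 0) {
        snprintf(err, err_len,
                 "mmproj output width %d does not match text model n_embd %d",
                 n_embd_proj, n_embd_text);
    }
    return false;
}
```

In a real program the first width would come from llama_model_n_embd(model) and the second from the mmproj file's metadata; a mismatch typically means the mmproj file was built for a different base model.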
Code Reference
| Aspect | Detail |
|---|---|
| Source Location | include/llama.h:450-452 |
| Signature | struct llama_model * llama_model_load_from_file(const char * path_model, struct llama_model_params params) |
| Import | #include "llama.h" |
The function loads a GGUF model file from disk and returns an opaque model handle, or NULL on failure. If the model is split into multiple files, their names must follow the pattern <name>-%05d-of-%05d.gguf, and path_model should point to the first split. For split files that do not follow this naming, use llama_model_load_from_splits(), which takes an explicit list of paths.
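The split-file naming rule can be illustrated with a small formatter. This is a sketch: make_split_path() is a hypothetical helper (llama.h declares its own llama_split_path() for this purpose), and it takes a 1-based split number so the output matches the file names on disk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a llama.cpp API): formats the path of split
 * number split_no (1-based) out of split_count, following the
 * <name>-%05d-of-%05d.gguf convention. Returns the number of characters
 * written (excluding the NUL), or -1 on error or truncation. */
static int make_split_path(char *buf, size_t buf_len, const char *prefix,
                           int split_no, int split_count) {
    assert(prefix != NULL && strlen(prefix) > 0);
    assert(split_no >= 1 && split_no <= split_count);
    int n = snprintf(buf, buf_len, "%s-%05d-of-%05d.gguf",
                     prefix, split_no, split_count);
    if (n < 0 || (size_t) n >= buf_len) {
        return -1; /* formatting error or buffer too small */
    }
    return n;
}
```

For example, prefix "models/big" with split 1 of 3 yields "models/big-00001-of-00003.gguf", which is the path to hand to llama_model_load_from_file().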
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | path_model | const char * | File path to the GGUF text model |
| Input | params | struct llama_model_params | Model loading parameters (GPU layers, mmap, mlock, split mode) |
| Output | (return) | struct llama_model * | Opaque model handle, or NULL on failure |
Usage Examples
Example 1: Basic multimodal model loading pattern
```c
#include <stdio.h>
#include "llama.h"

int main(void) {
    // Initialize the backend once per process
    llama_backend_init();

    // Configure model parameters
    struct llama_model_params model_params = llama_model_default_params();
    model_params.n_gpu_layers = 35; // offload up to 35 layers to the GPU

    // Load the text model
    struct llama_model * model = llama_model_load_from_file(
        "models/llava-v1.6-vicuna-7b-Q4_K_M.gguf",
        model_params
    );
    if (model == NULL) {
        fprintf(stderr, "Failed to load text model\n");
        llama_backend_free();
        return 1;
    }

    // The model pointer is now ready to be passed to mtmd_init_from_file()
    // along with the mmproj GGUF path; keep it alive until the mtmd context is freed

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```
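Example 1 requests 35 GPU layers for a 7B model. Assuming a 32-layer LLaMA-family backbone, and assuming the loader counts the output layer as one extra offloadable layer, any value of 33 or more places the whole model on the GPU, which is also why a large value such as 99 is commonly used to mean "offload everything". The sketch below models only this arithmetic; offloaded_layers() is a hypothetical helper, and the exact accounting should be checked against the load log of the llama.cpp version in use.

```c
#include <assert.h>

/* Hypothetical helper (not a llama.cpp API). Assumption: a model with
 * n_layer repeating blocks exposes n_layer + 1 offloadable layers
 * (the extra one being the output layer). Returns how many of them
 * a given n_gpu_layers setting would place on the GPU. */
static int offloaded_layers(int n_gpu_layers, int n_layer) {
    assert(n_gpu_layers >= 0 && n_layer > 0);
    const int total = n_layer + 1; /* repeating blocks + output layer */
    return n_gpu_layers < total ? n_gpu_layers : total;
}
```

Under these assumptions, offloaded_layers(35, 32) and offloaded_layers(99, 32) both give 33 (the full model), while offloaded_layers(20, 32) gives 20 and leaves the remaining layers on the CPU.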
Example 2: Loading with context creation for inference
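```c
#include <stdio.h>
#include "llama.h"
int main(void) {
    llama_backend_init();
    struct llama_model_params model_params = llama_model_default_params();
    model_params.n_gpu_layers = 99; // exceeds the layer count of most models, so all layers are offloaded
    struct llama_model * model = llama_model_load_from_file("model.gguf", model_params);
    if (model == NULL) { fprintf(stderr, "Failed to load model\n"); llama_backend_free(); return 1; }
    // Create a context for inference
    struct llama_context_params ctx_params = llama_context_default_params();
    ctx_params.n_ctx   = 4096; // context window size in positions
    ctx_params.n_batch = 512;  // max tokens submitted per llama_decode() call
    struct llama_context * ctx = llama_init_from_model(model, ctx_params);
    if (ctx == NULL) { fprintf(stderr, "Failed to create context\n"); llama_model_free(model); llama_backend_free(); return 1; }
    // Both model and ctx are needed for the full multimodal pipeline
    llama_free(ctx);          // release the context before the model it was created from
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```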
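The two context parameters in Example 2 bound different things: n_ctx caps the total number of positions held in the context (prompt text, image embeddings and generated tokens together), while n_batch caps how many tokens go into a single llama_decode() call, so a long image embedding is submitted over several calls. The helpers below are hypothetical sketches, and they assume each image embedding token occupies one context position, which holds for LLaVA-style models but not necessarily for every architecture.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: number of llama_decode() calls needed to submit
 * n_tokens tokens when each call carries at most n_batch of them. */
static int decode_calls(int n_tokens, int n_batch) {
    assert(n_tokens >= 0 && n_batch > 0);
    return (n_tokens + n_batch - 1) / n_batch; /* ceiling division */
}

/* Hypothetical helper: whether n_new more positions still fit in a
 * context of n_ctx positions when n_past positions are already used. */
static bool fits_in_context(int n_past, int n_new, int n_ctx) {
    assert(n_past >= 0 && n_new >= 0 && n_ctx > 0);
    return n_past + n_new <= n_ctx;
}
```

With the settings above, a hypothetical image that encodes to 576 tokens needs two decode calls (512 + 64 tokens) and fits after a 100-token prompt, but not after 3,900 positions of history.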