Implementation:Ollama Ollama Llama Model

From Leeroopedia
Knowledge Sources
Domains LLM Inference, Model Loading
Last Updated 2025-02-15 00:00 GMT

Overview

Implements the llama_model class, including model loading from GGUF files, hyperparameter initialization, tensor creation, buffer allocation, and memory system initialization for all supported architectures.

Description

The load_hparams function reads hyperparameters from GGUF metadata for each architecture. load_tensors creates the model's ggml tensors, allocates backend buffers (offloading layers to the GPU according to n_gpu_layers), and loads the tensor data. init_memory creates the memory system appropriate to the model architecture: a KV cache, an ISWA cache, recurrent memory, or hybrid memory. The file also contains the per-architecture tensor layout definitions and the layer device assignment logic for multi-GPU tensor parallelism.
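The layer device assignment described above can be illustrated with a small, self-contained sketch. This is not the code from llama-model.cpp: the Device enum, assign_layers, and pick_gpu are hypothetical names, the output layer is ignored, and treating a negative n_gpu_layers as "offload everything" is an assumption made for the example. It only shows the general idea of offloading the last n_gpu_layers layers and spreading them over several GPUs by proportion.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical device tag for illustration only; the real code works with
// ggml backend devices and buffer types.
enum class Device { CPU, GPU };

// Assign each of n_layer layers to the CPU or the GPU by offloading the LAST
// n_gpu_layers layers. Assumption: a negative or oversized n_gpu_layers
// means "offload every layer".
std::vector<Device> assign_layers(int n_layer, int n_gpu_layers) {
    assert(n_layer >= 0);
    if (n_gpu_layers < 0 || n_gpu_layers > n_layer) {
        n_gpu_layers = n_layer;
    }
    const int i_gpu_start = n_layer - n_gpu_layers;  // first offloaded layer
    std::vector<Device> out(n_layer, Device::CPU);
    for (int i = i_gpu_start; i < n_layer; ++i) {
        out[i] = Device::GPU;
    }
    return out;
}

// Choose a GPU index for the offloaded layer at position pos (0-based among
// the n_offloaded offloaded layers), given relative per-GPU proportions.
int pick_gpu(const std::vector<float> & split, int pos, int n_offloaded) {
    assert(!split.empty() && n_offloaded > 0 && pos >= 0 && pos < n_offloaded);
    float total = 0.0f;
    for (float s : split) {
        total += s;
    }
    assert(total > 0.0f);

    // Normalized cumulative proportions, e.g. {3, 1} -> {0.75, 1.0}.
    std::vector<float> cum;
    float running = 0.0f;
    for (float s : split) {
        running += s;
        cum.push_back(running / total);
    }

    // The first cumulative boundary strictly above the layer's relative
    // position selects the GPU.
    const float frac = float(pos) / float(n_offloaded);
    const auto it = std::upper_bound(cum.begin(), cum.end(), frac);
    const int idx = int(it - cum.begin());
    return std::min(idx, int(split.size()) - 1);  // guard against rounding
}
```

With this sketch, assign_layers(4, 2) keeps layers 0 and 1 on the CPU and offloads layers 2 and 3, and pick_gpu({3.0f, 1.0f}, pos, 4) places the first three offloaded layers on GPU 0 and the last on GPU 1.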

Usage

At roughly 8,060 lines, this is one of the largest files in the llama.cpp source tree vendored by Ollama, and it is the single place where all model architectures are loaded and initialized. Every model that Ollama serves through llama.cpp is loaded through this file's logic.

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-model.cpp
  • Lines: 1-8060

Signature

const char * llm_type_name(llm_type type);
static const char * llama_expert_gating_func_name(llama_expert_gating_func_type type);

// Key model operations (declared in llama-model.h, implemented here):
// llama_model::load_hparams(llama_model_loader & ml)
// llama_model::load_tensors(llama_model_loader & ml)
// llama_model::init_memory(const llama_context_params & cparams)
// llama_model::build_graph(const llm_graph_params & params)

Import

#include "llama-model.h"

I/O Contract

Inputs

Name     Type                  Required  Description
ml       llama_model_loader &  Yes       Model loader with GGUF data
cparams  llama_context_params  Yes       Context parameters for memory init
params   llm_graph_params      Yes       Graph build parameters

Outputs

Name     Type                      Description
hparams  llama_hparams             Loaded model hyperparameters
vocab    llama_vocab               Loaded vocabulary
layers   std::vector<llama_layer>  Per-layer tensor structure
memory   llama_memory_ptr          Initialized memory system
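The memory output depends on the model architecture. The following is a minimal sketch of how such a selection could be expressed. It is not the logic from llama-model.cpp: MemoryKind, ArchTraits, select_memory, and memory_kind_name are hypothetical names, and the order in which the traits are checked is an assumption chosen for the example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical tags for the four memory systems named in this article.
enum class MemoryKind { KVCache, ISWACache, Recurrent, Hybrid };

// Hypothetical per-architecture traits; the real code inspects the
// architecture and its hyperparameters instead.
struct ArchTraits {
    bool recurrent;  // state-space / recurrent model
    bool hybrid;     // mixes recurrent and attention layers
    bool uses_swa;   // uses sliding-window attention
};

// Pick a memory system from the traits. Assumed priority: hybrid first,
// then recurrent, then sliding-window, then the plain KV cache.
MemoryKind select_memory(const ArchTraits & t) {
    if (t.hybrid)    return MemoryKind::Hybrid;
    if (t.recurrent) return MemoryKind::Recurrent;
    if (t.uses_swa)  return MemoryKind::ISWACache;
    return MemoryKind::KVCache;
}

// Human-readable name, useful for logging which system was chosen.
const char * memory_kind_name(MemoryKind k) {
    switch (k) {
        case MemoryKind::KVCache:   return "KV cache";
        case MemoryKind::ISWACache: return "ISWA cache";
        case MemoryKind::Recurrent: return "recurrent memory";
        case MemoryKind::Hybrid:    return "hybrid memory";
    }
    assert(false && "unknown MemoryKind");
    return "unknown";
}
```

In this sketch a plain attention-only architecture (all traits false) falls through to the ordinary KV cache, which matches the intuition that the specialized systems are opt-in per architecture.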

Usage Examples

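// Model loading is triggered by the public API (declared in llama.h):
llama_model_params params = llama_model_default_params();
llama_model * model = llama_model_load_from_file("model.gguf", params);
if (model == nullptr) {
    // Loading failed (missing file, unsupported architecture, ...).
}

// Internally this calls:
// model->load_hparams(ml);
// model->load_tensors(ml);
// After context creation: model->init_memory(cparams);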
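To make the load_hparams step more concrete, here is a simplified, self-contained sketch of reading hyperparameters from architecture-prefixed metadata keys. It does not use llama_model_loader: the Metadata map, HParamsSketch, get_u32, and load_hparams_sketch are hypothetical stand-ins, and only three GGUF-style keys (context_length, embedding_length, block_count) are read.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for GGUF key/value metadata (integer values only).
using Metadata = std::map<std::string, uint32_t>;

// Hypothetical subset of the hyperparameters a loader would fill in.
struct HParamsSketch {
    uint32_t n_ctx_train = 0;  // "<arch>.context_length"
    uint32_t n_embd      = 0;  // "<arch>.embedding_length"
    uint32_t n_layer     = 0;  // "<arch>.block_count"
};

// Look up "<arch>.<suffix>"; returns false if the key is absent.
bool get_u32(const Metadata & kv, const std::string & arch,
             const std::string & suffix, uint32_t & out) {
    const auto it = kv.find(arch + "." + suffix);
    if (it == kv.end()) {
        return false;
    }
    out = it->second;
    return true;
}

// Fill hp from metadata; fails without modifying hp if any required key is
// missing.
bool load_hparams_sketch(const Metadata & kv, const std::string & arch,
                         HParamsSketch & hp) {
    HParamsSketch tmp;
    if (!get_u32(kv, arch, "context_length",   tmp.n_ctx_train)) return false;
    if (!get_u32(kv, arch, "embedding_length", tmp.n_embd))      return false;
    if (!get_u32(kv, arch, "block_count",      tmp.n_layer))     return false;
    hp = tmp;
    return true;
}
```

Prefixing each key with the architecture name is what lets a single loader serve many architectures: the same suffix resolves to a different metadata entry depending on which model family the file declares.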
