

Implementation:Ollama Ollama Llama Core

From Leeroopedia
Knowledge Sources
Domains LLM Inference, Public API
Last Updated 2025-02-15 00:00 GMT

Overview

This file is the main public API implementation. It provides the C function implementations declared in llama.h and connects the public API surface to the internal implementation classes.

Description

The file implements the public API functions by delegating to internal classes: llama_model_load_from_file creates a llama_model and calls load_hparams/load_tensors, and llama_init_from_model creates a llama_context. It also provides parameter defaults, GPU memory estimation via llama_get_device_memory_data, the sampler chain API, KV cache operations, tokenize/detokenize wrappers, model metadata access, batch operations, state save/load, and performance counters. In addition, it implements llama_backend_init/llama_backend_free for ggml backend lifecycle management.
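The delegation pattern described above can be sketched in isolation. The following is a minimal illustration, not the actual llama.cpp code: every demo_* name is hypothetical, the load steps are stubs, and the layer count of 32 is an arbitrary placeholder. It shows a C-callable surface that exposes only an opaque pointer while the struct behind it is an ordinary C++ type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// --- What a public header would expose (hypothetical demo_* names) ---
extern "C" {
struct demo_model;  // opaque to C callers; defined below in C++
demo_model * demo_model_load_from_file(const char * path);
int32_t      demo_model_n_layer(const demo_model * model);
void         demo_model_free(demo_model * model);
}

// --- Internal C++ definition, hidden from API consumers ---
struct demo_model {
    explicit demo_model(std::string p) : path(std::move(p)) {}

    // Stubs standing in for the load_hparams/load_tensors steps.
    bool load_hparams() { n_layer = 32; return !path.empty(); }
    bool load_tensors() { tensors_loaded = true; return true; }

    std::string path;
    int32_t     n_layer        = 0;
    bool        tensors_loaded = false;
};

// --- Glue: each public function delegates to the internal type ---
demo_model * demo_model_load_from_file(const char * path) {
    if (path == nullptr) {
        return nullptr;
    }
    demo_model * model = new demo_model(path);
    if (!model->load_hparams() || !model->load_tensors()) {
        delete model;  // report failure as a null handle
        return nullptr;
    }
    return model;
}

int32_t demo_model_n_layer(const demo_model * model) {
    assert(model != nullptr);
    return model->n_layer;
}

void demo_model_free(demo_model * model) {
    delete model;  // deleting a null pointer is a no-op
}
```

A caller would pair demo_model_load_from_file with demo_model_free and treat a null return as a load failure. Keeping the struct definition out of the header is what lets the internal layout change without breaking consumers.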

Usage

This is the glue layer between the public C API and the internal C++ implementation. It is the entry point for all external consumers of llama.cpp, including Ollama's Go bindings via CGo.

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama.cpp
  • Lines: 1-1070

Signature

const char * llama_flash_attn_type_name(enum llama_flash_attn_type flash_attn_type);

struct llama_device_memory_data {
    int64_t total;
    int64_t free;
    llama_memory_breakdown_data mb;
};

static std::vector<llama_device_memory_data> llama_get_device_memory_data(
    const char * path_model, const llama_model_params * mparams,
    const llama_context_params * cparams,
    std::vector<ggml_backend_dev_t> & devs,
    uint32_t & hp_ngl, uint32_t & hp_n_ctx_train, uint32_t & hp_n_expert,
    const ggml_log_level log_level);

static void llama_params_fit_impl(
    const char * path_model, struct llama_model_params * mparams,
    struct llama_context_params * cparams,
    float * tensor_split, struct llama_model_tensor_buft_override * tensor_buft_overrides,
    size_t margin_s, uint32_t n_ctx_min, enum ggml_log_level log_level);
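
The signatures above suggest that per-device memory figures are gathered first and then compared against a safety margin. This page does not show the actual fitting logic, so the sketch below is only one plausible reading: demo_device_memory, demo_memory_headroom, and demo_fits are hypothetical names, and the required field is an assumed stand-in for whatever the mb breakdown reports.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for llama_device_memory_data. `total` and `free` mirror
// the fields in the signature; `required` is a hypothetical substitute for the
// projected usage that the real `mb` breakdown would describe.
struct demo_device_memory {
    int64_t total;
    int64_t free;
    int64_t required;
};

// For each device, compute the bytes left after subtracting the projected
// allocation and the safety margin. A negative value means a deficit.
std::vector<int64_t> demo_memory_headroom(
        const std::vector<demo_device_memory> & devs, int64_t margin) {
    assert(margin >= 0);
    std::vector<int64_t> headroom;
    headroom.reserve(devs.size());
    for (const demo_device_memory & d : devs) {
        headroom.push_back(d.free - d.required - margin);
    }
    return headroom;
}

// The configuration fits only if no device ends up with a deficit.
bool demo_fits(const std::vector<demo_device_memory> & devs, int64_t margin) {
    for (int64_t h : demo_memory_headroom(devs, margin)) {
        if (h < 0) {
            return false;
        }
    }
    return true;
}
```

Under this reading, a fitting routine could shrink the context size or offload fewer layers whenever demo_fits returns false, then re-estimate. Whether llama_params_fit_impl works this way is an assumption.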

Import

#include "llama.h"

I/O Contract

Inputs

Name       | Type                 | Required | Description
path_model | const char *         | Yes      | Path to the GGUF model file
mparams    | llama_model_params   | Yes      | Model loading parameters
cparams    | llama_context_params | Yes      | Context creation parameters

Outputs

Name           | Type    | Description
llama_model*   | pointer | Loaded model instance
llama_context* | pointer | Inference context

Usage Examples

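#include "llama.h"

int main(void) {
    // Initialize the ggml backends once per process.
    llama_backend_init();

    // Load the model with default parameters; NULL signals a load failure.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (model == NULL) {
        llama_backend_free();
        return 1;
    }

    // Create an inference context for the loaded model.
    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);
    if (ctx == NULL) {
        llama_model_free(model);
        llama_backend_free();
        return 1;
    }

    // Use model/context for inference...

    // Release resources in reverse order of creation.
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}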
