Implementation:Ollama Ollama Llama Core
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Public API |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
This file is the main public API implementation. It provides the C function implementations declared in llama.h and connects the public API surface to the internal implementation classes.
Description
The file implements the public API functions by delegating to internal classes: llama_model_load_from_file creates a llama_model and calls load_hparams/load_tensors, and llama_init_from_model creates a llama_context. It also provides parameter defaults, GPU memory estimation via llama_get_device_memory_data, the sampler chain API, KV cache operations, tokenize/detokenize wrappers, model metadata access, batch operations, state save/load, and performance counters. Finally, it implements llama_backend_init and llama_backend_free, which manage the ggml backend lifecycle.
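The delegation pattern described above can be sketched in isolation. The block below is a minimal illustration, not code from llama.cpp: demo_model, demo_model_load_from_file, demo_model_n_layer, and demo_model_free are hypothetical stand-ins, and the value 32 is a placeholder. It shows a C entry point that builds an internal C++ object, delegates to its methods, and converts exceptions into a NULL return so that nothing is thrown across the C boundary.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical internal C++ class standing in for llama_model.
struct demo_model {
    std::string path;
    int n_layer = 0;

    void load_hparams() {
        if (path.empty()) {
            throw std::runtime_error("empty model path");
        }
        n_layer = 32; // placeholder; a real loader would read this from the file
    }
};

extern "C" {

// Public C entry point: creates the internal object, delegates to it, and
// turns any C++ exception into a NULL return.
demo_model * demo_model_load_from_file(const char * path) {
    if (path == nullptr) {
        return nullptr;
    }
    demo_model * model = nullptr;
    try {
        model = new demo_model();
        model->path = path;
        model->load_hparams();
        return model;
    } catch (const std::exception &) {
        delete model; // safe even if allocation itself failed (model is null)
        return nullptr;
    }
}

// Thin accessor: the caller only ever sees the opaque pointer.
int demo_model_n_layer(const demo_model * model) {
    assert(model != nullptr);
    return model->n_layer;
}

void demo_model_free(demo_model * model) {
    delete model;
}

} // extern "C"
```

A caller treats the returned pointer as an opaque handle: it checks for NULL, passes the handle to accessor functions, and releases it with the matching free function.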
Usage
This is the glue layer between the public C API and the internal C++ implementation. It is the entry point for all external consumers of llama.cpp, including Ollama's Go bindings via CGo.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama.cpp
- Lines: 1-1070
Signature
const char * llama_flash_attn_type_name(enum llama_flash_attn_type flash_attn_type);

struct llama_device_memory_data {
    int64_t total;
    int64_t free;
    llama_memory_breakdown_data mb;
};

static std::vector<llama_device_memory_data> llama_get_device_memory_data(
    const char * path_model, const llama_model_params * mparams,
    const llama_context_params * cparams,
    std::vector<ggml_backend_dev_t> & devs,
    uint32_t & hp_ngl, uint32_t & hp_n_ctx_train, uint32_t & hp_n_expert,
    const ggml_log_level log_level);

static void llama_params_fit_impl(
    const char * path_model, struct llama_model_params * mparams,
    struct llama_context_params * cparams,
    float * tensor_split, struct llama_model_tensor_buft_override * tensor_buft_overrides,
    size_t margin_s, uint32_t n_ctx_min, enum ggml_log_level log_level);
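The signatures above expose per-device total and free byte counts and a margin parameter, which suggests that parameters are fitted against free device memory with some headroom. The sketch below is not the algorithm used by llama_params_fit_impl. It is a simplified illustration of that general idea, and device_mem, fit_layers, and the uniform bytes_per_layer cost are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-device snapshot; only the free byte count is used here.
struct device_mem {
    int64_t total;
    int64_t free;
};

// Greedily assign layers to devices in order, leaving `margin` bytes unused
// on each device. Returns the number of layers per device; layers that fit
// nowhere remain unassigned (for example, they would stay on the CPU).
std::vector<uint32_t> fit_layers(const std::vector<device_mem> & devs,
                                 uint32_t n_layer,
                                 int64_t bytes_per_layer,
                                 int64_t margin) {
    assert(bytes_per_layer > 0);
    std::vector<uint32_t> out(devs.size(), 0);
    uint32_t remaining = n_layer;
    for (size_t i = 0; i < devs.size() && remaining > 0; ++i) {
        const int64_t usable = devs[i].free - margin;
        if (usable <= 0) {
            continue; // not enough headroom on this device
        }
        const int64_t fits = usable / bytes_per_layer;
        const uint32_t take = fits < (int64_t) remaining ? (uint32_t) fits : remaining;
        out[i] = take;
        remaining -= take;
    }
    return out;
}
```

With two devices reporting 5000 and 1000 free bytes, a 1000-byte margin, and 1000 bytes per layer, the first device receives four layers and the second receives none. A real estimator would also have to account for non-uniform tensor sizes, KV cache, and compute buffers.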
Import
#include "llama.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path_model | const char * | Yes | Path to the GGUF model file |
| mparams | llama_model_params | Yes | Model loading parameters |
| cparams | llama_context_params | Yes | Context creation parameters |
Outputs
| Name | Type | Description |
|---|---|---|
| model | llama_model * | Loaded model instance |
| ctx | llama_context * | Inference context |
Usage Examples
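#include "llama.h"

int main() {
    llama_backend_init();
    // Load the model; a null result means loading failed.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (model == nullptr) { llama_backend_free(); return 1; }
    // Create an inference context bound to the loaded model.
    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);
    if (ctx == nullptr) { llama_model_free(model); llama_backend_free(); return 1; }
    // Use model/context for inference...
    llama_free(ctx);         // free the context before the model it references
    llama_model_free(model);
    llama_backend_free();
    return 0;
}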
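Because the usage example above releases the context and model by hand, C++ callers may prefer to tie each handle to a scope. The following is a minimal sketch under that assumption: c_handle and adopt are hypothetical helper names introduced for illustration, not part of llama.h. The helper is generic so that it compiles without llama.h, and the intended pairing with the calls from the example is shown in a comment.

```cpp
#include <cassert>
#include <memory>

// Owning handle for any object released through a C-style free function.
template <typename T>
using c_handle = std::unique_ptr<T, void (*)(T *)>;

// Adopt a raw pointer returned by a C API together with its free function.
// The free function runs only if the pointer is non-null.
template <typename T>
c_handle<T> adopt(T * raw, void (*free_fn)(T *)) {
    assert(free_fn != nullptr);
    return c_handle<T>(raw, free_fn);
}

// Intended use with the calls from the example above (not compiled here,
// since it needs llama.h and a model file):
//
//   auto model = adopt(llama_model_load_from_file("model.gguf", mparams),
//                      llama_model_free);
//   if (!model) { /* loading failed */ }
//   auto ctx = adopt(llama_init_from_model(model.get(), cparams), llama_free);
//   // ctx is destroyed before model because it was declared later.
```

Declaring the context after the model matters: locals are destroyed in reverse order of declaration, which matches the manual order of llama_free followed by llama_model_free.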