Implementation:Ggml_org_Llama_cpp_Ggml_Backend_Load_All
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| ggml-org/llama.cpp | Compute Backend Initialization, Dynamic Library Loading | 2026-02-14 |
Overview
Description
ggml_backend_load_all is the entry-point function that discovers and loads all known compute backend plugins from dynamic libraries. It populates the global backend registry with available hardware accelerators (CUDA, Metal, Vulkan, HIP, SYCL, CANN, BLAS, RPC, etc.) so that subsequent model loading and inference operations can offload work to GPUs and other accelerators.
This function should be called once at application startup, before any model is loaded or any context is created; otherwise only the built-in CPU backend is available.
Usage
```c
#include "ggml-backend.h"

int main(void) {
    // First step: load all available compute backends
    ggml_backend_load_all();

    // Now proceed with model loading...
    return 0;
}
```
Code Reference
Source Location
| File | Line(s) | Type |
|---|---|---|
| ggml/include/ggml-backend.h | 246 | Declaration |
| ggml/src/ggml-backend-reg.cpp | 536-538 | Implementation |
Signature
```c
GGML_API void ggml_backend_load_all(void);
```
Import
```c
#include "ggml-backend.h"
```
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| (none) | void | This function takes no parameters. It uses a hardcoded list of known backend names to attempt loading. |
Outputs
| Return | Type | Description |
|---|---|---|
| (none) | void | No return value. Backends that are successfully loaded are registered in the global backend registry. Backends that fail to load (e.g., missing shared libraries or unsupported hardware) are silently skipped. |
Side Effects
- Populates the global backend registry (ggml_backend_reg_t entries) with loaded backends
- Registers the backend devices (ggml_backend_dev_t) exposed by each loaded backend
- Loads dynamic libraries from the system library search path (or the build directory)
- The CPU backend is always available as a built-in and does not need to be loaded
Usage Examples
Minimal Text Generation Setup
From examples/simple/simple.cpp:
```c
#include "llama.h"

int main(int argc, char ** argv) {
    // Step 1: Load all dynamic backends (CUDA, Metal, Vulkan, etc.)
    ggml_backend_load_all();

    // Step 2: Configure model parameters with GPU offloading
    llama_model_params model_params = llama_model_default_params();
    model_params.n_gpu_layers = 99; // offload as many layers as possible

    // Step 3: Load the model (backends must be loaded first)
    llama_model * model = llama_model_load_from_file("model.gguf", model_params);

    // ... rest of inference pipeline ...

    llama_model_free(model);
    return 0;
}
```
Loading From a Custom Path
If backends are installed in a non-standard location, use the variant that accepts a directory path:
```c
// Load backends from a specific directory
ggml_backend_load_all_from_path("/opt/llama/backends/");
```
Querying Loaded Backends
After loading, you can enumerate the available backends and devices:
```c
ggml_backend_load_all();

// List all registered backends
size_t n_reg = ggml_backend_reg_count();
for (size_t i = 0; i < n_reg; i++) {
    ggml_backend_reg_t reg = ggml_backend_reg_get(i);
    // Use reg to query backend capabilities...
}

// List all available devices
size_t n_dev = ggml_backend_dev_count();
for (size_t i = 0; i < n_dev; i++) {
    ggml_backend_dev_t dev = ggml_backend_dev_get(i);
    // Use dev to query device properties...
}
```
Related Pages
- Principle:Ggml_org_Llama_cpp_Backend_Loading
- Implementation:Ggml_org_Llama_cpp_Llama_Model_Load_From_File -- called after backends are loaded to load a model
- Environment:Ggml_org_Llama_cpp_CUDA_GPU_Environment
- Environment:Ggml_org_Llama_cpp_Metal_GPU_Environment
- Environment:Ggml_org_Llama_cpp_Vulkan_GPU_Environment