Implementation: ggml_backend_load_all (ggml-org/llama.cpp)

From Leeroopedia
Knowledge Sources: ggml-org/llama.cpp
Domains: Compute Backend Initialization, Dynamic Library Loading
Last Updated: 2026-02-14

Overview

Description

ggml_backend_load_all is the entry-point function that discovers and loads all known compute backend plugins from dynamic libraries. It populates the global backend registry with available hardware accelerators (CUDA, Metal, Vulkan, HIP, SYCL, CANN, BLAS, RPC, etc.) so that subsequent model loading and inference operations can offload work to GPUs and other accelerators.

This function must be called once at application startup, before any model loading or context creation calls.

Usage

#include "ggml-backend.h"

int main(void) {
    // First step: load all available compute backends
    ggml_backend_load_all();

    // Now proceed with model loading...
    return 0;
}

Code Reference

Source Location

File                           Line(s)  Type
ggml/include/ggml-backend.h    246      Declaration
ggml/src/ggml-backend-reg.cpp  536-538  Implementation

Signature

GGML_API void ggml_backend_load_all(void);

Import

#include "ggml-backend.h"

I/O Contract

Inputs

Parameter  Type  Description
(none)     void  Takes no parameters; the function iterates over a hardcoded list of known backend names and attempts to load each one (see the sketch below).
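
Internally, loading boils down to trying each known backend name in turn and ignoring failures. The standalone sketch below imitates that pattern with the public ggml_backend_load API; the name list and the libggml-<name>.so file-naming scheme are illustrative assumptions, not the exact internal logic.

#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    // Illustrative name list; the real list lives in ggml-backend-reg.cpp
    const char * names[] = { "cuda", "metal", "vulkan", "blas", "cpu" };
    char path[256];
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        // Assumed file-naming scheme; real platforms differ (.so/.dylib/.dll)
        snprintf(path, sizeof(path), "libggml-%s.so", names[i]);
        ggml_backend_reg_t reg = ggml_backend_load(path);
        printf("%-8s -> %s\n", names[i], reg ? "loaded" : "skipped");
    }
    return 0;
}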

Outputs

Return Type  Description
void         No return value. Successfully loaded backends are registered in the global backend registry; backends that fail to load (e.g., missing shared libraries or unsupported hardware) are silently skipped.
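
Because load failures are silent, code that relies on GPU offloading should confirm that an accelerator was actually registered. A minimal check using the public ggml_backend_dev_by_type API:

#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    ggml_backend_load_all();

    // Ask the registry for any GPU-class device; NULL means only the
    // built-in CPU backend (and possibly other non-GPU backends) loaded
    ggml_backend_dev_t gpu = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);
    if (gpu == NULL) {
        fprintf(stderr, "no GPU backend loaded; falling back to CPU\n");
    } else {
        printf("GPU device available: %s\n", ggml_backend_dev_name(gpu));
    }
    return 0;
}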

Side Effects

  • Populates the global backend registry (ggml_backend_reg_t entries) with loaded backends
  • Registers the backend devices (ggml_backend_dev_t) exposed by each loaded backend
  • Loads dynamic libraries from the system library search path or the build directory (see the sketch after this list for pointing the loader at an out-of-tree library)
  • Note: the CPU backend is built in, is always available, and never needs to be loaded dynamically
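
Recent revisions of ggml-backend-reg.cpp also consult the GGML_BACKEND_PATH environment variable to pick up an out-of-tree backend library. The exact semantics are version-dependent, so treat the sketch below as an assumption-laden illustration (POSIX setenv, hypothetical library path):

#include <stdlib.h>
#include "ggml-backend.h"

int main(void) {
    // Hypothetical path; must be set before ggml_backend_load_all() runs
    setenv("GGML_BACKEND_PATH", "/opt/llama/backends/libggml-cuda.so", 1);

    ggml_backend_load_all();
    return 0;
}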

Usage Examples

Minimal Text Generation Setup

Adapted from examples/simple/simple.cpp:

#include "llama.h"

#include <stdio.h>

int main(int argc, char ** argv) {
    // Step 1: Load all dynamic backends (CUDA, Metal, Vulkan, etc.)
    ggml_backend_load_all();

    // Step 2: Configure model parameters with GPU offloading
    llama_model_params model_params = llama_model_default_params();
    model_params.n_gpu_layers = 99;  // offload as many layers as possible

    // Step 3: Load the model (backends must be loaded first)
    llama_model * model = llama_model_load_from_file("model.gguf", model_params);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // ... rest of inference pipeline ...

    llama_model_free(model);
    return 0;
}

Loading From a Custom Path

If backends are installed in a non-standard location, use the variant that accepts a directory path:

// Load backends from a specific directory
ggml_backend_load_all_from_path("/opt/llama/backends/");
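
To load one specific backend library instead of scanning for all known names, ggml_backend_load accepts an explicit file path and returns the registration handle, or NULL on failure. A minimal sketch with a hypothetical library path:

#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    ggml_backend_reg_t reg = ggml_backend_load("/opt/llama/backends/libggml-vulkan.so");
    if (reg == NULL) {
        fprintf(stderr, "backend failed to load\n");
        return 1;
    }
    printf("loaded backend: %s\n", ggml_backend_reg_name(reg));
    return 0;
}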

Querying Loaded Backends

After loading, you can enumerate the available backends and devices:

ggml_backend_load_all();

// List all registered backends by name
size_t n_reg = ggml_backend_reg_count();
for (size_t i = 0; i < n_reg; i++) {
    ggml_backend_reg_t reg = ggml_backend_reg_get(i);
    printf("backend: %s\n", ggml_backend_reg_name(reg));
}

// List all available devices with their descriptions
size_t n_dev = ggml_backend_dev_count();
for (size_t i = 0; i < n_dev; i++) {
    ggml_backend_dev_t dev = ggml_backend_dev_get(i);
    printf("device: %s (%s)\n", ggml_backend_dev_name(dev), ggml_backend_dev_description(dev));
}
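
Initializing a Backend From a Device

A located device can be turned into a usable backend instance with ggml_backend_dev_init and released with ggml_backend_free. A sketch, assuming the registry has already been populated:

#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    ggml_backend_load_all();

    // Prefer a GPU device, fall back to the built-in CPU device
    ggml_backend_dev_t dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);
    if (dev == NULL) {
        dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
    }

    // Instantiate a backend on the chosen device (NULL = default params)
    ggml_backend_t backend = ggml_backend_dev_init(dev, NULL);
    if (backend == NULL) {
        fprintf(stderr, "failed to initialize backend\n");
        return 1;
    }
    printf("using backend: %s\n", ggml_backend_name(backend));

    ggml_backend_free(backend);
    return 0;
}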
