Implementation: ggml_backend_load_all (ggml-org/llama.cpp)

From Leeroopedia
Knowledge Sources: ggml-org/llama.cpp
Domains: Compute Backend Initialization, Dynamic Library Loading
Last Updated: 2026-02-14

Overview

Description

ggml_backend_load_all is the entry-point function that discovers and loads all known compute backend plugins from dynamic libraries. It populates the global backend registry with available hardware accelerators (CUDA, Metal, Vulkan, HIP, SYCL, CANN, BLAS, RPC, etc.) so that subsequent model loading and inference operations can offload work to GPUs and other accelerators.

This function must be called once at application startup, before any model loading or context creation calls.

Usage

#include "ggml-backend.h"

int main(void) {
    // First step: load all available compute backends
    ggml_backend_load_all();

    // Now proceed with model loading...
    return 0;
}

Code Reference

Source Location

File                           Line(s)  Type
ggml/include/ggml-backend.h    246      Declaration
ggml/src/ggml-backend-reg.cpp  536-538  Implementation

Signature

GGML_API void ggml_backend_load_all(void);

Import

#include "ggml-backend.h"

I/O Contract

Inputs

Parameter  Type  Description
(none)     void  Takes no parameters; the function iterates over a hardcoded list of known backend names and attempts to load each one (see the sketch below).
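
Internally, loading boils down to trying each known backend name in turn and ignoring failures. The standalone sketch below imitates that pattern with the public ggml_backend_load API; the name list and the libggml-<name>.so file-naming scheme are illustrative assumptions, not the exact internal logic.

#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    // Illustrative name list; the real list lives in ggml-backend-reg.cpp
    const char * names[] = { "cuda", "metal", "vulkan", "blas", "cpu" };
    char path[256];
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        // Assumed file-naming scheme; real platforms differ (.so/.dylib/.dll)
        snprintf(path, sizeof(path), "libggml-%s.so", names[i]);
        ggml_backend_reg_t reg = ggml_backend_load(path);
        printf("%-8s -> %s\n", names[i], reg ? "loaded" : "skipped");
    }
    return 0;
}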

Outputs

Return Type  Description
void         No return value. Successfully loaded backends are registered in the global backend registry; backends that fail to load (e.g., missing shared libraries or unsupported hardware) are silently skipped.
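
Because load failures are silent, code that relies on GPU offloading should confirm that an accelerator was actually registered. A minimal check using the public ggml_backend_dev_by_type API:

#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    ggml_backend_load_all();

    // Ask the registry for any GPU-class device; NULL means only the
    // built-in CPU backend (and possibly other non-GPU backends) loaded
    ggml_backend_dev_t gpu = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);
    if (gpu == NULL) {
        fprintf(stderr, "no GPU backend loaded; falling back to CPU\n");
    } else {
        printf("GPU device available: %s\n", ggml_backend_dev_name(gpu));
    }
    return 0;
}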

Side Effects

  • Populates the global backend registry (ggml_backend_reg_t entries) with loaded backends
  • Registers the backend devices (ggml_backend_dev_t) exposed by each loaded backend
  • Loads dynamic libraries from the system library search path or the build directory (see the sketch after this list for pointing the loader at an out-of-tree library)
  • Note: the CPU backend is built in, is always available, and never needs to be loaded dynamically
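
Recent revisions of ggml-backend-reg.cpp also consult the GGML_BACKEND_PATH environment variable to pick up an out-of-tree backend library. The exact semantics are version-dependent, so treat the sketch below as an assumption-laden illustration (POSIX setenv, hypothetical library path):

#include <stdlib.h>
#include "ggml-backend.h"

int main(void) {
    // Hypothetical path; must be set before ggml_backend_load_all() runs
    setenv("GGML_BACKEND_PATH", "/opt/llama/backends/libggml-cuda.so", 1);

    ggml_backend_load_all();
    return 0;
}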

Usage Examples

Minimal Text Generation Setup

Adapted from examples/simple/simple.cpp:

#include "llama.h"

#include <stdio.h>

int main(int argc, char ** argv) {
    // Step 1: Load all dynamic backends (CUDA, Metal, Vulkan, etc.)
    ggml_backend_load_all();

    // Step 2: Configure model parameters with GPU offloading
    llama_model_params model_params = llama_model_default_params();
    model_params.n_gpu_layers = 99;  // offload as many layers as possible

    // Step 3: Load the model (backends must be loaded first)
    llama_model * model = llama_model_load_from_file("model.gguf", model_params);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // ... rest of inference pipeline ...

    llama_model_free(model);
    return 0;
}

Loading From a Custom Path

If backends are installed in a non-standard location, use the variant that accepts a directory path:

// Load backends from a specific directory
ggml_backend_load_all_from_path("/opt/llama/backends/");
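
To load one specific backend library instead of scanning for all known names, ggml_backend_load accepts an explicit file path and returns the registration handle, or NULL on failure. A minimal sketch with a hypothetical library path:

#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    ggml_backend_reg_t reg = ggml_backend_load("/opt/llama/backends/libggml-vulkan.so");
    if (reg == NULL) {
        fprintf(stderr, "backend failed to load\n");
        return 1;
    }
    printf("loaded backend: %s\n", ggml_backend_reg_name(reg));
    return 0;
}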

Querying Loaded Backends

After loading, you can enumerate the available backends and devices:

ggml_backend_load_all();

// List all registered backends by name
size_t n_reg = ggml_backend_reg_count();
for (size_t i = 0; i < n_reg; i++) {
    ggml_backend_reg_t reg = ggml_backend_reg_get(i);
    printf("backend: %s\n", ggml_backend_reg_name(reg));
}

// List all available devices with their descriptions
size_t n_dev = ggml_backend_dev_count();
for (size_t i = 0; i < n_dev; i++) {
    ggml_backend_dev_t dev = ggml_backend_dev_get(i);
    printf("device: %s (%s)\n", ggml_backend_dev_name(dev), ggml_backend_dev_description(dev));
}
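
Initializing a Backend From a Device

A located device can be turned into a usable backend instance with ggml_backend_dev_init and released with ggml_backend_free. A sketch, assuming the registry has already been populated:

#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    ggml_backend_load_all();

    // Prefer a GPU device, fall back to the built-in CPU device
    ggml_backend_dev_t dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);
    if (dev == NULL) {
        dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
    }

    // Instantiate a backend on the chosen device (NULL = default params)
    ggml_backend_t backend = ggml_backend_dev_init(dev, NULL);
    if (backend == NULL) {
        fprintf(stderr, "failed to initialize backend\n");
        return 1;
    }
    printf("using backend: %s\n", ggml_backend_name(backend));

    ggml_backend_free(backend);
    return 0;
}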
