Implementation:Ggml org Ggml Rpc backend api

Metadata

Field	Value
Page Type	Implementation (API Doc)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, Distributed_Computing
Last Updated	2025-05-15 12:00 GMT

Overview

Public C header declaring the RPC (Remote Procedure Call) backend interface for offloading tensor computation to remote servers over a network.

Description

ggml-rpc.h declares the RPC backend's public API (30 lines). It provides:

Protocol version constants: RPC_PROTO_MAJOR_VERSION = 3, RPC_PROTO_MINOR_VERSION = 6, RPC_PROTO_PATCH_VERSION = 0. These ensure client-server compatibility via the HELLO handshake.
Server limit: GGML_RPC_MAX_SERVERS = 16 -- maximum number of remote servers that can be registered simultaneously.
Client functions:
- ggml_backend_rpc_init -- connects to a remote server at a given endpoint and device index
- ggml_backend_is_rpc -- type-checks whether a backend is RPC-based
- ggml_backend_rpc_buffer_type -- returns the buffer type for remote memory allocation
- ggml_backend_rpc_get_device_memory -- queries free and total memory on a remote device
Server function:
- ggml_backend_rpc_start_server -- starts an RPC server exposing local backend devices over the network, with optional data caching
Registration:
- ggml_backend_rpc_reg -- returns the backend registration handle
- ggml_backend_rpc_add_server -- dynamically registers a new remote server endpoint
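
To illustrate how the protocol version constants could gate a connection, here is a minimal sketch of a compatibility check. The type `example_rpc_version`, the function `example_rpc_compatible`, and the rule itself (matching major versions are compatible, minor and patch may differ) are assumptions made for illustration; they are not declared in ggml-rpc.h and may not match the rule the real handshake applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Values as listed in ggml-rpc.h
#define RPC_PROTO_MAJOR_VERSION 3
#define RPC_PROTO_MINOR_VERSION 6
#define RPC_PROTO_PATCH_VERSION 0

// Hypothetical container for a version triple (not part of ggml-rpc.h)
typedef struct {
    uint8_t major;
    uint8_t minor;
    uint8_t patch;
} example_rpc_version;

// Assumed rule for illustration only: peers are compatible when their
// major versions match; minor and patch differences are tolerated.
static bool example_rpc_compatible(const example_rpc_version * client,
                                   const example_rpc_version * server) {
    assert(client != NULL && server != NULL);
    return client->major == server->major;
}
```

Under this assumed rule, a client built against 3.6.0 would accept a server reporting 3.5.1 and reject one reporting 2.6.0.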

Usage

Include this header to use the RPC backend for distributed inference. A client connects to a remote server endpoint with ggml_backend_rpc_init; buffer allocations and graph computations issued on the returned backend are then forwarded to that server over the network.

Code Reference

Source Location

GGML repo, file: include/ggml-rpc.h (30 lines).

Signatures

#define RPC_PROTO_MAJOR_VERSION    3
#define RPC_PROTO_MINOR_VERSION    6
#define RPC_PROTO_PATCH_VERSION    0
#define GGML_RPC_MAX_SERVERS       16

GGML_BACKEND_API ggml_backend_t ggml_backend_rpc_init(const char * endpoint, uint32_t device);
GGML_BACKEND_API bool ggml_backend_is_rpc(ggml_backend_t backend);
GGML_BACKEND_API ggml_backend_buffer_type_t ggml_backend_rpc_buffer_type(const char * endpoint, uint32_t device);
GGML_BACKEND_API void ggml_backend_rpc_get_device_memory(const char * endpoint, uint32_t device, size_t * free, size_t * total);
GGML_BACKEND_API void ggml_backend_rpc_start_server(const char * endpoint, const char * cache_dir,
                                                    size_t n_threads, size_t n_devices, ggml_backend_dev_t * devices);
GGML_BACKEND_API ggml_backend_reg_t ggml_backend_rpc_reg(void);
GGML_BACKEND_API ggml_backend_reg_t ggml_backend_rpc_add_server(const char * endpoint);
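
The GGML_RPC_MAX_SERVERS limit suggests a fixed-capacity table of registered endpoints. The sketch below shows one way such a table could behave. The names `example_server_table` and `example_add_server`, the 128-byte endpoint buffer, and the choice to return -1 when full are all hypothetical; the header does not say how the real backend stores endpoints or what it does when the limit is exceeded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GGML_RPC_MAX_SERVERS 16   // value as listed in ggml-rpc.h

// Hypothetical fixed-capacity endpoint table (not ggml's internal structure)
typedef struct {
    char   endpoints[GGML_RPC_MAX_SERVERS][128];
    size_t count;
} example_server_table;

// Returns the slot index for `endpoint`, adding it if it is new.
// Returns -1 when the table is full or the endpoint string is too long.
static int example_add_server(example_server_table * table, const char * endpoint) {
    assert(table != NULL && endpoint != NULL);

    for (size_t i = 0; i < table->count; i++) {
        if (strcmp(table->endpoints[i], endpoint) == 0) {
            return (int) i;         // already registered
        }
    }
    if (table->count == GGML_RPC_MAX_SERVERS ||
        strlen(endpoint) >= sizeof(table->endpoints[0])) {
        return -1;
    }
    strcpy(table->endpoints[table->count], endpoint);
    return (int) table->count++;
}
```

Registering the same endpoint twice returns the existing slot in this sketch, so repeated calls do not consume capacity.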

Import

#include "ggml-rpc.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
`endpoint`	`const char *`	Yes	Network address in `host:port` format for the RPC server.
`device`	`uint32_t`	Yes	Device index on the remote server (0-based).
`cache_dir`	`const char *`	No	Server-side cache directory used to deduplicate tensor data; pass `NULL` to run without a cache.
`n_threads`	`size_t`	Yes	Number of server worker threads.
`n_devices`	`size_t`	Yes	Number of local backend devices to expose.
`devices`	`ggml_backend_dev_t *`	Yes	Array of local backend devices to serve remotely.
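
Because every client function takes the endpoint as a single `host:port` string, callers may want to validate it before use. The following is a minimal sketch of such a check. `example_parse_endpoint` is a hypothetical helper, not part of ggml-rpc.h, and splitting at the last colon is an assumption that does not account for every address form (for example, bracketed IPv6 literals are passed through unchanged).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of ggml-rpc.h): splits "host:port" at the
// last ':' and checks that the port parses as a number in 1..65535.
static bool example_parse_endpoint(const char * endpoint,
                                   char * host, size_t host_cap, int * port) {
    assert(endpoint != NULL && host != NULL && port != NULL);

    const char * colon = strrchr(endpoint, ':');
    if (colon == NULL || colon == endpoint) {
        return false;               // no separator, or empty host
    }

    size_t host_len = (size_t) (colon - endpoint);
    if (host_len >= host_cap) {
        return false;               // host does not fit in the caller's buffer
    }

    char * end = NULL;
    long value = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || value < 1 || value > 65535) {
        return false;               // missing, non-numeric, or out-of-range port
    }

    memcpy(host, endpoint, host_len);
    host[host_len] = '\0';
    *port = (int) value;
    return true;
}
```

On success the host and port are written to the caller's buffers; on failure the outputs are left untouched.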

Outputs

Output	Type	Description
Backend handle	`ggml_backend_t`	RPC client backend proxying all operations to the remote server.
Type check	`bool`	`true` if the backend is RPC-based.
Buffer type	`ggml_backend_buffer_type_t`	Buffer type for remote memory allocation.
Device memory	via output params	Free and total memory on the remote device.
Registration	`ggml_backend_reg_t`	Registration handle for the backend system.

Usage Examples

#include "ggml-rpc.h"

// Connect to device 0 on a remote server
ggml_backend_t backend = ggml_backend_rpc_init("192.168.1.100:50052", 0);

// Query free and total memory (in bytes) on the remote device
size_t free_mem = 0, total_mem = 0;
ggml_backend_rpc_get_device_memory("192.168.1.100:50052", 0, &free_mem, &total_mem);

// Dynamically register another server
ggml_backend_reg_t reg = ggml_backend_rpc_add_server("192.168.1.101:50052");

// Release the client backend when it is no longer needed
ggml_backend_free(backend);
