Implementation:Ggml org Ggml Rpc backend api
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (API Doc) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, Distributed_Computing |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Public C header declaring the RPC (Remote Procedure Call) backend interface for offloading tensor computation to remote servers over a network.
Description
ggml-rpc.h declares the RPC backend's public API (30 lines). It provides:
- Protocol version constants: RPC_PROTO_MAJOR_VERSION = 3, RPC_PROTO_MINOR_VERSION = 6, RPC_PROTO_PATCH_VERSION = 0. These ensure client-server compatibility via the HELLO handshake.
- Server limit: GGML_RPC_MAX_SERVERS = 16 -- maximum number of remote servers that can be registered simultaneously.
- Client functions:
  - ggml_backend_rpc_init -- connects to a remote server at a given endpoint and device index
  - ggml_backend_is_rpc -- type-checks whether a backend is RPC-based
  - ggml_backend_rpc_buffer_type -- returns the buffer type for remote memory allocation
  - ggml_backend_rpc_get_device_memory -- queries free and total memory on a remote device
- Server function:
  - ggml_backend_rpc_start_server -- starts an RPC server exposing local backend devices over the network, with optional data caching
- Registration:
  - ggml_backend_rpc_reg -- returns the backend registration handle
  - ggml_backend_rpc_add_server -- dynamically registers a new remote server endpoint
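The version constants above exist so that a client and server can reject each other when their protocols differ. The header does not say how the comparison is made, so the following is only a minimal sketch under an assumed rule: peers are compatible when their major versions match. The struct `rpc_version_sketch` and the function `rpc_version_compatible_sketch` are hypothetical names introduced for illustration and are not part of ggml-rpc.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from ggml-rpc.h so the sketch compiles on its own. */
#define RPC_PROTO_MAJOR_VERSION 3
#define RPC_PROTO_MINOR_VERSION 6
#define RPC_PROTO_PATCH_VERSION 0

/* Hypothetical: the version triple a peer might report during the handshake. */
typedef struct {
    uint8_t major;
    uint8_t minor;
    uint8_t patch;
} rpc_version_sketch;

/* Assumed rule (not confirmed by the header): peers are compatible when
 * their major versions match; minor and patch differences are tolerated. */
static bool rpc_version_compatible_sketch(rpc_version_sketch remote) {
    return remote.major == RPC_PROTO_MAJOR_VERSION;
}
```

Under this assumed rule, a peer reporting 3.9.1 would be accepted and a peer reporting 2.0.0 would be rejected. A stricter policy, such as also requiring an equal minor version, would change only the body of the comparison function.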
Usage
Include this header to use the RPC backend for distributed inference. The client connects to a remote server endpoint, and all GGML operations are transparently forwarded over the network.
Code Reference
Source Location
GGML repo, file: include/ggml-rpc.h (30 lines).
Signatures
#define RPC_PROTO_MAJOR_VERSION 3
#define RPC_PROTO_MINOR_VERSION 6
#define RPC_PROTO_PATCH_VERSION 0
#define GGML_RPC_MAX_SERVERS 16
GGML_BACKEND_API ggml_backend_t ggml_backend_rpc_init(const char * endpoint, uint32_t device);
GGML_BACKEND_API bool ggml_backend_is_rpc(ggml_backend_t backend);
GGML_BACKEND_API ggml_backend_buffer_type_t ggml_backend_rpc_buffer_type(const char * endpoint, uint32_t device);
GGML_BACKEND_API void ggml_backend_rpc_get_device_memory(const char * endpoint, uint32_t device, size_t * free, size_t * total);
GGML_BACKEND_API void ggml_backend_rpc_start_server(const char * endpoint, const char * cache_dir,
size_t n_threads, size_t n_devices, ggml_backend_dev_t * devices);
GGML_BACKEND_API ggml_backend_reg_t ggml_backend_rpc_reg(void);
GGML_BACKEND_API ggml_backend_reg_t ggml_backend_rpc_add_server(const char * endpoint);
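GGML_RPC_MAX_SERVERS bounds how many endpoints can be registered at once. To illustrate what such a bound implies for callers, here is a minimal sketch of a fixed-capacity endpoint table. It does not reproduce the library's internal bookkeeping: `rpc_server_table_sketch`, `rpc_server_table_add_sketch`, and `ENDPOINT_CAP_SKETCH` are hypothetical names, and treating a repeated endpoint as idempotent is an assumption.

```c
#include <stddef.h>
#include <string.h>

#define GGML_RPC_MAX_SERVERS 16   /* value copied from ggml-rpc.h */
#define ENDPOINT_CAP_SKETCH  128  /* hypothetical maximum endpoint length */

/* Hypothetical fixed-capacity table of registered endpoints. */
typedef struct {
    char   endpoints[GGML_RPC_MAX_SERVERS][ENDPOINT_CAP_SKETCH];
    size_t count;
} rpc_server_table_sketch;

/* Returns the slot index of the endpoint, adding it if it is new.
 * Returns -1 when the table is full or the endpoint is empty or too long.
 * Assumption: re-adding a known endpoint returns its existing slot. */
static int rpc_server_table_add_sketch(rpc_server_table_sketch * t, const char * endpoint) {
    if (t == NULL || endpoint == NULL) {
        return -1;
    }
    size_t len = strlen(endpoint);
    if (len == 0 || len >= ENDPOINT_CAP_SKETCH) {
        return -1;
    }
    for (size_t i = 0; i < t->count; ++i) {
        if (strcmp(t->endpoints[i], endpoint) == 0) {
            return (int) i;   /* already registered */
        }
    }
    if (t->count >= GGML_RPC_MAX_SERVERS) {
        return -1;            /* limit reached */
    }
    memcpy(t->endpoints[t->count], endpoint, len + 1);
    return (int) t->count++;
}
```

The table must be zero-initialized before first use. Once 16 distinct endpoints are stored, any further new endpoint is rejected, which is the practical consequence callers should plan for when distributing work across many servers.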
Import
#include "ggml-rpc.h"
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| endpoint | const char * | Yes | Network address in host:port format for the RPC server. |
| device | uint32_t | Yes | Device index on the remote server (0-based). |
| cache_dir | const char * | No | Server-side cache directory for tensor data deduplication. |
| n_threads | size_t | Yes | Number of server worker threads. |
| n_devices | size_t | Yes | Number of local backend devices to expose. |
| devices | ggml_backend_dev_t * | Yes | Array of local backend devices to serve remotely. |
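Because every client call takes the endpoint as a single host:port string, it can help to validate that string before passing it on. The following is a minimal sketch of such a check. `split_endpoint_sketch` is a hypothetical helper, not part of ggml-rpc.h, and it assumes a plain hostname or IPv4 host followed by a decimal port. How the library itself parses endpoints is not stated in the header.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "host:port" at the last ':'.
 * Copies the host into host_out (capacity host_cap) and stores the port
 * in *port_out. Returns false if the string is malformed, the port is
 * outside 1..65535, or the host does not fit in the buffer. */
static bool split_endpoint_sketch(const char * endpoint, char * host_out,
                                  size_t host_cap, uint16_t * port_out) {
    if (endpoint == NULL || host_out == NULL || port_out == NULL) {
        return false;
    }
    const char * colon = strrchr(endpoint, ':');
    if (colon == NULL || colon == endpoint) {
        return false;                      /* no separator or empty host */
    }
    size_t host_len = (size_t) (colon - endpoint);
    if (host_len >= host_cap) {
        return false;                      /* host would be truncated */
    }
    const char * port_str = colon + 1;
    if (*port_str == '\0') {
        return false;                      /* empty port */
    }
    for (const char * p = port_str; *p != '\0'; ++p) {
        if (*p < '0' || *p > '9') {
            return false;                  /* digits only */
        }
    }
    unsigned long port = strtoul(port_str, NULL, 10);
    if (port == 0 || port > 65535) {
        return false;
    }
    memcpy(host_out, endpoint, host_len);
    host_out[host_len] = '\0';
    *port_out = (uint16_t) port;
    return true;
}
```

A caller could run this check on a user-supplied address and only then hand the original, unmodified string to ggml_backend_rpc_init or ggml_backend_rpc_add_server.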
Outputs
| Output | Type | Description |
|---|---|---|
| Backend handle | ggml_backend_t | RPC client backend proxying all operations to the remote server. |
| Type check | bool | true if the backend is RPC-based. |
| Buffer type | ggml_backend_buffer_type_t | Buffer type for remote memory allocation. |
| Device memory | via output params | Free and total memory on the remote device. |
| Registration | ggml_backend_reg_t | Registration handle for the backend system. |
Usage Examples
#include "ggml-rpc.h"
// Connect to a remote server
ggml_backend_t backend = ggml_backend_rpc_init("192.168.1.100:50052", 0);
// Query remote memory
size_t free_mem, total_mem;
ggml_backend_rpc_get_device_memory("192.168.1.100:50052", 0, &free_mem, &total_mem);
// Dynamically register another server
ggml_backend_reg_t reg = ggml_backend_rpc_add_server("192.168.1.101:50052");