Implementation:Ggml org Llama cpp RPC Server

From Leeroopedia
Knowledge Sources
Domains Distributed, Networking
Last Updated 2026-02-15 00:00 GMT

Overview

RPC server that exposes local ggml compute devices (GPUs, CPUs) over TCP for distributed inference across networked machines.

Description

The server parses command-line arguments for host, port, memory limit, and device selection. It detects the available ggml backend devices and, when the user supplies device names or indices, exposes only those. It then creates a cache directory for RPC data and starts the `ggml_rpc_server` on the configured endpoint. The source also includes cross-platform utilities for directory creation and for UTF-8 path handling on Windows. It deliberately avoids linking against `libcommon` and duplicates a few utility functions locally to keep dependencies minimal.
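The device-filtering step described above can be sketched as follows. This is a minimal illustration, not the upstream code: the helper name `select_devices` is hypothetical, detected devices are modeled as plain strings rather than ggml device handles, and the rules that an all-digit token is a zero-based index and that duplicates are dropped are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the llama.cpp source): resolve each requested
// token to an entry of `available`. ASSUMPTION: a token made only of digits is
// a zero-based index; anything else must match a device name exactly.
std::vector<std::string> select_devices(const std::vector<std::string> & available,
                                        const std::vector<std::string> & requested) {
    // No filter given: expose every detected device.
    if (requested.empty()) {
        return available;
    }

    std::vector<std::string> selected;
    for (const std::string & token : requested) {
        const bool is_index = !token.empty() &&
            std::all_of(token.begin(), token.end(),
                        [](unsigned char c) { return std::isdigit(c) != 0; });

        std::string name;
        if (is_index) {
            // std::stoul throws std::out_of_range if the number does not fit.
            const unsigned long idx = std::stoul(token);
            if (idx >= available.size()) {
                throw std::invalid_argument("device index out of range: " + token);
            }
            name = available[idx];
        } else {
            if (std::find(available.begin(), available.end(), token) == available.end()) {
                throw std::invalid_argument("unknown device: " + token);
            }
            name = token;
        }

        // ASSUMPTION: a device requested twice (by name and by index) is exposed once.
        if (std::find(selected.begin(), selected.end(), name) == selected.end()) {
            selected.push_back(name);
        }
    }

    // Postcondition: de-duplication means we never return more devices than exist.
    assert(selected.size() <= available.size());
    return selected;
}
```

With detected devices `CUDA0`, `CUDA1`, `CPU`, a request of `CUDA1` and `0` would select `CUDA1` and `CUDA0` in that order, while an unknown name is rejected before the server starts.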

Usage

Use this server to enable distributed LLM inference by allowing remote machines to contribute their compute resources (CUDA, Metal, CPU) to a central llama.cpp instance over the network. Run one RPC server per machine with available hardware, then connect from the client using the `--rpc` flag.

Code Reference

Source Location

Signature

// Main entry point
int main(int argc, char ** argv);

// Server parameters
struct rpc_server_params {
    std::string host = "0.0.0.0";
    int port = 50052;
    size_t backend_mem = 0;
    std::vector<std::string> devices;
};
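As a rough illustration of how these fields might be combined, the sketch below builds a listening endpoint from `host` and `port`. It assumes the endpoint is a `host:port` string, the same form the client passes to `--rpc`; the helpers `is_valid_port` and `make_endpoint` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same fields and defaults as the struct shown above.
struct rpc_server_params {
    std::string host = "0.0.0.0";
    int port = 50052;
    std::size_t backend_mem = 0;
    std::vector<std::string> devices;
};

// Hypothetical helper: a TCP port must lie in 1..65535.
bool is_valid_port(int port) {
    return port > 0 && port <= 65535;
}

// Hypothetical helper. ASSUMPTION: the endpoint is the plain "host:port"
// string, matching the address form used with the client's --rpc flag.
std::string make_endpoint(const rpc_server_params & params) {
    assert(is_valid_port(params.port));  // callers should validate first
    return params.host + ":" + std::to_string(params.port);
}
```

With the default values this yields `0.0.0.0:50052`, which binds on all interfaces, so restricting `host` to a specific address is worth considering on untrusted networks.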

Import

#include "ggml-rpc.h"
#include <string>
#include <vector>
#include <thread>
#include <regex>

I/O Contract

Inputs

Name | Type | Required | Description
-H, --host | string | No | Host address to bind to (default: 0.0.0.0)
-p, --port | int | No | TCP port to listen on (default: 50052)
-m, --mem | size_t | No | Maximum backend memory to expose (0 = unlimited)
-d, --dev | string | No | Device name or index to expose (can be repeated)
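The flags in the table above could be parsed into `rpc_server_params` roughly as shown below. This is a sketch under stated assumptions, not the upstream parser: `parse_args` is a hypothetical function, every flag is assumed to take exactly one value, and `--mem` is assumed to be given in megabytes because the unit is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Same fields and defaults as the struct in the Signature section.
struct rpc_server_params {
    std::string host = "0.0.0.0";
    int port = 50052;
    std::size_t backend_mem = 0;
    std::vector<std::string> devices;
};

// Hypothetical parser, not the upstream implementation. Returns false on an
// unknown flag, a missing value, or a value that is not a valid number.
bool parse_args(int argc, const char * const * argv, rpc_server_params & params) {
    assert(argc >= 1 && argv != nullptr);  // argv[0] is the program name

    // Reject signs, whitespace, and suffixes before handing text to std::sto*.
    auto digits_only = [](const std::string & s) {
        return !s.empty() && s.find_first_not_of("0123456789") == std::string::npos;
    };

    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (i + 1 >= argc) {
            return false;  // ASSUMPTION: every flag takes exactly one value
        }
        const std::string value = argv[++i];
        try {
            if (arg == "-H" || arg == "--host") {
                params.host = value;
            } else if (arg == "-p" || arg == "--port") {
                if (!digits_only(value)) return false;
                const int port = std::stoi(value);
                if (port <= 0 || port > 65535) return false;
                params.port = port;
            } else if (arg == "-m" || arg == "--mem") {
                if (!digits_only(value)) return false;
                // ASSUMPTION: the value is in megabytes and is stored as bytes.
                const unsigned long long mb = std::stoull(value);
                params.backend_mem = static_cast<std::size_t>(mb) * 1024 * 1024;
            } else if (arg == "-d" || arg == "--dev") {
                params.devices.push_back(value);  // the flag can be repeated
            } else {
                return false;  // unknown flag
            }
        } catch (const std::exception &) {
            return false;  // std::stoi / std::stoull reported an out-of-range value
        }
    }
    return true;
}
```

A caller would typically invoke `parse_args(argc, argv, params)` at the top of `main` and exit with a non-zero code when it returns false, which matches the return-code contract described under Outputs.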

Outputs

Name | Type | Description
TCP server | network | Listening TCP server accepting RPC compute requests from llama.cpp clients
return code | int | 0 on clean shutdown, non-zero on error

Usage Examples

# Start RPC server on default port, exposing all devices
./rpc-server

# Start on specific host and port, limit to CUDA device
./rpc-server -H 192.168.1.100 -p 50052 -d CUDA0

# Client-side usage: connect to remote RPC server
./llama-cli -m model.gguf --rpc 192.168.1.100:50052 -ngl 99
