Implementation:Ggml org Llama cpp Server CLI Args

Field	Value
Implementation Name	Server CLI Args
Doc Type	Wrapper Doc
Domain	CLI Configuration, Argument Parsing
Description	CLI argument parsing for llama-server configuration: host, port, model, parallelism, security, and endpoint toggles
Related Workflow	OpenAI_Compatible_Server

Overview

Description

The Server CLI Args implementation defines the command-line interface for configuring llama-server. Each argument is registered as a common_arg object with one or more flag names, an optional value placeholder, a help string, a handler lambda, and an optional environment variable binding. Server arguments are scoped to the LLAMA_EXAMPLE_SERVER example type via set_examples, so they are only registered when parsing a server invocation; a few, such as --embedding, are also scoped to other example types.
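
The registration pattern described above can be sketched in a few lines of standalone C++. This is a minimal illustration, not the llama.cpp code: `arg_sketch`, `params_sketch`, `example_type`, `build_options`, `apply_option`, and the `--other-flag` option are hypothetical stand-ins, and the filtering by example type is an assumption about how scoping behaves.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only; the real
// common_arg and common_params types in llama.cpp carry many more fields.
enum example_type { EXAMPLE_SERVER, EXAMPLE_OTHER };

struct params_sketch {
    std::string hostname = "127.0.0.1";
    int         port     = 8080;
};

struct arg_sketch {
    using handler_fn = std::function<void(params_sketch &, const std::string &)>;

    std::string               flag;     // e.g. "--host"
    std::string               help;     // text shown in the help output
    std::string               env;      // bound environment variable, empty if none
    std::vector<example_type> examples; // programs that expose this flag
    handler_fn                handler;  // writes the parsed value into params

    arg_sketch(std::string f, std::string h, handler_fn fn)
        : flag(std::move(f)), help(std::move(h)), handler(std::move(fn)) {}

    // Builder-style setters, mirroring the chained calls in the listings.
    arg_sketch & set_examples(std::vector<example_type> ex) { examples = std::move(ex); return *this; }
    arg_sketch & set_env(std::string name) { env = std::move(name); return *this; }
};

// Registers every option, keeping only those scoped to the example `ex`.
std::vector<arg_sketch> build_options(example_type ex) {
    std::vector<arg_sketch> kept;
    auto add_opt = [&](const arg_sketch & arg) {
        for (example_type e : arg.examples) {
            if (e == ex) { kept.push_back(arg); return; }
        }
    };
    add_opt(arg_sketch("--host", "ip address to listen",
        [](params_sketch & p, const std::string & v) { p.hostname = v; })
        .set_examples({EXAMPLE_SERVER}).set_env("LLAMA_ARG_HOST"));
    add_opt(arg_sketch("--port", "port to listen",
        [](params_sketch & p, const std::string & v) { p.port = std::stoi(v); })
        .set_examples({EXAMPLE_SERVER}).set_env("LLAMA_ARG_PORT"));
    // Hypothetical option scoped to a different program; filtered out for the server.
    add_opt(arg_sketch("--other-flag", "not a server option",
        [](params_sketch &, const std::string &) {})
        .set_examples({EXAMPLE_OTHER}));
    return kept;
}

// Looks up `flag` and runs its handler; returns false for an unknown flag.
bool apply_option(const std::vector<arg_sketch> & opts, const std::string & flag,
                  const std::string & value, params_sketch & params) {
    for (const arg_sketch & opt : opts) {
        if (opt.flag == flag) { opt.handler(params, value); return true; }
    }
    return false;
}
```

Calling `build_options(EXAMPLE_SERVER)` in this sketch yields only the two server-scoped options, which is the effect the example-type scoping is meant to achieve.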

Usage

Arguments are passed when launching the server:

llama-server \
  --host 0.0.0.0 \
  --port 8080 \
  --model model.gguf \
  --embedding \
  --metrics \
  --api-key my-secret-key \
  --threads-http 4

Most arguments can alternatively be set via environment variables; `--api-key-file` and `--slot-save-path` have no environment binding:

export LLAMA_ARG_HOST=0.0.0.0
export LLAMA_ARG_PORT=8080
export LLAMA_ARG_EMBEDDINGS=1
llama-server --model model.gguf
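
How a single option might be resolved from the two sources can be sketched as follows. This assumes, without confirmation from this page, that an explicit command-line value takes precedence over the environment variable, which in turn takes precedence over the built-in default; `resolve_option` is a hypothetical helper, not part of llama.cpp.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: picks the value for one option.
// Assumed precedence: command line, then environment, then default.
std::string resolve_option(const std::optional<std::string> & cli_value,
                           const char * env_name,
                           const std::string & fallback) {
    if (cli_value) {
        return *cli_value;                 // flag given explicitly
    }
    if (const char * env = std::getenv(env_name)) {
        return env;                        // e.g. LLAMA_ARG_HOST=0.0.0.0
    }
    return fallback;                       // built-in default
}
```

For example, `resolve_option(std::nullopt, "LLAMA_ARG_HOST", "127.0.0.1")` returns the exported value when `LLAMA_ARG_HOST` is set and the default otherwise.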

Code Reference

Field	Value
Source Location	`common/arg.cpp:2772-2969`
Primary Structure	`common_params` (populated by argument parsing)
Registration Function	`add_opt(common_arg(...))`
Import	`#include "arg.h"`

Network arguments:

add_opt(common_arg(
    {"--host"}, "HOST",
    string_format("ip address to listen, or bind to an UNIX socket if the address ends with .sock (default: %s)", params.hostname.c_str()),
    [](common_params & params, const std::string & value) {
        params.hostname = value;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_HOST"));

add_opt(common_arg(
    {"--port"}, "PORT",
    string_format("port to listen (default: %d)", params.port),
    [](common_params & params, int value) {
        params.port = value;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_PORT"));
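
The `--host` help text says that an address ending in `.sock` is bound as a UNIX socket. A minimal sketch of such a suffix check is shown below; `is_unix_socket_host` is a hypothetical name, and the server's actual check may differ in detail.

```cpp
#include <string>

// Hypothetical helper: true when the host value should be treated as a
// UNIX socket path, following the ".sock" suffix rule in the help text.
bool is_unix_socket_host(const std::string & host) {
    const std::string suffix = ".sock";
    return host.size() >= suffix.size() &&
           host.compare(host.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

With this rule, `--host /tmp/llama.sock` selects a UNIX socket, while `--host 0.0.0.0` is an ordinary IP address.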

Embedding and reranking mode (`--rerank` also sets `params.embedding` and selects `LLAMA_POOLING_TYPE_RANK`):

add_opt(common_arg(
    {"--embedding", "--embeddings"},
    string_format("restrict to only support embedding use case; use only with dedicated embedding models (default: %s)", params.embedding ? "enabled" : "disabled"),
    [](common_params & params) {
        params.embedding = true;
    }
).set_examples({LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_DEBUG}).set_env("LLAMA_ARG_EMBEDDINGS"));

add_opt(common_arg(
    {"--rerank", "--reranking"},
    string_format("enable reranking endpoint on server (default: %s)", "disabled"),
    [](common_params & params) {
        params.embedding = true;
        params.pooling_type = LLAMA_POOLING_TYPE_RANK;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_RERANKING"));

Security arguments:

add_opt(common_arg(
    {"--api-key"}, "KEY",
    "API key to use for authentication, multiple keys can be provided as a comma-separated list (default: none)",
    [](common_params & params, const std::string & value) {
        for (const auto & key : parse_csv_row(value)) {
            if (!key.empty()) {
                params.api_keys.push_back(key);
            }
        }
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_API_KEY"));
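
The key-splitting behaviour of the `--api-key` handler can be approximated with a plain comma split. The sketch below is a simplified stand-in for `parse_csv_row`: it is assumed here that the real helper may also handle quoting or escaping, which this version does not. `split_api_keys` is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for parse_csv_row: splits on commas only and
// drops empty entries, as the handler above does.
std::vector<std::string> split_api_keys(const std::string & value) {
    std::vector<std::string> keys;
    std::istringstream in(value);
    std::string key;
    while (std::getline(in, key, ',')) {
        if (!key.empty()) {
            keys.push_back(key);
        }
    }
    return keys;
}
```

Under this sketch, `--api-key key-a,key-b` registers two keys, and stray empty entries such as the one in `key-a,,key-b` are ignored.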

Monitoring and endpoint toggles:

add_opt(common_arg(
    {"--metrics"},
    string_format("enable prometheus compatible metrics endpoint (default: %s)", params.endpoint_metrics ? "enabled" : "disabled"),
    [](common_params & params) {
        params.endpoint_metrics = true;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_ENDPOINT_METRICS"));

add_opt(common_arg(
    {"--slots"},
    {"--no-slots"},
    string_format("expose slots monitoring endpoint (default: %s)", params.endpoint_slots ? "enabled" : "disabled"),
    [](common_params & params, bool value) {
        params.endpoint_slots = value;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_ENDPOINT_SLOTS"));
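
The paired `--slots` / `--no-slots` registration passes a `bool` to its handler. A minimal sketch of that mapping follows; `match_bool_flag` and `resolve_slots` are hypothetical helpers, and the rule that the last occurrence wins is an assumption based on arguments being handled in order.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: maps a positive/negative flag pair onto the bool
// handed to the handler; std::nullopt when `flag` is neither form.
std::optional<bool> match_bool_flag(const std::string & flag,
                                    const std::string & positive,
                                    const std::string & negative) {
    if (flag == positive) return true;
    if (flag == negative) return false;
    return std::nullopt;
}

// Applies every --slots / --no-slots token in order; the last one wins
// (assumed behaviour of sequential parsing).
bool resolve_slots(const std::vector<std::string> & tokens, bool default_value) {
    bool value = default_value;
    for (const std::string & token : tokens) {
        if (auto matched = match_bool_flag(token, "--slots", "--no-slots")) {
            value = *matched;
        }
    }
    return value;
}
```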

I/O Contract

Direction	Description
Input	Command-line arguments (`argc`, `argv`) and environment variables (`LLAMA_ARG_*`, `LLAMA_API_KEY`)
Output	Populated `common_params` struct with all server configuration values
Preconditions	Called via `common_params_parse(argc, argv, params, LLAMA_EXAMPLE_SERVER)`
Error Handling	Returns false on parse failure; throws `std::invalid_argument` for invalid directory paths; throws `std::runtime_error` for missing key files
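
A caller has to cope with both failure modes in the contract above: a `false` return and an exception. The sketch below shows one way to structure that. `parse_sketch` and `params_sketch` are hypothetical stand-ins used so the example compiles without llama.cpp, and the distinct exit codes are an illustrative choice.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

struct params_sketch { int port = 8080; };

// Hypothetical stand-in for the real parser. It mimics the contract:
// false for an unknown flag, std::runtime_error for a missing key file.
bool parse_sketch(const std::vector<std::string> & args, params_sketch & params) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "--port" && i + 1 < args.size()) {
            params.port = std::stoi(args[++i]);
        } else if (args[i] == "--api-key-file") {
            throw std::runtime_error("failed to open API key file");
        } else {
            return false; // unknown flag or missing value
        }
    }
    return true;
}

// Returns a process exit code: 0 on success, 1 on a parse failure,
// 2 when the parser throws.
int run_with_args(const std::vector<std::string> & args) {
    params_sketch params;
    try {
        if (!parse_sketch(args, params)) {
            std::cerr << "error: invalid arguments\n";
            return 1;
        }
    } catch (const std::exception & e) {
        std::cerr << "error: " << e.what() << "\n";
        return 2;
    }
    return 0;
}
```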

Complete argument table:

Flag	Env Var	Description
`--host HOST`	`LLAMA_ARG_HOST`	IP address or UNIX socket path
`--port PORT`	`LLAMA_ARG_PORT`	TCP port number
`--path PATH`	`LLAMA_ARG_STATIC_PATH`	Static file serving directory
`--api-prefix PREFIX`	`LLAMA_ARG_API_PREFIX`	URL prefix for API routes
`--webui / --no-webui`	`LLAMA_ARG_WEBUI`	Enable/disable Web UI
`--embedding, --embeddings`	`LLAMA_ARG_EMBEDDINGS`	Restrict the server to the embedding use case
`--rerank, --reranking`	`LLAMA_ARG_RERANKING`	Enable reranking endpoint (also sets embedding mode with rank pooling)
`--api-key KEY`	`LLAMA_API_KEY`	API authentication key(s)
`--api-key-file FNAME`	none	File containing API keys
`--ssl-key-file FNAME`	`LLAMA_ARG_SSL_KEY_FILE`	PEM SSL private key
`--ssl-cert-file FNAME`	`LLAMA_ARG_SSL_CERT_FILE`	PEM SSL certificate
`-to, --timeout N`	`LLAMA_ARG_TIMEOUT`	Read/write timeout in seconds
`--threads-http N`	`LLAMA_ARG_THREADS_HTTP`	HTTP processing threads
`--cache-prompt / --no-cache-prompt`	`LLAMA_ARG_CACHE_PROMPT`	Enable/disable prompt caching
`--cache-reuse N`	`LLAMA_ARG_CACHE_REUSE`	Min chunk size for KV cache reuse
`--metrics`	`LLAMA_ARG_ENDPOINT_METRICS`	Enable Prometheus metrics
`--props`	`LLAMA_ARG_ENDPOINT_PROPS`	Enable props modification endpoint
`--slots / --no-slots`	`LLAMA_ARG_ENDPOINT_SLOTS`	Enable/disable slot monitoring
`--slot-save-path PATH`	none	Directory for slot KV cache saves
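
For `--api-key-file`, the table only says the file contains API keys. The sketch below assumes a one-key-per-line layout with blank lines ignored; that layout and the `read_api_keys` helper are assumptions, not confirmed by this page.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one API key per line and skips blank lines.
// The one-key-per-line layout is an assumption.
std::vector<std::string> read_api_keys(std::istream & in) {
    std::vector<std::string> keys;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            keys.push_back(line);
        }
    }
    return keys;
}
```

Taking a `std::istream` rather than a path keeps the helper easy to exercise with a `std::istringstream`; a real caller would pass a `std::ifstream` and report an error if the file cannot be opened.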

Usage Examples

Minimal server launch:

llama-server --model model.gguf

Production-style configuration with TLS, key-file authentication, and both monitoring endpoints (`--metrics` and `--slots`):

llama-server \
  --host 0.0.0.0 \
  --port 8080 \
  --model model.gguf \
  --threads-http 8 \
  --timeout 30 \
  --api-key-file /etc/llama/keys.txt \
  --ssl-key-file /etc/ssl/server.key \
  --ssl-cert-file /etc/ssl/server.crt \
  --metrics \
  --slots \
  --cache-prompt

Embedding-only server:

llama-server \
  --model embedding-model.gguf \
  --embedding \
  --port 8081
