Implementation: ggml-org/llama.cpp Server CLI Args
| Field | Value |
|---|---|
| Implementation Name | Server CLI Args |
| Doc Type | Wrapper Doc |
| Domain | CLI Configuration, Argument Parsing |
| Description | CLI argument parsing for llama-server configuration: host, port, model, parallelism, security, and endpoint toggles |
| Related Workflow | OpenAI_Compatible_Server |
Overview
Description
The Server CLI Args implementation defines the command-line interface for configuring llama-server. Each argument is registered as a common_arg object with a flag name, description, parser lambda, and optional environment variable binding. Arguments are scoped to the LLAMA_EXAMPLE_SERVER example type, ensuring they only appear when parsing server-specific invocations.
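The registration-and-dispatch pattern described above can be illustrated with a stripped-down sketch. The names `Arg`, `Params`, and `parse` here are hypothetical simplifications; the real `common_arg` in common/arg.cpp carries additional state such as value hints, example scoping, and environment bindings:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for common_params.
struct Params {
    std::string hostname = "127.0.0.1";
    int         port     = 8080;
};

// Hypothetical, simplified stand-in for common_arg: flag names plus a
// handler lambda that writes the parsed value into Params.
struct Arg {
    std::vector<std::string>                           flags;
    std::function<void(Params &, const std::string &)> handler;
};

// Parse argv-style tokens: match a registered flag, consume its value,
// and invoke the handler to populate Params.
void parse(const std::vector<std::string> & tokens,
           const std::vector<Arg> & args, Params & params) {
    for (size_t i = 0; i + 1 < tokens.size(); i += 2) {
        for (const auto & arg : args) {
            for (const auto & flag : arg.flags) {
                if (flag == tokens[i]) {
                    arg.handler(params, tokens[i + 1]);
                }
            }
        }
    }
}
```

Registering `{"--host"}` with a lambda that assigns `params.hostname` mirrors the real snippets shown under Code Reference below.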
Usage
Arguments are passed when launching the server:
llama-server \
--host 0.0.0.0 \
--port 8080 \
--model model.gguf \
--embedding \
--metrics \
--api-key my-secret-key \
--threads-http 4
Most arguments can alternatively be set via environment variables:
export LLAMA_ARG_HOST=0.0.0.0
export LLAMA_ARG_PORT=8080
export LLAMA_ARG_EMBEDDINGS=1
llama-server --model model.gguf
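The fallback behavior can be sketched as follows. This is illustrative only, and it assumes the conventional precedence (an explicit command-line flag wins over the bound environment variable, which in turn wins over the compiled-in default); the actual logic lives in common_params_parse and the set_env bindings:

```cpp
#include <cstdlib>
#include <string>

// Sketch of CLI-over-environment precedence: if the flag was given on
// the command line, use it; otherwise fall back to the bound environment
// variable; otherwise keep the default.
std::string resolve(const std::string & cli_value, bool cli_given,
                    const char * env_name, const std::string & def) {
    if (cli_given) {
        return cli_value;
    }
    if (const char * env = std::getenv(env_name)) {
        return env;
    }
    return def;
}
```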
Code Reference
| Field | Value |
|---|---|
| Source Location | common/arg.cpp:2772-2969 |
| Primary Structure | common_params (populated by argument parsing) |
| Registration Function | add_opt(common_arg(...)) |
| Import | #include "arg.h" |
Network arguments:
add_opt(common_arg(
{"--host"}, "HOST",
string_format("ip address to listen, or bind to an UNIX socket if the address ends with .sock (default: %s)", params.hostname.c_str()),
[](common_params & params, const std::string & value) {
params.hostname = value;
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_HOST"));
add_opt(common_arg(
{"--port"}, "PORT",
string_format("port to listen (default: %d)", params.port),
[](common_params & params, int value) {
params.port = value;
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_PORT"));
Embedding and reranking mode:
add_opt(common_arg(
{"--embedding", "--embeddings"},
string_format("restrict to only support embedding use case; use only with dedicated embedding models (default: %s)", params.embedding ? "enabled" : "disabled"),
[](common_params & params) {
params.embedding = true;
}
).set_examples({LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_DEBUG}).set_env("LLAMA_ARG_EMBEDDINGS"));
add_opt(common_arg(
{"--rerank", "--reranking"},
string_format("enable reranking endpoint on server (default: %s)", "disabled"),
[](common_params & params) {
params.embedding = true;
params.pooling_type = LLAMA_POOLING_TYPE_RANK;
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_RERANKING"));
Security arguments:
add_opt(common_arg(
{"--api-key"}, "KEY",
"API key to use for authentication, multiple keys can be provided as a comma-separated list (default: none)",
[](common_params & params, const std::string & value) {
for (const auto & key : parse_csv_row(value)) {
if (!key.empty()) {
params.api_keys.push_back(key);
}
}
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_API_KEY"));
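The parse_csv_row helper is defined elsewhere in the tree and is not shown here; a minimal equivalent comma splitter (hypothetical name split_csv) could look like this:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Minimal comma splitter: returns every field, including empty ones,
// so the caller can decide to skip blanks (as the --api-key handler does).
std::vector<std::string> split_csv(const std::string & row) {
    std::vector<std::string> fields;
    std::stringstream ss(row);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}
```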
Monitoring and endpoint toggles:
add_opt(common_arg(
{"--metrics"},
string_format("enable prometheus compatible metrics endpoint (default: %s)", params.endpoint_metrics ? "enabled" : "disabled"),
[](common_params & params) {
params.endpoint_metrics = true;
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_ENDPOINT_METRICS"));
add_opt(common_arg(
    {"--slots", "--no-slots"},
    string_format("expose slots monitoring endpoint (default: %s)", params.endpoint_slots ? "enabled" : "disabled"),
    [](common_params & params, bool value) {
        params.endpoint_slots = value;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_ENDPOINT_SLOTS"));
I/O Contract
| Direction | Description |
|---|---|
| Input | Command-line arguments (argc, argv) and environment variables (LLAMA_ARG_*, LLAMA_API_KEY) |
| Output | Populated common_params struct with all server configuration values |
| Preconditions | Called via common_params_parse(argc, argv, params, LLAMA_EXAMPLE_SERVER) |
| Error Handling | Returns false on parse failure; throws std::invalid_argument for invalid directory paths; throws std::runtime_error for missing key files |
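Per the Error Handling row above, a missing key file raises std::runtime_error. A hedged sketch of what a file-backed key loader might look like (the function name load_api_keys is hypothetical; the real --api-key-file handler is in common/arg.cpp):

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch: load API keys from a file, one per line, skipping blank lines.
// Throws std::runtime_error if the file cannot be opened, mirroring the
// documented error-handling contract.
std::vector<std::string> load_api_keys(const std::string & fname) {
    std::ifstream file(fname);
    if (!file) {
        throw std::runtime_error("failed to open file: " + fname);
    }
    std::vector<std::string> keys;
    std::string line;
    while (std::getline(file, line)) {
        if (!line.empty()) {
            keys.push_back(line);
        }
    }
    return keys;
}
```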
Complete argument table:
| Flag | Env Var | Description |
|---|---|---|
| --host HOST | LLAMA_ARG_HOST | IP address or UNIX socket path |
| --port PORT | LLAMA_ARG_PORT | TCP port number |
| --path PATH | LLAMA_ARG_STATIC_PATH | Static file serving directory |
| --api-prefix PREFIX | LLAMA_ARG_API_PREFIX | URL prefix for API routes |
| --webui / --no-webui | LLAMA_ARG_WEBUI | Enable/disable Web UI |
| --embedding | LLAMA_ARG_EMBEDDINGS | Enable embedding mode |
| --rerank | LLAMA_ARG_RERANKING | Enable reranking endpoint |
| --api-key KEY | LLAMA_API_KEY | API authentication key(s) |
| --api-key-file FNAME | none | File containing API keys |
| --ssl-key-file FNAME | LLAMA_ARG_SSL_KEY_FILE | PEM SSL private key |
| --ssl-cert-file FNAME | LLAMA_ARG_SSL_CERT_FILE | PEM SSL certificate |
| -to, --timeout N | LLAMA_ARG_TIMEOUT | Read/write timeout in seconds |
| --threads-http N | LLAMA_ARG_THREADS_HTTP | HTTP processing threads |
| --cache-prompt / --no-cache-prompt | LLAMA_ARG_CACHE_PROMPT | Enable/disable prompt caching |
| --cache-reuse N | LLAMA_ARG_CACHE_REUSE | Min chunk size for KV cache reuse |
| --metrics | LLAMA_ARG_ENDPOINT_METRICS | Enable Prometheus metrics |
| --props | LLAMA_ARG_ENDPOINT_PROPS | Enable props modification endpoint |
| --slots / --no-slots | LLAMA_ARG_ENDPOINT_SLOTS | Enable/disable slot monitoring |
| --slot-save-path PATH | none | Directory for slot KV cache saves |
Usage Examples
Minimal server launch:
llama-server --model model.gguf
Production configuration with all monitoring:
llama-server \
--host 0.0.0.0 \
--port 8080 \
--model model.gguf \
--threads-http 8 \
--timeout 30 \
--api-key-file /etc/llama/keys.txt \
--ssl-key-file /etc/ssl/server.key \
--ssl-cert-file /etc/ssl/server.crt \
--metrics \
--slots \
--cache-prompt
Embedding-only server:
llama-server \
--model embedding-model.gguf \
--embedding \
--port 8081