
Implementation: ggml-org/llama.cpp Server CLI Args

From Leeroopedia
Implementation Name: Server CLI Args
Doc Type: Wrapper Doc
Domain: CLI Configuration, Argument Parsing
Description: CLI argument parsing for llama-server configuration: host, port, model, parallelism, security, and endpoint toggles
Related Workflow: OpenAI_Compatible_Server

Overview

Description

The Server CLI Args implementation defines the command-line interface for configuring llama-server. Each argument is registered as a common_arg object with a flag name, description, parser lambda, and optional environment variable binding. Arguments are scoped to the LLAMA_EXAMPLE_SERVER example type, ensuring they only appear when parsing server-specific invocations.
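The registration pattern can be illustrated with a simplified sketch. The types below (common_params_sketch, common_arg_sketch) are hypothetical reductions of the real common_params and common_arg in common/arg.h, kept only to show the flag/description/handler/env-binding shape described above:

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for llama.cpp's common_params and
// common_arg; the real definitions live in common/arg.h.
struct common_params_sketch {
    std::string hostname = "127.0.0.1";
    int         port     = 8080;
};

struct common_arg_sketch {
    std::vector<std::string> flags;   // e.g. {"--host"}
    std::string              help;    // shown in --help output
    std::function<void(common_params_sketch &, const std::string &)> handler;
    std::string              env;     // optional environment variable binding

    common_arg_sketch & set_env(const std::string & name) {
        env = name;
        return *this;  // chaining mirrors the real set_env()/set_examples() style
    }

    // Apply the bound env var, if present, by running the same handler the
    // CLI flag would run.
    void apply_env(common_params_sketch & params) const {
        if (env.empty()) return;
        if (const char * v = std::getenv(env.c_str())) {
            handler(params, v);
        }
    }
};
```

Because env vars and CLI flags share one handler, an argument only needs to be registered once to support both input paths.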

Usage

Arguments are passed when launching the server:

llama-server \
  --host 0.0.0.0 \
  --port 8080 \
  --model model.gguf \
  --embedding \
  --metrics \
  --api-key my-secret-key \
  --threads-http 4

Most arguments can alternatively be set via environment variables:

export LLAMA_ARG_HOST=0.0.0.0
export LLAMA_ARG_PORT=8080
export LLAMA_ARG_EMBEDDINGS=1
llama-server --model model.gguf

Code Reference

Source Location: common/arg.cpp:2772-2969
Primary Structure: common_params (populated by argument parsing)
Registration Function: add_opt(common_arg(...))
Import: #include "arg.h"

Network arguments:

add_opt(common_arg(
    {"--host"}, "HOST",
    string_format("ip address to listen, or bind to an UNIX socket if the address ends with .sock (default: %s)", params.hostname.c_str()),
    [](common_params & params, const std::string & value) {
        params.hostname = value;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_HOST"));

add_opt(common_arg(
    {"--port"}, "PORT",
    string_format("port to listen (default: %d)", params.port),
    [](common_params & params, int value) {
        params.port = value;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_PORT"));

Embedding and reranking mode:

add_opt(common_arg(
    {"--embedding", "--embeddings"},
    string_format("restrict to only support embedding use case; use only with dedicated embedding models (default: %s)", params.embedding ? "enabled" : "disabled"),
    [](common_params & params) {
        params.embedding = true;
    }
).set_examples({LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_DEBUG}).set_env("LLAMA_ARG_EMBEDDINGS"));

add_opt(common_arg(
    {"--rerank", "--reranking"},
    string_format("enable reranking endpoint on server (default: %s)", "disabled"),
    [](common_params & params) {
        params.embedding = true;
        params.pooling_type = LLAMA_POOLING_TYPE_RANK;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_RERANKING"));

Security arguments:

add_opt(common_arg(
    {"--api-key"}, "KEY",
    "API key to use for authentication, multiple keys can be provided as a comma-separated list (default: none)",
    [](common_params & params, const std::string & value) {
        for (const auto & key : parse_csv_row(value)) {
            if (!key.empty()) {
                params.api_keys.push_back(key);
            }
        }
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_API_KEY"));
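The splitting behavior that parse_csv_row provides can be sketched in isolation. split_api_keys below is a hypothetical stand-in written for this doc, not the real helper; it only demonstrates the comma-splitting and empty-entry filtering that the --api-key handler above relies on:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the parse_csv_row() helper referenced above:
// split a comma-separated key list, dropping empty entries so stray or
// trailing commas do not register blank API keys.
std::vector<std::string> split_api_keys(const std::string & value) {
    std::vector<std::string> keys;
    std::stringstream ss(value);
    std::string key;
    while (std::getline(ss, key, ',')) {
        if (!key.empty()) {
            keys.push_back(key);
        }
    }
    return keys;
}
```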

Monitoring and endpoint toggles:

add_opt(common_arg(
    {"--metrics"},
    string_format("enable prometheus compatible metrics endpoint (default: %s)", params.endpoint_metrics ? "enabled" : "disabled"),
    [](common_params & params) {
        params.endpoint_metrics = true;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_ENDPOINT_METRICS"));

add_opt(common_arg(
    {"--slots"},
    {"--no-slots"},
    string_format("expose slots monitoring endpoint (default: %s)", params.endpoint_slots ? "enabled" : "disabled"),
    [](common_params & params, bool value) {
        params.endpoint_slots = value;
    }
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_ENDPOINT_SLOTS"));

I/O Contract

Input: Command-line arguments (argc, argv) and environment variables (LLAMA_ARG_*, LLAMA_API_KEY)
Output: Populated common_params struct with all server configuration values
Preconditions: Called via common_params_parse(argc, argv, params, LLAMA_EXAMPLE_SERVER)
Error Handling: Returns false on parse failure; throws std::invalid_argument for invalid directory paths; throws std::runtime_error for missing key files
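The throw-on-bad-value convention can be sketched for a single argument. parse_port_or_throw is a hypothetical illustration written for this doc (the real parsing lives inside the registered lambdas); it shows the std::invalid_argument style of failure that the parser surfaces as a false return plus usage text:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the error-handling contract: parse a port value,
// throwing std::invalid_argument on malformed or out-of-range input.
int parse_port_or_throw(const std::string & value) {
    size_t pos  = 0;
    int    port = std::stoi(value, &pos);  // throws std::invalid_argument on non-numeric input
    if (pos != value.size() || port < 0 || port > 65535) {
        throw std::invalid_argument("invalid port: " + value);
    }
    return port;
}
```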

Complete argument table:

Flag                                 Env Var                      Description
--host HOST                          LLAMA_ARG_HOST               IP address or UNIX socket path
--port PORT                          LLAMA_ARG_PORT               TCP port number
--path PATH                          LLAMA_ARG_STATIC_PATH        Static file serving directory
--api-prefix PREFIX                  LLAMA_ARG_API_PREFIX         URL prefix for API routes
--webui / --no-webui                 LLAMA_ARG_WEBUI              Enable/disable Web UI
--embedding                          LLAMA_ARG_EMBEDDINGS         Enable embedding mode
--rerank                             LLAMA_ARG_RERANKING          Enable reranking endpoint
--api-key KEY                        LLAMA_API_KEY                API authentication key(s)
--api-key-file FNAME                 none                         File containing API keys
--ssl-key-file FNAME                 LLAMA_ARG_SSL_KEY_FILE       PEM SSL private key
--ssl-cert-file FNAME                LLAMA_ARG_SSL_CERT_FILE      PEM SSL certificate
-to, --timeout N                     LLAMA_ARG_TIMEOUT            Read/write timeout in seconds
--threads-http N                     LLAMA_ARG_THREADS_HTTP       HTTP processing threads
--cache-prompt / --no-cache-prompt   LLAMA_ARG_CACHE_PROMPT       Enable/disable prompt caching
--cache-reuse N                      LLAMA_ARG_CACHE_REUSE        Min chunk size for KV cache reuse
--metrics                            LLAMA_ARG_ENDPOINT_METRICS   Enable Prometheus metrics
--props                              LLAMA_ARG_ENDPOINT_PROPS     Enable props modification endpoint
--slots / --no-slots                 LLAMA_ARG_ENDPOINT_SLOTS     Enable/disable slot monitoring
--slot-save-path PATH                none                         Directory for slot KV cache saves

Usage Examples

Minimal server launch:

llama-server --model model.gguf

Production configuration with all monitoring:

llama-server \
  --host 0.0.0.0 \
  --port 8080 \
  --model model.gguf \
  --threads-http 8 \
  --timeout 30 \
  --api-key-file /etc/llama/keys.txt \
  --ssl-key-file /etc/ssl/server.key \
  --ssl-cert-file /etc/ssl/server.crt \
  --metrics \
  --slots \
  --cache-prompt

Embedding-only server:

llama-server \
  --model embedding-model.gguf \
  --embedding \
  --port 8081
