Implementation:Ggml org Llama cpp Server Chat Completions

From Leeroopedia
Field | Value
Implementation Name | Server Chat Completions
Doc Type | API Doc
Domain | REST API, OpenAI Compatibility
Description | OpenAI-compatible API endpoints: /v1/chat/completions, /v1/completions, /v1/embeddings, and multi-provider translation
Related Workflow | OpenAI_Compatible_Server (CORE)

Overview

Description

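This implementation defines the core API handlers: lambdas assigned to members of server_routes inside server_routes::init_routes(). Each handler parses the incoming HTTP request, translates it into the internal task representation, submits it to the inference task queue, and formats the response for the client's API. Most of that work happens in two shared helpers, handle_completions_impl and handle_embeddings_impl; the handlers mainly differ in how they pre-process the request body and in the response-type tag they pass. Besides the OpenAI formats (Chat Completions, Completions, Embeddings, and Responses), Anthropic and Ollama request formats are supported through protocol translation.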
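Because the completion-style handlers share one shape (optional pre-processing, then the shared completion path with a response tag), that shape can be modelled as data. The following C++ sketch is not llama.cpp code: Body, Stage, Submission, ResponseType, and make_handler are hypothetical stand-ins, whereas the real handlers operate on nlohmann::json and call handle_completions_impl directly.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: the real server works on nlohmann::json and server_http_req.
using Body  = std::string;
using Stage = std::function<Body(const Body &)>;

// Hypothetical mirror of the TASK_RESPONSE_TYPE_* tags used by completion-style handlers.
enum class ResponseType { OaiChat, OaiCmpl, Anthropic, OaiResp };

// What a handler hands to the shared completion path: the normalized
// request body plus the tag that decides how results are formatted.
struct Submission {
    Body body;
    ResponseType format;
};

// Builds a handler that runs its pre-processing stages in order
// (e.g. protocol translation, then chat-parameter parsing) and tags the result.
std::function<Submission(const Body &)> make_handler(std::vector<Stage> stages, ResponseType format) {
    for (const auto & stage : stages) {
        assert(stage && "every stage must be callable");
    }
    return [steps = std::move(stages), format](const Body & raw) {
        Body body = raw;
        for (const auto & step : steps) {
            body = step(body);
        }
        return Submission{body, format};
    };
}
```

In this model the chat handler would be make_handler({parse}, ResponseType::OaiChat) and the text-completions handler make_handler({}, ResponseType::OaiCmpl), while the Anthropic and Responses handlers put a translation stage in front of the same parse stage (translate and parse being hypothetical Stage values). The real code writes each lambda out by hand instead.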

Usage

Clients call these endpoints with HTTP POST requests carrying a JSON body, for example against a server on localhost:8080:

# OpenAI chat completions
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama","messages":[{"role":"user","content":"Hello"}]}'

# OpenAI completions
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama","prompt":"Once upon a time"}'

# OpenAI embeddings
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"llama","input":"Hello world"}'

Code Reference

Field | Value
Source Location | tools/server/server-context.cpp:3179-3907
Entry Function | void server_routes::init_routes()
Import | Defined within the server-context static library; handlers are lambdas assigned to members of server_routes

Chat completions handler (/v1/chat/completions):

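this->post_chat_completions = [this](const server_http_req & req) {
    auto res = create_response();          // not used further in this excerpt
    std::vector<raw_buffer> files;         // out-parameter: receives media (e.g. images) found in the messages
    json body = json::parse(req.body);     // throws if the body is not valid JSON
    // Convert OpenAI chat parameters (messages, tools, ...) into internal completion parameters
    json body_parsed = oaicompat_chat_params_parse(
        body,
        meta->chat_params,
        files);
    // Shared completion path; the tag selects OpenAI Chat Completions response formatting
    return handle_completions_impl(
        req,
        SERVER_TASK_TYPE_COMPLETION,
        body_parsed,
        files,
        TASK_RESPONSE_TYPE_OAI_CHAT);
};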

Text completions handler (/v1/completions):

this->post_completions_oai = [this](const server_http_req & req) {
    auto res = create_response();
    std::vector<raw_buffer> files; // dummy: nothing fills it on this path
    const json body = json::parse(req.body);
    // No chat-parameter parsing: the body goes straight to the shared completion path
    return handle_completions_impl(
        req,
        SERVER_TASK_TYPE_COMPLETION,
        body,
        files,
        TASK_RESPONSE_TYPE_OAI_CMPL);
};

Embeddings handler (/v1/embeddings):

this->post_embeddings_oai = [this](const server_http_req & req) {
    // Shared embeddings path; the tag selects OpenAI Embeddings response formatting
    return handle_embeddings_impl(req, TASK_RESPONSE_TYPE_OAI_EMBD);
};

Anthropic Messages API translation (/v1/messages):

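this->post_anthropic_messages = [this](const server_http_req & req) {
    auto res = create_response();
    std::vector<raw_buffer> files;
    // Translate the Anthropic Messages request into an OpenAI chat request first
    json body = convert_anthropic_to_oai(json::parse(req.body));
    // Same chat-parameter parsing as the chat completions handler
    json body_parsed = oaicompat_chat_params_parse(
        body,
        meta->chat_params,
        files);
    // The tag makes results come back in Anthropic Messages format
    return handle_completions_impl(
        req,
        SERVER_TASK_TYPE_COMPLETION,
        body_parsed,
        files,
        TASK_RESPONSE_TYPE_ANTHROPIC);
};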
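One step any Anthropic-to-OpenAI translation has to perform concerns the system prompt: the Anthropic Messages API takes it as a top-level `system` field, while OpenAI Chat Completions expects a message with role "system". The sketch below shows only that step, with hypothetical simplified types (Message, AnthropicRequest, anthropic_to_oai_messages) in place of JSON; it is not the body of convert_anthropic_to_oai, which this page does not reproduce, and it ignores the content-block form of `system`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplified message; the real code manipulates JSON objects.
struct Message {
    std::string role;     // "system", "user" or "assistant"
    std::string content;  // plain text only in this sketch
};

// Hypothetical subset of an Anthropic Messages request.
struct AnthropicRequest {
    std::optional<std::string> system;  // top-level system prompt (string form only)
    std::vector<Message> messages;      // "user" / "assistant" turns
};

// Moves the top-level system prompt into a leading OpenAI-style system message
// and copies the conversation turns unchanged.
std::vector<Message> anthropic_to_oai_messages(const AnthropicRequest & req) {
    std::vector<Message> out;
    out.reserve(req.messages.size() + 1);
    if (req.system && !req.system->empty()) {
        out.push_back({"system", *req.system});
    }
    for (const auto & m : req.messages) {
        // Anthropic has no "system" role inside messages
        assert(m.role == "user" || m.role == "assistant");
        out.push_back(m);
    }
    return out;
}
```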

OpenAI Responses API translation (/v1/responses):

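this->post_responses_oai = [this](const server_http_req & req) {
    auto res = create_response();
    std::vector<raw_buffer> files;
    // Translate the Responses API request into a Chat Completions request first
    json body = convert_responses_to_chatcmpl(json::parse(req.body));
    // Same chat-parameter parsing as the chat completions handler
    json body_parsed = oaicompat_chat_params_parse(
        body,
        meta->chat_params,
        files);
    // The tag makes results come back in OpenAI Responses format
    return handle_completions_impl(
        req,
        SERVER_TASK_TYPE_COMPLETION,
        body_parsed,
        files,
        TASK_RESPONSE_TYPE_OAI_RESP);
};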
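The Responses API request carries optional `instructions` and an `input` that may be a plain string or a list of items. A minimal sketch of mapping the string form onto Chat Completions messages follows, assuming `instructions` becomes a leading system message; that mapping is an assumption of this sketch, since the exact behavior of convert_responses_to_chatcmpl is not shown here, and ChatMessage, ResponsesRequest, and responses_to_chat_messages are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplified chat message.
struct ChatMessage {
    std::string role;
    std::string content;
};

// Hypothetical subset of a Responses API request: only the string form of `input`.
struct ResponsesRequest {
    std::optional<std::string> instructions;  // system-level guidance, if any
    std::string input;                        // plain-text user input
};

// Maps `instructions` to a leading system message (an assumption of this sketch)
// and the string `input` to a single user message.
std::vector<ChatMessage> responses_to_chat_messages(const ResponsesRequest & req) {
    assert(!req.input.empty() && "this sketch expects non-empty string input");
    std::vector<ChatMessage> messages;
    if (req.instructions) {
        messages.push_back({"system", *req.instructions});
    }
    messages.push_back({"user", req.input});
    return messages;
}
```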

I/O Contract

Endpoint | Method | Request Body | Response Format
/v1/chat/completions | POST | {"model": str, "messages": [...], "stream": bool, "temperature": float, ...} | OpenAI ChatCompletion object or SSE stream
/v1/completions | POST | {"model": str, "prompt": str, "max_tokens": int, ...} | OpenAI Completion object or SSE stream
/v1/embeddings | POST | {"model": str, "input": str or [str], ...} | OpenAI embeddings list with one data[].embedding array per input
/v1/messages | POST | Anthropic Messages format | Anthropic Messages response format
/v1/responses | POST | OpenAI Responses format | OpenAI Responses response format
/chat/completions | POST | Same as /v1/chat/completions | Same as /v1/chat/completions
/api/chat | POST | Ollama chat format | Same as /v1/chat/completions (Ollama-compatible)

Response type tags:

Tag | Description
TASK_RESPONSE_TYPE_NONE | Native llama.cpp response format (legacy endpoints)
TASK_RESPONSE_TYPE_OAI_CHAT | OpenAI Chat Completions format
TASK_RESPONSE_TYPE_OAI_CMPL | OpenAI Completions format
TASK_RESPONSE_TYPE_OAI_EMBD | OpenAI Embeddings format
TASK_RESPONSE_TYPE_ANTHROPIC | Anthropic Messages format
TASK_RESPONSE_TYPE_OAI_RESP | OpenAI Responses format
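From a client's point of view, each tag determines the top-level discriminator of a non-streaming response. The sketch below pairs a hypothetical mirror of the tags with the values the public APIs document: OpenAI's `object` is "chat.completion", "text_completion", "list" (embeddings), or "response", and Anthropic uses `"type": "message"`. It illustrates what each tag implies for clients rather than llama.cpp's formatting code; streaming chunks use different values (for example "chat.completion.chunk"), and the native format is given no discriminator here.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the TASK_RESPONSE_TYPE_* tags in the table above.
enum class ResponseType { None, OaiChat, OaiCmpl, OaiEmbd, Anthropic, OaiResp };

// Top-level discriminator a client sees in a non-streaming response:
// the "object" field for OpenAI formats, the "type" field for Anthropic.
// An empty string means this sketch assigns no discriminator (native format).
std::string response_discriminator(ResponseType tag) {
    switch (tag) {
        case ResponseType::None:      return "";
        case ResponseType::OaiChat:   return "chat.completion";
        case ResponseType::OaiCmpl:   return "text_completion";
        case ResponseType::OaiEmbd:   return "list";     // each data[] item has "object": "embedding"
        case ResponseType::Anthropic: return "message";  // Anthropic uses "type", not "object"
        case ResponseType::OaiResp:   return "response";
    }
    assert(false && "unknown ResponseType");
    return "";
}
```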

Usage Examples

Chat completion with streaming:

curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one sentence."}
    ],
    "stream": true,
    "temperature": 0.7
  }'
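With "stream": true the reply arrives as server-sent events, and curl's -N flag disables output buffering so chunks print as they arrive. A client-side C++ sketch for extracting the JSON payloads from such a stream follows, assuming OpenAI-style framing: one `data:` line per chunk and a final `data: [DONE]` sentinel. It treats each data line as a complete payload, a simplification of the SSE format, which lets one event span several data lines; sse_payloads is a hypothetical helper name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns the payload of every "data:" line up to the "[DONE]" sentinel.
std::vector<std::string> sse_payloads(const std::string & stream) {
    const std::string field = "data:";
    std::vector<std::string> payloads;
    std::istringstream in(stream);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate CRLF line endings
        }
        if (line.compare(0, field.size(), field) != 0) {
            continue;  // blank separators, ": comment" lines, other SSE fields
        }
        std::string payload = line.substr(field.size());
        if (!payload.empty() && payload.front() == ' ') {
            payload.erase(0, 1);  // SSE allows one optional space after the colon
        }
        if (payload == "[DONE]") {
            break;  // OpenAI-style end-of-stream sentinel
        }
        payloads.push_back(payload);
    }
    return payloads;
}
```

Each returned string is one chunk's JSON, which a client would then parse to read the incremental content.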

Anthropic-format request (translated internally):

curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Token counting (Anthropic-compatible):

curl http://localhost:8080/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama",
    "messages": [
      {"role": "user", "content": "Count my tokens"}
    ]
  }'
# Example response; the count depends on the loaded model's tokenizer and chat template:
# {"input_tokens": 5}

Embedding extraction via the OpenAI-compatible endpoint (the server must be started with --embeddings):

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embedding-model",
    "input": ["Hello world", "Goodbye world"]
  }'
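The response carries one vector per input in data[].embedding, and a common next step is comparing them. A minimal sketch of cosine similarity over two such vectors, assuming they have already been parsed from the JSON into std::vector<double> (parsing omitted; cosine_similarity is a hypothetical helper, not a server API):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two embedding vectors, e.g. data[0].embedding and
// data[1].embedding from a /v1/embeddings response.
double cosine_similarity(const std::vector<double> & a, const std::vector<double> & b) {
    assert(a.size() == b.size() && !a.empty());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot    += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) {
        return 0.0;  // undefined for a zero vector; this sketch reports 0
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

If the vectors are already L2-normalized, the dot product alone gives the same value; computing the norms keeps the sketch correct either way.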
