Principle:Ollama Ollama APIDesign
| Knowledge Sources | |
|---|---|
| Domains | API Design, REST |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Ollama's API Design defines a RESTful HTTP interface documented with OpenAPI specifications, providing endpoints for model management, text generation, chat completion, embeddings, and compatibility layers that mirror the OpenAI API surface.
Core Concepts
Native API Surface
The native Ollama API provides endpoints for the full model lifecycle: /api/generate for text completion, /api/chat for multi-turn conversation, /api/create for model creation from Modelfiles, /api/pull and /api/push for registry operations, /api/show for model metadata, /api/tags for listing local models, and /api/delete for model removal. Each endpoint uses JSON request and response bodies with consistent error handling conventions.
OpenAI Compatibility Layer
To enable drop-in replacement for OpenAI API clients, Ollama implements a compatibility layer that translates OpenAI-format requests (/v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models) to Ollama's internal format. The translation handles differences in parameter naming, response structure, streaming format (SSE vs. JSON lines), and tool calling conventions. This allows existing applications and libraries built for the OpenAI API to work with local Ollama models.
Streaming Protocol
Both the native and OpenAI-compatible APIs support streaming responses. The native API uses newline-delimited JSON (NDJSON), where each line is a complete JSON object containing a token or status update. The OpenAI-compatible API uses Server-Sent Events (SSE) format with data: prefixed lines. Both streaming modes include a final message indicating completion along with generation statistics (tokens per second, total tokens, timing data).
Request Options
Inference endpoints accept a rich set of options controlling model behavior: temperature, top-k, top-p, min-p, seed for sampling; num_predict for output length; system prompt override; format for structured output; context window size; and keep_alive for model caching duration. These options are documented in the API specification and validated server-side with sensible defaults.
OpenAPI Specification
The API is formally documented using the OpenAPI (Swagger) specification, which defines all endpoints, request/response schemas, parameter types, and error codes. This specification serves as both documentation and a contract for client code generation, enabling automatic SDK generation for multiple programming languages.
Implementation Notes
The native API routes are defined in server/routes.go. The OpenAI compatibility layer is implemented in openai/ with route registration, request translation, and response translation. The API specification document is maintained alongside the codebase in docs/. The api/ package defines the Go types for API request and response structures, used by both the server and the Go client library.