Implementation: Ollama Inference Handler
| Knowledge Sources | |
|---|---|
| Domains | Systems, Model_Serving |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A concrete tool for dispatching inference requests through Ollama's native handler pipeline, provided by the server package.
Description
In the OpenAI compatibility context, ChatHandler and GenerateHandler serve as the inference dispatch point. When OpenAI middleware translates a request and replaces the request body, the native handler processes it identically to a native Ollama request. The response is captured by the middleware's custom response writer for format translation.
This implementation is shared with the Response_Streaming principle's implementation (Chat_Handler) but is documented separately here to reflect its dispatch role in the OpenAI compatibility workflow.
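The request-side translation that precedes dispatch can be sketched as a pure function: decode the OpenAI payload, re-encode it in the native shape ChatHandler expects, and the swapped body is all the handler ever sees. This is a simplified illustration under assumed type shapes, not Ollama's actual conversion code, which handles many more fields (options, tools, images); all struct names here are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified request shapes. The real types in the ollama
// repo carry many more fields; only model/messages/stream are shown.
type openAIChatRequest struct {
	Model    string `json:"model"`
	Messages []struct {
		Role    string `json:"role"`
		Content string `json:"content"`
	} `json:"messages"`
	Stream bool `json:"stream"`
}

type ollamaMsg struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ollamaChatRequest struct {
	Model    string      `json:"model"`
	Messages []ollamaMsg `json:"messages"`
	Stream   *bool       `json:"stream,omitempty"`
}

// translateBody mimics what the OpenAI middleware does before it swaps
// the request body on the gin context: decode the OpenAI payload and
// re-encode it as a native Ollama chat request.
func translateBody(openAIJSON []byte) ([]byte, error) {
	var in openAIChatRequest
	if err := json.Unmarshal(openAIJSON, &in); err != nil {
		return nil, err
	}
	out := ollamaChatRequest{Model: in.Model, Stream: &in.Stream}
	for _, m := range in.Messages {
		out.Messages = append(out.Messages, ollamaMsg{Role: m.Role, Content: m.Content})
	}
	return json.Marshal(out)
}

func main() {
	body := []byte(`{"model":"llama3","messages":[{"role":"user","content":"Hello"}],"stream":true}`)
	translated, err := translateBody(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(translated))
}
```

After this substitution the native handler decodes the body exactly as it would for a request sent directly to /api/chat.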
Usage
Invoked automatically by the middleware chain. The native handlers are unaware they are serving an OpenAI-format request.
Code Reference
Source Location
- Repository: ollama
- File: server/routes.go
- Lines: L1983-2546 (ChatHandler), L183-664 (GenerateHandler)
Signature
func (s *Server) ChatHandler(c *gin.Context)
func (s *Server) GenerateHandler(c *gin.Context)
Import
import "github.com/ollama/ollama/server"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| c | *gin.Context | Yes | HTTP context with translated Ollama request body (from middleware) |
Outputs
| Name | Type | Description |
|---|---|---|
| Response stream | bytes | Written to c.Writer (which is the middleware's custom ChatWriter/CompletionWriter) |
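The response side can be sketched in the same spirit. Below is a minimal, hypothetical illustration of the per-chunk rewrite a custom writer performs: decode one native Ollama streaming chunk and re-emit it as an OpenAI-style SSE delta line. The field sets are reduced subsets and the function name is invented for this sketch; the real ChatWriter also handles roles, finish reasons, IDs, and the final `[DONE]` event.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified native Ollama streaming chunk (subset of the real response).
type nativeChunk struct {
	Message struct {
		Role    string `json:"role"`
		Content string `json:"content"`
	} `json:"message"`
	Done bool `json:"done"`
}

// Reduced OpenAI-style streaming shapes: a delta inside a choice list.
type delta struct {
	Content string `json:"content"`
}
type choice struct {
	Delta delta `json:"delta"`
}
type openAIChunk struct {
	Choices []choice `json:"choices"`
}

// toSSE converts one native chunk into an OpenAI-format SSE line, the
// kind of rewrite the middleware's custom writer applies on each Write.
func toSSE(raw []byte) (string, error) {
	var in nativeChunk
	if err := json.Unmarshal(raw, &in); err != nil {
		return "", err
	}
	out := openAIChunk{Choices: []choice{{Delta: delta{Content: in.Message.Content}}}}
	b, err := json.Marshal(out)
	if err != nil {
		return "", err
	}
	return "data: " + string(b) + "\n\n", nil
}

func main() {
	line, err := toSSE([]byte(`{"message":{"role":"assistant","content":"Hi"},"done":false}`))
	if err != nil {
		panic(err)
	}
	fmt.Print(line)
}
```

Because the translation happens in the writer, the native handler simply streams bytes to c.Writer and never learns that the client asked for OpenAI format.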
Usage Examples
OpenAI Streaming Chat
# The client sees OpenAI-format SSE stream
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3",
"messages": [{"role": "user", "content": "Hello"}],
"stream": true
}'
# Response: data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hi"},...}],...}