
Implementation: Ollama Chat Handler

From Leeroopedia
Knowledge Sources
Domains: Systems, Networking, API_Design
Last Updated: 2026-02-14 00:00 GMT

Overview

HTTP handlers in the server package that process chat and generate inference requests and deliver responses via streaming.

Description

ChatHandler and GenerateHandler are the primary HTTP endpoint handlers for Ollama's inference API. They orchestrate the complete request lifecycle: parse the request, obtain a model runner from the scheduler, construct the prompt, invoke the inference engine, and stream the response back to the client.

ChatHandler (/api/chat) handles multi-turn conversations with support for tool calling, thinking mode, structured output (JSON schema), and image inputs. GenerateHandler (/api/generate) handles single-turn text completion with raw prompt input.

Both handlers support streaming (default) and non-streaming modes. In streaming mode, partial responses are written as newline-delimited JSON objects flushed after each token batch.
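A client consumes the streaming mode by reading one JSON object per line until a chunk reports completion. The sketch below decodes such a stream; the struct mirrors only a subset of the response fields, and the JSON field names ("message", "content", "done") are assumptions based on the public API, not the full api.ChatResponse type.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// chatChunk mirrors a minimal subset of a streamed chat response
// (assumed field names: "message", "done").
type chatChunk struct {
	Message struct {
		Role    string `json:"role"`
		Content string `json:"content"`
	} `json:"message"`
	Done bool `json:"done"`
}

// collectContent reads newline-delimited JSON chunks and concatenates
// the partial message contents until a chunk with done=true arrives.
func collectContent(stream string) (string, error) {
	var out strings.Builder
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var c chatChunk
		if err := json.Unmarshal([]byte(line), &c); err != nil {
			return "", err
		}
		out.WriteString(c.Message.Content)
		if c.Done {
			break
		}
	}
	return out.String(), sc.Err()
}

func main() {
	sample := `{"message":{"role":"assistant","content":"The sky "},"done":false}
{"message":{"role":"assistant","content":"is blue."},"done":true}`
	text, err := collectContent(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(text) // The sky is blue.
}
```

In a real client, the same loop runs over the HTTP response body instead of an in-memory string.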

Usage

These are the primary inference endpoints for Ollama clients. ChatHandler is used for conversational interactions; GenerateHandler for raw text completion.

Code Reference

Source Location

  • Repository: ollama
  • File: server/routes.go
  • Lines: L1983-2546 (ChatHandler), L183-664 (GenerateHandler)

Signature

func (s *Server) ChatHandler(c *gin.Context)
func (s *Server) GenerateHandler(c *gin.Context)

Import

import "github.com/ollama/ollama/server"

I/O Contract

Inputs (ChatHandler)

Name      Type             Required  Description
c         *gin.Context     Yes       HTTP context with api.ChatRequest JSON body
Model     string           Yes       Model name (in request body)
Messages  []api.Message    Yes       Chat message history
Stream    *bool            No        Enable streaming (default: true)
Tools     []api.Tool       No        Function calling tool definitions
Format    json.RawMessage  No        Structured output schema
Options   api.Options      No        Runtime inference options

Inputs (GenerateHandler)

Name     Type          Required  Description
c        *gin.Context  Yes       HTTP context with api.GenerateRequest JSON body
Model    string        Yes       Model name (in request body)
Prompt   string        Yes       Raw text prompt
Stream   *bool         No        Enable streaming (default: true)
Images   []ImageData   No        Base64 image data for multimodal
Options  api.Options   No        Runtime inference options

Outputs

Name                Type    Description
Streaming response  NDJSON  Sequence of api.ChatResponse/api.GenerateResponse JSON objects
Final response      JSON    Single JSON object if stream=false, includes Done=true and metrics
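The metrics on the final response can be used to derive throughput. The sketch below computes tokens per second from a sample final object; the field names ("eval_count", "eval_duration" in nanoseconds) are assumptions based on the documented API, not taken from this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finalResponse captures the completion fields of a non-streaming reply
// (assumed field names: "done", "eval_count", "eval_duration").
type finalResponse struct {
	Done         bool  `json:"done"`
	EvalCount    int   `json:"eval_count"`
	EvalDuration int64 `json:"eval_duration"` // nanoseconds
}

// tokensPerSecond derives throughput from the metrics on the final chunk.
func tokensPerSecond(raw []byte) (float64, error) {
	var r finalResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return 0, err
	}
	if !r.Done || r.EvalDuration == 0 {
		return 0, fmt.Errorf("response not finished or missing metrics")
	}
	return float64(r.EvalCount) / (float64(r.EvalDuration) / 1e9), nil
}

func main() {
	raw := []byte(`{"done":true,"eval_count":120,"eval_duration":2000000000}`)
	tps, err := tokensPerSecond(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.1f tokens/s\n", tps) // 60.0 tokens/s
}
```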

Usage Examples

Chat API Call

# Streaming chat request
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "Why is the sky blue?"}
  ]
}'

# Non-streaming
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "Why is the sky blue?"}
  ],
  "stream": false
}'

Generate API Call

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Once upon a time"
}'

Tool Calling

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        },
        "required": ["location"]
      }
    }
  }]
}'

Related Pages

Implements Principle

Requires Environment
