
Implementation: Ollama Chat Handler

From Leeroopedia
Knowledge Sources
Domains: Systems, Networking, API_Design
Last Updated: 2026-02-14 00:00 GMT

Overview

HTTP handlers in the server package that process chat and generate inference requests and deliver responses via streaming.

Description

ChatHandler and GenerateHandler are the primary HTTP endpoint handlers for Ollama's inference API. They orchestrate the complete request lifecycle: parse the request, obtain a model runner from the scheduler, construct the prompt, invoke the inference engine, and stream the response back to the client.

ChatHandler (/api/chat) handles multi-turn conversations with support for tool calling, thinking mode, structured output (JSON schema), and image inputs. GenerateHandler (/api/generate) handles single-turn text completion with raw prompt input.

Both handlers support streaming (default) and non-streaming modes. In streaming mode, partial responses are written as newline-delimited JSON objects flushed after each token batch.
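A client consumes the streaming mode by reading one JSON object per line until a chunk reports completion. The sketch below decodes such a stream; the struct mirrors only a subset of the response fields, and the JSON field names ("message", "content", "done") are assumptions based on the public API, not the full api.ChatResponse type.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// chatChunk mirrors a minimal subset of a streamed chat response
// (assumed field names: "message", "done").
type chatChunk struct {
	Message struct {
		Role    string `json:"role"`
		Content string `json:"content"`
	} `json:"message"`
	Done bool `json:"done"`
}

// collectContent reads newline-delimited JSON chunks and concatenates
// the partial message contents until a chunk with done=true arrives.
func collectContent(stream string) (string, error) {
	var out strings.Builder
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var c chatChunk
		if err := json.Unmarshal([]byte(line), &c); err != nil {
			return "", err
		}
		out.WriteString(c.Message.Content)
		if c.Done {
			break
		}
	}
	return out.String(), sc.Err()
}

func main() {
	sample := `{"message":{"role":"assistant","content":"The sky "},"done":false}
{"message":{"role":"assistant","content":"is blue."},"done":true}`
	text, err := collectContent(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(text) // The sky is blue.
}
```

In a real client, the same loop runs over the HTTP response body instead of an in-memory string.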

Usage

These are the primary inference endpoints for Ollama clients. ChatHandler is used for conversational interactions; GenerateHandler for raw text completion.

Code Reference

Source Location

  • Repository: ollama
  • File: server/routes.go
  • Lines: L1983-2546 (ChatHandler), L183-664 (GenerateHandler)

Signature

func (s *Server) ChatHandler(c *gin.Context)
func (s *Server) GenerateHandler(c *gin.Context)

Import

import "github.com/ollama/ollama/server"

I/O Contract

Inputs (ChatHandler)

Name      Type             Required  Description
c         *gin.Context     Yes       HTTP context with api.ChatRequest JSON body
Model     string           Yes       Model name (in request body)
Messages  []api.Message    Yes       Chat message history
Stream    *bool            No        Enable streaming (default: true)
Tools     []api.Tool       No        Function calling tool definitions
Format    json.RawMessage  No        Structured output schema
Options   api.Options      No        Runtime inference options

Inputs (GenerateHandler)

Name     Type          Required  Description
c        *gin.Context  Yes       HTTP context with api.GenerateRequest JSON body
Model    string        Yes       Model name (in request body)
Prompt   string        Yes       Raw text prompt
Stream   *bool         No        Enable streaming (default: true)
Images   []ImageData   No        Base64 image data for multimodal
Options  api.Options   No        Runtime inference options

Outputs

Name                Type    Description
Streaming response  NDJSON  Sequence of api.ChatResponse/api.GenerateResponse JSON objects
Final response      JSON    Single JSON object if stream=false, includes Done=true and metrics
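The metrics on the final response can be used to derive throughput. The sketch below computes tokens per second from a sample final object; the field names ("eval_count", "eval_duration" in nanoseconds) are assumptions based on the documented API, not taken from this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finalResponse captures the completion fields of a non-streaming reply
// (assumed field names: "done", "eval_count", "eval_duration").
type finalResponse struct {
	Done         bool  `json:"done"`
	EvalCount    int   `json:"eval_count"`
	EvalDuration int64 `json:"eval_duration"` // nanoseconds
}

// tokensPerSecond derives throughput from the metrics on the final chunk.
func tokensPerSecond(raw []byte) (float64, error) {
	var r finalResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return 0, err
	}
	if !r.Done || r.EvalDuration == 0 {
		return 0, fmt.Errorf("response not finished or missing metrics")
	}
	return float64(r.EvalCount) / (float64(r.EvalDuration) / 1e9), nil
}

func main() {
	raw := []byte(`{"done":true,"eval_count":120,"eval_duration":2000000000}`)
	tps, err := tokensPerSecond(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.1f tokens/s\n", tps) // 60.0 tokens/s
}
```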

Usage Examples

Chat API Call

# Streaming chat request
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "Why is the sky blue?"}
  ]
}'

# Non-streaming
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "Why is the sky blue?"}
  ],
  "stream": false
}'

Generate API Call

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Once upon a time"
}'

Tool Calling

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        },
        "required": ["location"]
      }
    }
  }]
}'

Related Pages

Implements Principle

Requires Environment
