Implementation:Ollama Ollama Quantize

Knowledge Sources	Ollama
Domains	Model_Optimization, Compression
Last Updated	2026-02-14 00:00 GMT

Overview

Concrete tool, provided by Ollama's server package together with the bundled llama.cpp library, for quantizing GGUF model files to lower precision.

Description

The Go quantize function reads a full-precision GGUF file, applies per-tensor quantization type selection via getTensorNewType, and writes the quantized output. It delegates the actual tensor compression to the llama.cpp C library via CGo bindings.

getTensorNewType implements the tensor-level quantization policy: which tensors get which quantization type based on their name (attention vs feed-forward), shape (1D tensors stay float), and the target file type.
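The kind of policy described above can be sketched in a few lines of Go. This is a minimal illustration, not the logic in server/quantization.go: the TensorType stand-in, the pickTensorType helper, and the specific rule that attention value projections get a higher-precision type are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// TensorType is a hypothetical stand-in for fsggml.TensorType.
type TensorType string

const (
	F32 TensorType = "F32"
	Q4K TensorType = "Q4_K"
	Q6K TensorType = "Q6_K"
)

// pickTensorType is an illustrative policy only; the real rules live in
// getTensorNewType and also depend on the target file type and model metadata.
func pickTensorType(name string, shape []uint64, orig, def TensorType) TensorType {
	// 1D tensors (norm weights, biases) keep their original float type.
	if len(shape) < 2 {
		return orig
	}
	// Assumed rule for this sketch: attention value projections get more bits.
	if strings.Contains(name, "attn_v.weight") {
		return Q6K
	}
	// Everything else uses the default type for the requested file type.
	return def
}

func main() {
	fmt.Println(pickTensorType("blk.0.attn_norm.weight", []uint64{4096}, F32, Q4K))
	fmt.Println(pickTensorType("blk.0.attn_v.weight", []uint64{4096, 4096}, F32, Q4K))
	fmt.Println(pickTensorType("blk.0.ffn_up.weight", []uint64{4096, 11008}, F32, Q4K))
}
```

Keeping the decision in one pure function of name, shape, and target type makes the policy easy to unit-test separately from the file I/O.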

The C-level llama_model_quantize function in llama.cpp performs the heavy computation of quantizing tensor data blocks.
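To give a feel for what quantizing a data block means, here is a simplified Go sketch in the style of Q8_0, which groups weights into blocks of 32 values that share one scale. It is not the llama.cpp code: the q8Block type and both functions are hypothetical, and the scale is kept as float32 for simplicity where the real format stores it in half precision.

```go
package main

import (
	"fmt"
	"math"
)

const blockSize = 32 // Q8_0 groups weights into blocks of 32 values

// q8Block is a hypothetical, simplified block: one shared scale plus
// 32 signed 8-bit quantized values.
type q8Block struct {
	scale float32
	qs    [blockSize]int8
}

// quantizeBlock maps the largest magnitude in the block to +/-127 and
// rounds every value to the nearest multiple of the resulting scale.
func quantizeBlock(x [blockSize]float32) q8Block {
	var amax float32
	for _, v := range x {
		if a := float32(math.Abs(float64(v))); a > amax {
			amax = a
		}
	}
	b := q8Block{scale: amax / 127}
	if b.scale == 0 {
		return b // all-zero block: nothing to encode
	}
	for i, v := range x {
		b.qs[i] = int8(math.Round(float64(v / b.scale)))
	}
	return b
}

// dequantize reconstructs approximate float values from the block.
func (b q8Block) dequantize() [blockSize]float32 {
	var out [blockSize]float32
	for i, q := range b.qs {
		out[i] = float32(q) * b.scale
	}
	return out
}

func main() {
	var x [blockSize]float32
	for i := range x {
		x[i] = float32(i-16) / 8
	}
	b := quantizeBlock(x)
	y := b.dequantize()
	var maxErr float64
	for i := range x {
		maxErr = math.Max(maxErr, math.Abs(float64(x[i]-y[i])))
	}
	fmt.Printf("scale=%g maxErr=%g\n", b.scale, maxErr)
}
```

The round-trip error of each value is bounded by half a quantization step (scale/2), which is why blocks with a small dynamic range lose less precision.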

Usage

Used when creating quantized model variants via the ollama create command with the --quantize flag, or when the server auto-quantizes imported SafeTensors models.

Code Reference

Source Location

Repository: ollama
File: server/quantization.go (quantize, getTensorNewType), llama/llama.cpp/src/llama-quant.cpp (llama_model_quantize)
Lines: quantization.go:L201-244 (quantize), quantization.go:L103-200 (getTensorNewType), llama-quant.cpp:L1-1072

Signature

func quantize(in, out *os.File, orig *fsggml.GGML, newFileType fsggml.FileType, progressFn func(n uint64)) error

func getTensorNewType(kv fsggml.KV, qs *quantizeState, newType fsggml.TensorType, name string, shape []uint64, ftype fsggml.FileType) fsggml.TensorType

Import

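import "github.com/ollama/ollama/server"

Note: quantize and getTensorNewType are unexported (lowercase) identifiers, so they are callable only from code inside the server package. The import path identifies where they live; it does not expose them as a public API.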

I/O Contract

Inputs

Name	Type	Required	Description
in	*os.File	Yes	Full-precision GGUF input file (F16 or F32)
out	*os.File	Yes	Output file for quantized GGUF
orig	*fsggml.GGML	Yes	Parsed GGUF metadata from input file
newFileType	fsggml.FileType	Yes	Target quantization type (Q4_0, Q4_K_M, Q5_K_M, Q8_0, etc.)
progressFn	func(n uint64)	No	Progress callback (bytes processed)

Outputs

Name	Type	Description
error	error	Non-nil if quantization fails
Side effect	Quantized GGUF	Compressed model file written to output
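A callback with the progressFn shape, func(n uint64), could be built as in the sketch below. The newProgress helper is hypothetical, and the sketch assumes that each call reports the number of bytes processed since the previous call; if the real callback receives a cumulative count, the accumulation step would need to change.

```go
package main

import "fmt"

// newProgress is a hypothetical helper. It returns a callback with the same
// shape as progressFn and a function that reports completion as a percentage.
// Assumption: each call to the callback carries an incremental byte count.
func newProgress(total uint64) (report func(n uint64), percent func() float64) {
	var done uint64
	report = func(n uint64) { done += n }
	percent = func() float64 {
		if total == 0 {
			return 0 // avoid dividing by zero for empty inputs
		}
		return 100 * float64(done) / float64(total)
	}
	return report, percent
}

func main() {
	report, percent := newProgress(1000)
	report(250)
	report(250)
	fmt.Printf("%.0f%%\n", percent()) // prints 50%
}
```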

Usage Examples

Quantize via CLI

# Create a quantized model from a Modelfile
ollama create my-model -f Modelfile --quantize q4_0

# Supported quantization types:
# q4_0, q4_1, q5_0, q5_1, q8_0
# q4_K_S, q4_K_M, q5_K_S, q5_K_M, q6_K
