Implementation:Ollama Ollama Quantize

From Leeroopedia
Knowledge Sources
Domains: Model_Optimization, Compression
Last Updated: 2026-02-14 00:00 GMT

Overview

Concrete tool, provided by the server and llama.cpp packages, for quantizing GGUF model files to lower precision.

Description

The Go quantize function reads a full-precision GGUF file, chooses a quantization type for each tensor via getTensorNewType, and writes the quantized output. It delegates the actual tensor compression to the llama.cpp C library through cgo bindings.

getTensorNewType implements the tensor-level quantization policy: it decides each tensor's quantization type from its name (attention vs feed-forward), its shape (1D tensors stay in floating point), and the target file type.
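
The policy described above can be illustrated with a much smaller sketch. This is not the code in server/quantization.go: TensorType, the constants, pickTensorType, and the specific rule of giving attn_v tensors Q6_K are all hypothetical stand-ins chosen to show how name, shape, and a default type might combine.

```go
package main

import (
	"fmt"
	"strings"
)

// TensorType is a hypothetical stand-in for fsggml.TensorType.
type TensorType string

const (
	F32 TensorType = "F32"
	Q4K TensorType = "Q4_K"
	Q6K TensorType = "Q6_K"
)

// pickTensorType is an illustrative policy, not Ollama's getTensorNewType.
// It keeps 1D tensors in float, gives attention value tensors a
// higher-precision type (an assumed rule), and otherwise returns the default.
func pickTensorType(name string, shape []uint64, def TensorType) TensorType {
	if len(shape) == 1 {
		return F32 // 1D tensors such as norms stay in floating point
	}
	if strings.Contains(name, "attn_v.weight") {
		return Q6K // assumed example of a name-based override
	}
	return def
}

func main() {
	fmt.Println(pickTensorType("blk.0.attn_norm.weight", []uint64{4096}, Q4K))
	fmt.Println(pickTensorType("blk.0.attn_v.weight", []uint64{4096, 1024}, Q4K))
	fmt.Println(pickTensorType("blk.0.ffn_up.weight", []uint64{4096, 11008}, Q4K))
}
```

The real function also takes the model metadata (kv) and running state (qs), which this sketch leaves out.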

The C-level llama_model_quantize function in llama.cpp performs the heavy computation of quantizing tensor data blocks.
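
To give a feel for what quantizing a block of tensor data involves, here is a minimal sketch of 8-bit absolute-maximum block quantization in Go. It is an illustration under simplifying assumptions, not the llama.cpp implementation: quantizeBlock and dequantizeBlock are hypothetical names, the scale is kept as a float32, and the real formats differ in block layout and scale storage.

```go
package main

import (
	"fmt"
	"math"
)

// quantizeBlock maps one block of floats to int8 values plus a shared scale.
// The scale is chosen so the largest magnitude in the block maps to 127.
func quantizeBlock(x []float32) (scale float32, q []int8) {
	var amax float32
	for _, v := range x {
		if a := float32(math.Abs(float64(v))); a > amax {
			amax = a
		}
	}
	q = make([]int8, len(x))
	if amax == 0 {
		return 0, q // all-zero block: nothing to scale
	}
	scale = amax / 127
	for i, v := range x {
		q[i] = int8(math.Round(float64(v / scale)))
	}
	return scale, q
}

// dequantizeBlock reverses quantizeBlock up to rounding error.
func dequantizeBlock(scale float32, q []int8) []float32 {
	out := make([]float32, len(q))
	for i, v := range q {
		out[i] = float32(v) * scale
	}
	return out
}

func main() {
	x := []float32{1, -2, 0.5, 4}
	scale, q := quantizeBlock(x)
	fmt.Println("scale:", scale, "quantized:", q)
	fmt.Println("restored:", dequantizeBlock(scale, q))
}
```

Each restored value differs from the original by at most about half a quantization step, which is the precision traded away for the smaller file.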

Usage

Used when creating quantized model variants via the ollama create command with the --quantize flag, or when the server auto-quantizes imported SafeTensors models.

Code Reference

Source Location

  • Repository: ollama
  • File: server/quantization.go (quantize, getTensorNewType), llama/llama.cpp/src/llama-quant.cpp (llama_model_quantize)
  • Lines: quantization.go:L201-244 (quantize), quantization.go:L103-200 (getTensorNewType), llama-quant.cpp:L1-1072

Signature

func quantize(in, out *os.File, orig *fsggml.GGML, newFileType fsggml.FileType, progressFn func(n uint64)) error
func getTensorNewType(kv fsggml.KV, qs *quantizeState, newType fsggml.TensorType, name string, shape []uint64, ftype fsggml.FileType) fsggml.TensorType

Import

import "github.com/ollama/ollama/server"

I/O Contract

Inputs

Name | Type | Required | Description
in | *os.File | Yes | Full-precision GGUF input file (F16 or F32)
out | *os.File | Yes | Output file for quantized GGUF
orig | *fsggml.GGML | Yes | Parsed GGUF metadata from input file
newFileType | fsggml.FileType | Yes | Target quantization type (Q4_0, Q4_K_M, Q5_K_M, Q8_0, etc.)
progressFn | func(n uint64) | No | Progress callback (bytes processed)
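
A callback with the progressFn shape, func(n uint64), could be built as in the sketch below. newProgress is a hypothetical helper, and the sketch assumes each call passes the number of bytes processed since the previous call; if the callback instead receives a running total, the accumulation would need to change.

```go
package main

import "fmt"

// newProgress returns a callback matching func(n uint64) and a function
// that reports percent complete. Assumption: n is a per-call byte delta.
func newProgress(total uint64) (fn func(n uint64), percent func() float64) {
	var done uint64
	fn = func(n uint64) { done += n }
	percent = func() float64 {
		if total == 0 {
			return 0
		}
		return 100 * float64(done) / float64(total)
	}
	return fn, percent
}

func main() {
	fn, percent := newProgress(1000)
	fn(250) // simulate two progress reports
	fn(250)
	fmt.Printf("%.0f%% done\n", percent())
}
```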

Outputs

Name | Type | Description
error | error | Non-nil if quantization fails
Side effect | Quantized GGUF | Compressed model file written to output

Usage Examples

Quantize via CLI

# Create a quantized model from a Modelfile
ollama create my-model -f Modelfile --quantize q4_0

# Supported quantization types:
# q4_0, q4_1, q5_0, q5_1, q8_0
# q4_K_S, q4_K_M, q5_K_S, q5_K_M, q6_K
