Principle:Ollama Ollama ModelCreation
| Knowledge Sources | |
|---|---|
| Domains | Model Import, Format Conversion |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
The Model Creation Pipeline imports model weights from external formats (HuggingFace SafeTensors, PyTorch checkpoints) and converts them into Ollama's native GGUF-based format. The process covers tensor name mapping, quantization, tokenizer extraction, and manifest generation.
Core Concepts
Source Format Reading
The pipeline supports multiple source formats. SafeTensors files are read through a dedicated reader that parses the header to discover tensor names, shapes, and data types, then memory-maps the weight data for efficient access. PyTorch checkpoint files are similarly supported through a separate reader. Both readers implement a common interface that the conversion pipeline consumes.
Tensor Name Mapping
Different model frameworks use different naming conventions for the same logical tensors. For example, HuggingFace models might name attention weights `model.layers.0.self_attn.q_proj.weight` while Ollama's internal format uses `blk.0.attn_q.weight`. Each architecture's converter defines a mapping table that translates source names to the target GGUF naming convention.
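A mapping table of this kind can be sketched with the standard library's `strings.NewReplacer`. The table below is illustrative only: it covers a handful of Llama-style names and is an assumption, not a copy of any converter's actual table. The helper names `newReplacer`, `mapName`, and `describe` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// newReplacer builds an illustrative source-to-GGUF name mapping.
// Pairs are (old, new); the table is a hypothetical subset.
func newReplacer() *strings.Replacer {
	return strings.NewReplacer(
		"model.embed_tokens", "token_embd",
		"model.layers", "blk",
		"self_attn.q_proj", "attn_q",
		"self_attn.k_proj", "attn_k",
		"self_attn.v_proj", "attn_v",
		"self_attn.o_proj", "attn_output",
		"input_layernorm", "attn_norm",
	)
}

// mapName translates one source tensor name to the target convention.
func mapName(src string) string {
	return newReplacer().Replace(src)
}

// describe formats a mapping for display.
func describe(src string) string {
	return fmt.Sprintf("%s -> %s", src, mapName(src))
}

func main() {
	fmt.Println(describe("model.layers.0.self_attn.q_proj.weight"))
	fmt.Println(describe("model.embed_tokens.weight"))
}
```

Substring replacement keeps the layer index and the `.weight` suffix untouched, so one table entry covers every layer of the model.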
Quantization
During conversion, the pipeline can optionally quantize model weights from their original floating-point precision (typically FP16 or BF16) to lower-bit formats (Q4_0, Q8_0, etc.) to reduce memory footprint and improve inference speed. Quantization is applied per-tensor with architecture-aware rules that may preserve higher precision for critical layers like embedding tables and output projections.
Tokenizer Extraction
The creation pipeline extracts tokenizer data from the source model, including vocabulary, merge rules, special tokens, and tokenizer configuration. This data is encoded into the GGUF metadata so that the resulting model file is self-contained and does not require external tokenizer files at inference time.
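The extraction step can be sketched as follows, assuming a HuggingFace-style `tokenizer.json` whose `model` object holds a BPE `vocab` map and a `merges` list of space-separated pairs (some files store merges as two-element arrays instead, which this sketch does not handle). The output keys follow the GGUF `tokenizer.ggml.*` naming; the function name `tokenizerMetadata` and the plain `map[string]any` container are hypothetical stand-ins for the real metadata writer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenizerJSON captures only the fields this sketch needs.
type tokenizerJSON struct {
	Model struct {
		Type   string         `json:"type"`
		Vocab  map[string]int `json:"vocab"`
		Merges []string       `json:"merges"`
	} `json:"model"`
}

// sampleTokenizer is a tiny made-up tokenizer definition for the demo.
const sampleTokenizer = `{"model":{"type":"BPE","vocab":{"a":0,"b":1,"ab":2},"merges":["a b"]}}`

// tokenizerMetadata converts tokenizer.json content into GGUF-style
// key/value metadata, ordering the tokens by their IDs.
func tokenizerMetadata(data []byte) (map[string]any, error) {
	var t tokenizerJSON
	if err := json.Unmarshal(data, &t); err != nil {
		return nil, err
	}

	tokens := make([]string, len(t.Model.Vocab))
	for tok, id := range t.Model.Vocab {
		if id < 0 || id >= len(tokens) {
			return nil, fmt.Errorf("token %q has out-of-range id %d", tok, id)
		}
		tokens[id] = tok
	}

	return map[string]any{
		// "gpt2" is the value conventionally used for byte-level BPE vocabularies.
		"tokenizer.ggml.model":  "gpt2",
		"tokenizer.ggml.tokens": tokens,
		"tokenizer.ggml.merges": t.Model.Merges,
	}, nil
}

func main() {
	md, err := tokenizerMetadata([]byte(sampleTokenizer))
	if err != nil {
		panic(err)
	}
	fmt.Println(md["tokenizer.ggml.tokens"], md["tokenizer.ggml.merges"])
}
```

Storing tokens as an ID-ordered array means the token ID is implicit in the array index, which is what lets the resulting file stand alone without the original `tokenizer.json`.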
Image Generation Models
Beyond language models, the creation pipeline also supports image generation models (e.g., Stable Diffusion variants), which require additional handling for encoder architectures, VAE components, and conditioning networks.
Implementation Notes
The core conversion logic resides in `convert/convert.go`, with architecture-specific converters in files like `convert/convert_llama.go`. Source format readers are in `convert/reader_safetensors.go` and `convert/reader_torch.go`. The extended creation pipeline, including HuggingFace download integration, is in `x/create/`, and image generation model support is in `x/imagegen/`. The server-side creation endpoint in `server/create.go` orchestrates the full process from Modelfile parsing through to manifest registration.