Principle: Ollama Tensor Reading
| Knowledge Sources | |
|---|---|
| Domains | Format_Conversion, Data_Processing |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A format-agnostic tensor reading mechanism that parses model weight tensors from SafeTensors or PyTorch formats and maps tensor names from HuggingFace conventions to GGUF conventions.
Description
Tensor Reading is the process of extracting weight tensors from serialized model files and remapping their names for the target format. HuggingFace models use naming conventions like `model.layers.0.self_attn.q_proj.weight`, while GGUF uses `blk.0.attn_q.weight`. Each model architecture has its own name mapping rules.
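The name remapping can be sketched as a table of ordered substring replacements. The mapping below is a hypothetical subset for a llama-style architecture, for illustration only; the real table is architecture-specific and larger.

```python
# Hypothetical subset of HuggingFace -> GGUF name replacements for a
# llama-style model; the real, architecture-specific table is larger.
HF_TO_GGUF = [
    ("model.layers.", "blk."),
    (".self_attn.q_proj", ".attn_q"),
    (".self_attn.k_proj", ".attn_k"),
    (".self_attn.v_proj", ".attn_v"),
    (".self_attn.o_proj", ".attn_output"),
    (".mlp.gate_proj", ".ffn_gate"),
    (".mlp.up_proj", ".ffn_up"),
    (".mlp.down_proj", ".ffn_down"),
    ("model.embed_tokens", "token_embd"),
]

def remap_name(hf_name: str) -> str:
    """Apply each replacement in order to convert an HF tensor name."""
    for old, new in HF_TO_GGUF:
        hf_name = hf_name.replace(old, new)
    return hf_name

# remap_name("model.layers.0.self_attn.q_proj.weight")
#   -> "blk.0.attn_q.weight"
```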
The reader supports multiple source formats: SafeTensors (JSON header + raw tensor data) and PyTorch (.bin/.pth with pickle serialization). The reading is lazy — tensor data is memory-mapped and only read when the encoder writes the output file.
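For the SafeTensors case, the header layout is an 8-byte little-endian length followed by a JSON object mapping tensor names to their dtype, shape, and byte offsets. A minimal sketch of parsing it from a buffer (in practice a memory-mapped file), assuming the standard SafeTensors layout:

```python
import json
import struct

def parse_safetensors_header(buf: bytes):
    """Parse a SafeTensors header from a buffer (in practice, an
    mmap'd file). Layout: 8-byte little-endian header length, then a
    JSON object mapping each tensor name to its dtype, shape, and
    data_offsets (byte range relative to the start of the data section).
    """
    (header_len,) = struct.unpack_from("<Q", buf, 0)
    header = json.loads(buf[8:8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    data_start = 8 + header_len       # tensor bytes begin here
    return header, data_start
```

Each entry's `data_offsets` plus `data_start` gives the absolute byte range to read lazily later.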
Usage
Use this principle when implementing a model format converter that must read tensors from various serialization formats and remap names according to architecture-specific rules.
Theoretical Basis
The reading process:
- Format Detection: Check for .safetensors files (preferred) or .bin/.pth files (PyTorch fallback).
- Header Parsing: For SafeTensors, read the JSON header that contains tensor metadata (name, dtype, shape, offset).
- Name Remapping: Apply architecture-specific string replacements to convert HuggingFace names to GGUF names.
- Lazy Loading: Return tensor objects with data readers that will read from the memory-mapped file on demand.
- Type Conversion: Handle data type mappings (e.g., bfloat16 → float16 for GGUF compatibility).
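The lazy-loading and type-conversion steps above can be sketched as a tensor object that holds only metadata until its data is requested. The class and field names here are illustrative, not Ollama's actual types; the bfloat16 handling exploits the fact that bfloat16 is the high half of a float32.

```python
import numpy as np

class LazyTensor:
    """Holds tensor metadata now; reads bytes from the mapped file only
    when data() is called. Names and fields are illustrative sketches,
    not Ollama's actual types. `mm` may be an mmap.mmap or bytes."""

    def __init__(self, mm, name, dtype, shape, start, end):
        self.mm, self.name, self.dtype = mm, name, dtype
        self.shape, self.start, self.end = shape, start, end

    def data(self) -> np.ndarray:
        raw = self.mm[self.start:self.end]  # bytes are read only here
        if self.dtype == "BF16":
            # bfloat16 -> float16: widen each element to float32 by
            # shifting its 16 bits into the high half of a uint32
            # (bfloat16 is the top half of a float32), then narrow to
            # float16 for GGUF compatibility.
            u16 = np.frombuffer(raw, dtype=np.uint16)
            u32 = u16.astype(np.uint32) << 16
            f16 = u32.view(np.float32).astype(np.float16)
            return f16.reshape(self.shape)
        return np.frombuffer(raw, dtype=np.float16).reshape(self.shape)
```

Note the round trip through float32 can lose precision or overflow for values outside float16's range; a production converter would decide how to clamp or upcast such tensors.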