Implementation:Ollama Ollama Convert DeepSeekOcr

Knowledge Sources	Ollama
Domains	Model Conversion, GGUF Format
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the DeepSeek-OCR multimodal architecture, handling a language model with MoE experts, a CLIP vision encoder, and a SAM-based vision backbone.

Description

The deepseekocr struct implements ModelConverter for converting DeepSeek-OCR models that combine a DeepSeek-style language model with dual vision encoders (CLIP-L and SAM ViT-B). The KV method emits metadata for the language model (block count, attention heads, MoE expert parameters), the CLIP vision encoder (layers, width, heads, image/patch size), and the SAM encoder (layers, width, heads, global attention indexes). The Tensors method merges per-expert gate, up, and down projection tensors into consolidated expert weight matrices. Tensor name replacements handle the model.vision_model, model.projector, and model.sam_model namespaces.
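The expert merge described above can be pictured as stacking each expert's 2-D weight matrix into a single 3-D tensor, ordered by expert index. The following is a minimal sketch under stated assumptions, not the code in convert_deepseekocr.go: the tensor type, the mergeExperts helper, and the tensor names used in main are all hypothetical stand-ins chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// tensor is a simplified, hypothetical stand-in for a source tensor:
// a name, a shape, and flat row-major data.
type tensor struct {
	Name  string
	Shape []int
	Data  []float32
}

// mergeExperts stacks per-expert 2-D tensors into one tensor of shape
// [numExperts, rows, cols]. The experts map is keyed by expert index,
// and the output is ordered by ascending index.
func mergeExperts(name string, experts map[int]tensor) (tensor, error) {
	if len(experts) == 0 {
		return tensor{}, fmt.Errorf("no expert tensors for %s", name)
	}

	// Map iteration order is random in Go, so sort the indexes first.
	ids := make([]int, 0, len(experts))
	for id := range experts {
		ids = append(ids, id)
	}
	sort.Ints(ids)

	first := experts[ids[0]]
	if len(first.Shape) != 2 {
		return tensor{}, fmt.Errorf("expert tensors for %s must be 2-D", name)
	}

	out := tensor{Name: name, Shape: []int{len(ids), first.Shape[0], first.Shape[1]}}
	for _, id := range ids {
		e := experts[id]
		if len(e.Shape) != 2 || e.Shape[0] != first.Shape[0] || e.Shape[1] != first.Shape[1] {
			return tensor{}, fmt.Errorf("expert %d of %s has a mismatched shape", id, name)
		}
		out.Data = append(out.Data, e.Data...)
	}
	return out, nil
}

func main() {
	// Hypothetical names; the real source and target names may differ.
	experts := map[int]tensor{
		1: {Name: "experts.1.gate_proj.weight", Shape: []int{2, 2}, Data: []float32{5, 6, 7, 8}},
		0: {Name: "experts.0.gate_proj.weight", Shape: []int{2, 2}, Data: []float32{1, 2, 3, 4}},
	}
	merged, err := mergeExperts("gate_exps.weight", experts)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged.Name, merged.Shape, merged.Data)
}
```

The same routine would be applied once each for the gate, up, and down projections of every MoE block.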

Usage

Invoked automatically during conversion when the architecture declared in the model's configuration matches DeepSeekOCRForCausalLM or a similar DeepSeek-OCR architecture identifier.
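One way to picture this dispatch is a switch over the architecture string read from the model's configuration. The sketch below is an assumption-laden illustration rather than Ollama's actual selection code: the converter interface, deepseekocrConverter, and selectConverter are hypothetical, and the identifier is spelled as it appears on this page, which may not match the source exactly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// converter is a hypothetical stand-in for the ModelConverter interface.
type converter interface {
	Name() string
}

// deepseekocrConverter is a hypothetical placeholder for the real struct.
type deepseekocrConverter struct{}

func (deepseekocrConverter) Name() string { return "deepseekocr" }

// selectConverter reads the first entry of "architectures" from a
// config.json payload and returns the matching converter.
func selectConverter(config []byte) (converter, error) {
	var c struct {
		Architectures []string `json:"architectures"`
	}
	if err := json.Unmarshal(config, &c); err != nil {
		return nil, err
	}
	if len(c.Architectures) == 0 {
		return nil, fmt.Errorf("config declares no architecture")
	}

	switch c.Architectures[0] {
	case "DeepSeekOCRForCausalLM": // spelling taken from this page (assumption)
		return deepseekocrConverter{}, nil
	default:
		return nil, fmt.Errorf("unsupported architecture %q", c.Architectures[0])
	}
}

func main() {
	conv, err := selectConverter([]byte(`{"architectures": ["DeepSeekOCRForCausalLM"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println("selected converter:", conv.Name())
}
```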

Code Reference

Source Location

Repository: Ollama
File: convert/convert_deepseekocr.go
Lines: 1-136

Signature

type deepseekocr struct {
    ModelParameters
    LanguageConfig struct {
        HiddenLayers       uint32 `json:"num_hidden_layers"`
        NumRoutedExperts   uint32 `json:"n_routed_experts"`
        FirstKDenseReplace uint32 `json:"first_k_dense_replace"`
    } `json:"language_config"`
    VisionConfig struct { ... } `json:"vision_config"`
}

func (m *deepseekocr) KV(t *Tokenizer) KV
func (m *deepseekocr) Tensors(s []Tensor) (out []*ggml.Tensor)
func (m *deepseekocr) Replacements() []string
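
Replacements returns a flat []string, which suggests old/new pairs of the kind accepted by strings.NewReplacer. The sketch below shows how such pairs could rename tensors from the three namespaces mentioned in the description. The source prefixes come from this page; the target prefixes ("v", "mm", "s") and the rename helper are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements returns old/new pairs in the order expected by
// strings.NewReplacer. The target names are hypothetical.
func replacements() []string {
	return []string{
		"model.vision_model", "v",
		"model.projector", "mm",
		"model.sam_model", "s",
	}
}

// rename applies every replacement pair to a source tensor name.
func rename(name string) string {
	return strings.NewReplacer(replacements()...).Replace(name)
}

func main() {
	for _, name := range []string{
		"model.vision_model.embeddings.weight",
		"model.projector.layers.weight",
		"model.sam_model.neck.0.weight",
	} {
		fmt.Println(name, "->", rename(name))
	}
}
```

Names that match none of the listed prefixes pass through unchanged.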

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name	Type	Required	Description
t	*Tokenizer	Yes	Tokenizer data for GGUF metadata
s	[]Tensor	Yes	Source tensors from language, vision, and SAM components

Outputs

Name	Type	Description
KV	KV	GGUF metadata for language (MoE), CLIP vision, and SAM encoder params
out	[]*ggml.Tensor	Converted tensors with merged MoE experts

Usage Examples

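// Sketch only: deepseekocr is unexported, so this runs inside package convert.
// configData, tokenizer, and sourceTensors are assumed to be loaded by the caller.
m := &deepseekocr{}
if err := json.Unmarshal(configData, m); err != nil {
    return err
}
kv := m.KV(tokenizer)               // GGUF metadata
tensors := m.Tensors(sourceTensors) // converted tensors with merged MoE experts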
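
Because deepseekocr is unexported, the decoding step can only be tried outside the package with a local copy of the struct. The following self-contained sketch mirrors the language_config fields from the signature above to show how the nested JSON tags are populated. The ocrConfig type and parseConfig helper are hypothetical, and the numeric values are illustrative, not taken from a real DeepSeek-OCR checkpoint.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ocrConfig is a hypothetical local copy of the language_config portion
// of the deepseekocr struct, using the JSON tags shown in the signature.
type ocrConfig struct {
	LanguageConfig struct {
		HiddenLayers       uint32 `json:"num_hidden_layers"`
		NumRoutedExperts   uint32 `json:"n_routed_experts"`
		FirstKDenseReplace uint32 `json:"first_k_dense_replace"`
	} `json:"language_config"`
}

// parseConfig decodes a config.json payload into ocrConfig.
func parseConfig(data []byte) (*ocrConfig, error) {
	var c ocrConfig
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, fmt.Errorf("decode config: %w", err)
	}
	return &c, nil
}

func main() {
	// Illustrative values only.
	data := []byte(`{
		"language_config": {
			"num_hidden_layers": 12,
			"n_routed_experts": 64,
			"first_k_dense_replace": 1
		}
	}`)

	c, err := parseConfig(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("layers:", c.LanguageConfig.HiddenLayers)
	fmt.Println("routed experts:", c.LanguageConfig.NumRoutedExperts)
	fmt.Println("first_k_dense_replace:", c.LanguageConfig.FirstKDenseReplace)
}
```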

Related Pages

Principle:Ollama_Ollama_GGUF_Model_Conversion_DeepSeekOcr
