Implementation:Ollama Ollama Convert DeepSeekOcr

From Leeroopedia
Knowledge Sources
Domains: Model Conversion, GGUF Format
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the DeepSeek-OCR multimodal architecture, handling a language model with MoE experts, a CLIP vision encoder, and a SAM-based vision backbone.

Description

The deepseekocr struct implements ModelConverter for DeepSeek-OCR models, which combine a DeepSeek-style language model with two vision encoders (CLIP-L and SAM ViT-B). The KV method emits metadata for three components: the language model (block count, attention heads, MoE expert parameters), the CLIP vision encoder (layers, width, heads, image and patch size), and the SAM encoder (layers, width, heads, global attention indexes). The Tensors method merges the per-expert gate, up, and down projection tensors into consolidated expert weight matrices. The Replacements method supplies tensor name replacements for the model.vision_model, model.projector, and model.sam_model namespaces.
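The namespace replacements described above can be pictured with a small, self-contained sketch. It assumes the []string returned by Replacements holds old/new pairs of the kind strings.NewReplacer accepts. The source prefixes are taken from the description, while the target prefixes ("v", "mm", "s") and the helper names are hypothetical placeholders, not the names the converter actually emits.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements returns old/new pairs in the order strings.NewReplacer expects.
// The source prefixes come from the description; the target prefixes
// ("v", "mm", "s") are hypothetical placeholders.
func replacements() []string {
	return []string{
		"model.vision_model", "v",
		"model.projector", "mm",
		"model.sam_model", "s",
	}
}

// rename applies the replacement pairs to a single tensor name.
func rename(name string) string {
	return strings.NewReplacer(replacements()...).Replace(name)
}

func main() {
	// Illustrative tensor names only; real checkpoints may differ.
	for _, n := range []string{
		"model.vision_model.embeddings.weight",
		"model.projector.layers.weight",
		"model.sam_model.blocks.0.attn.qkv.weight",
	} {
		fmt.Println(n, "=>", rename(n))
	}
}
```

Keeping the mapping as a flat list of pairs means a new namespace can be supported by appending two strings rather than adding branching logic.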

Usage

Invoked automatically during conversion when the model's declared architecture is DeepSeekOCRForCausalLM or another DeepSeek-OCR architecture identifier.
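A minimal sketch of architecture-based dispatch follows, assuming the converter is chosen from the first architecture string in the model's configuration. The converter interface, the pick function, and the error messages are hypothetical stand-ins for illustration; only the DeepSeekOCRForCausalLM identifier comes from this page.

```go
package main

import (
	"errors"
	"fmt"
)

// converter is a hypothetical stand-in for the ModelConverter interface.
type converter interface{ Name() string }

// deepseekocr here is an empty placeholder, not the real converter struct.
type deepseekocr struct{}

func (deepseekocr) Name() string { return "deepseekocr" }

// pick selects a converter from the first declared architecture string.
func pick(architectures []string) (converter, error) {
	if len(architectures) == 0 {
		return nil, errors.New("no architecture declared")
	}
	switch architectures[0] {
	case "DeepSeekOCRForCausalLM":
		return deepseekocr{}, nil
	default:
		return nil, fmt.Errorf("unsupported architecture %q", architectures[0])
	}
}

func main() {
	c, err := pick([]string{"DeepSeekOCRForCausalLM"})
	if err != nil {
		panic(err)
	}
	fmt.Println("selected converter:", c.Name())
}
```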

Code Reference

Source Location

  • Repository: Ollama
  • File: convert/convert_deepseekocr.go
  • Lines: 1-136

Signature

type deepseekocr struct {
    ModelParameters
    LanguageConfig struct {
        HiddenLayers       uint32 `json:"num_hidden_layers"`
        NumRoutedExperts   uint32 `json:"n_routed_experts"`
        FirstKDenseReplace uint32 `json:"first_k_dense_replace"`
    } `json:"language_config"`
    VisionConfig struct { ... } `json:"vision_config"`
}

func (m *deepseekocr) KV(t *Tokenizer) KV
func (m *deepseekocr) Tensors(s []Tensor) (out []*ggml.Tensor)
func (m *deepseekocr) Replacements() []string
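To show how the JSON tags in the struct above map a configuration file onto Go fields, here is a small sketch that decodes only the three language_config fields listed in the signature. The config type, the parseConfig helper, and the sample values are assumptions made for illustration; the elided VisionConfig fields are deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config mirrors only the language_config fields shown in the signature.
type config struct {
	LanguageConfig struct {
		HiddenLayers       uint32 `json:"num_hidden_layers"`
		NumRoutedExperts   uint32 `json:"n_routed_experts"`
		FirstKDenseReplace uint32 `json:"first_k_dense_replace"`
	} `json:"language_config"`
}

// parseConfig decodes a JSON payload into config (hypothetical helper).
func parseConfig(data []byte) (config, error) {
	var c config
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	// Made-up values, chosen only to exercise the field mapping.
	sample := []byte(`{"language_config":{"num_hidden_layers":4,"n_routed_experts":8,"first_k_dense_replace":1}}`)
	c, err := parseConfig(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.LanguageConfig.HiddenLayers, c.LanguageConfig.NumRoutedExperts, c.LanguageConfig.FirstKDenseReplace)
}
```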

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name | Type | Required | Description
t | *Tokenizer | Yes | Tokenizer data for GGUF metadata
s | []Tensor | Yes | Source tensors from the language, vision, and SAM components

Outputs

Name | Type | Description
KV | KV | GGUF metadata for the language model (MoE), CLIP vision encoder, and SAM encoder parameters
[]*ggml.Tensor | slice | Converted tensors with merged MoE experts
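The merged-expert output listed above can be illustrated with a simplified sketch. It assumes that merging means stacking same-shaped per-expert matrices, in expert-index order, into one tensor with a new leading expert dimension. The tensor type, the mergeExperts helper, and the tensor names are hypothetical; the real converter works on its own tensor types and may merge differently.

```go
package main

import (
	"fmt"
	"sort"
)

// tensor is a hypothetical, simplified stand-in for a converter tensor.
type tensor struct {
	Name  string
	Shape []uint64
	Data  []float32
}

// mergeExperts stacks same-sized per-expert tensors into one tensor whose
// leading dimension is the expert count. experts maps expert index to tensor.
func mergeExperts(name string, experts map[int]tensor) (tensor, error) {
	if len(experts) == 0 {
		return tensor{}, fmt.Errorf("%s: no experts to merge", name)
	}
	// Sort expert indexes so the stacking order is deterministic.
	ids := make([]int, 0, len(experts))
	for id := range experts {
		ids = append(ids, id)
	}
	sort.Ints(ids)

	first := experts[ids[0]]
	out := tensor{
		Name:  name,
		Shape: append([]uint64{uint64(len(ids))}, first.Shape...),
	}
	for _, id := range ids {
		e := experts[id]
		if len(e.Data) != len(first.Data) {
			return tensor{}, fmt.Errorf("%s: expert %d has a mismatched size", name, id)
		}
		out.Data = append(out.Data, e.Data...)
	}
	return out, nil
}

func main() {
	// Two tiny 1x2 expert matrices with made-up values.
	merged, err := mergeExperts("blk.0.ffn_gate_exps.weight", map[int]tensor{
		0: {Name: "expert0.gate", Shape: []uint64{1, 2}, Data: []float32{1, 2}},
		1: {Name: "expert1.gate", Shape: []uint64{1, 2}, Data: []float32{3, 4}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(merged.Name, merged.Shape, merged.Data)
}
```

The same helper would be applied once per projection type (gate, up, down) and per layer, so each layer ends up with three consolidated expert tensors instead of three tensors per expert.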

Usage Examples

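// Sketch only: deepseekocr is unexported, so this applies inside package convert.
// configData, tokenizer, and sourceTensors are assumed to be supplied by the caller,
// and the enclosing function is assumed to return an error.
m := &deepseekocr{}
if err := json.Unmarshal(configData, m); err != nil {
    return err
}
kv := m.KV(tokenizer)               // GGUF metadata for language, CLIP, and SAM
tensors := m.Tensors(sourceTensors) // converted tensors with merged MoE experts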
