Implementation:Ollama Ollama Convert DeepSeekOcr
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, GGUF Format |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GGUF model converter for the DeepSeek-OCR multimodal architecture, handling a language model with MoE experts, a CLIP vision encoder, and a SAM-based vision backbone.
Description
The deepseekocr struct implements ModelConverter for converting DeepSeek-OCR models that combine a DeepSeek-style language model with dual vision encoders (CLIP-L and SAM ViT-B). The KV method emits metadata for the language model (block count, attention heads, MoE expert parameters), the CLIP vision encoder (layers, width, heads, image/patch size), and the SAM encoder (layers, width, heads, global attention indexes). The Tensors method merges per-expert gate, up, and down projection tensors into consolidated expert weight matrices. Tensor name replacements handle the model.vision_model, model.projector, and model.sam_model namespaces.
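The expert-merging step can be pictured as stacking per-expert matrices along a new leading dimension. The following is a minimal sketch under stated assumptions, not Ollama's implementation: `expertWeight` and `mergeExperts` are hypothetical names, and each expert's weight is assumed to be a flat row-major `[]float32` of identical shape.

```go
package main

import (
	"fmt"
	"sort"
)

// expertWeight is a hypothetical stand-in for one expert's gate, up, or
// down projection weight. It is not a type from the Ollama code base.
type expertWeight struct {
	Index      int       // expert index, e.g. parsed from the tensor name
	Rows, Cols int       // matrix shape shared by all experts
	Data       []float32 // row-major values, len == Rows*Cols
}

// mergeExperts stacks the per-expert matrices into one buffer with shape
// [numExperts, rows, cols], ordered by expert index.
func mergeExperts(experts []expertWeight) ([]float32, [3]int, error) {
	if len(experts) == 0 {
		return nil, [3]int{}, fmt.Errorf("no experts to merge")
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]expertWeight(nil), experts...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Index < sorted[j].Index })

	rows, cols := sorted[0].Rows, sorted[0].Cols
	out := make([]float32, 0, len(sorted)*rows*cols)
	for _, e := range sorted {
		if e.Rows != rows || e.Cols != cols || len(e.Data) != rows*cols {
			return nil, [3]int{}, fmt.Errorf("expert %d has a mismatched shape", e.Index)
		}
		out = append(out, e.Data...)
	}
	return out, [3]int{len(sorted), rows, cols}, nil
}

func main() {
	merged, shape, err := mergeExperts([]expertWeight{
		{Index: 1, Rows: 1, Cols: 2, Data: []float32{3, 4}},
		{Index: 0, Rows: 1, Cols: 2, Data: []float32{1, 2}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(shape, merged)
}
```

The same stacking would be applied separately to the gate, up, and down projections of each MoE layer, yielding one consolidated tensor per projection instead of one tensor per expert.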
Usage
Invoked automatically when the model's architecture matches DeepSeekOCRForCausalLM or similar DeepSeek-OCR architecture identifiers.
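Architecture-based selection of this kind can be sketched as a switch over the identifier declared in the model's configuration. This is an illustrative sketch only: it assumes the identifier comes from the `architectures` list that Hugging Face style `config.json` files carry, it spells the identifier as it appears on this page, and `modelConfig` and `converterName` are hypothetical names rather than Ollama functions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelConfig holds only the field needed for dispatch (hypothetical type).
type modelConfig struct {
	Architectures []string `json:"architectures"`
}

// converterName returns the name of the converter that would handle the
// model described by configJSON, or an error if none matches.
func converterName(configJSON []byte) (string, error) {
	var cfg modelConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", err
	}
	if len(cfg.Architectures) == 0 {
		return "", fmt.Errorf("config lists no architectures")
	}
	switch arch := cfg.Architectures[0]; arch {
	case "DeepSeekOCRForCausalLM": // identifier as written on this page
		return "deepseekocr", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", arch)
	}
}

func main() {
	name, err := converterName([]byte(`{"architectures": ["DeepSeekOCRForCausalLM"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```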
Code Reference
Source Location
- Repository: Ollama
- File: convert/convert_deepseekocr.go
- Lines: 1-136
Signature
type deepseekocr struct {
	ModelParameters
	LanguageConfig struct {
		HiddenLayers       uint32 `json:"num_hidden_layers"`
		NumRoutedExperts   uint32 `json:"n_routed_experts"`
		FirstKDenseReplace uint32 `json:"first_k_dense_replace"`
	} `json:"language_config"`
	VisionConfig struct { ... } `json:"vision_config"`
}

func (m *deepseekocr) KV(t *Tokenizer) KV
func (m *deepseekocr) Tensors(s []Tensor) (out []*ggml.Tensor)
func (m *deepseekocr) Replacements() []string
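A `[]string` return value such as that of `Replacements` fits naturally with Go's `strings.NewReplacer`, which takes alternating old/new pairs. The sketch below assumes the list is consumed that way. The source prefixes are the namespaces named in the Description; the short target names (`v`, `s`, `mm`) and the helper `rename` are hypothetical placeholders, not necessarily what Ollama emits.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements returns alternating old/new name fragments.
// The target names are illustrative assumptions.
func replacements() []string {
	return []string{
		"model.vision_model", "v",
		"model.sam_model", "s",
		"model.projector", "mm",
	}
}

// rename applies every old/new pair to a source tensor name.
func rename(name string) string {
	return strings.NewReplacer(replacements()...).Replace(name)
}

func main() {
	fmt.Println(rename("model.sam_model.blocks.0.attn.qkv.weight"))
	fmt.Println(rename("model.projector.weight"))
}
```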
Import
import "github.com/ollama/ollama/convert"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| t | *Tokenizer | Yes | Tokenizer data for GGUF metadata (argument to KV) |
| s | []Tensor | Yes | Source tensors from the language, vision, and SAM components (argument to Tensors) |
Outputs
| Name | Type | Description |
|---|---|---|
| KV | KV | GGUF metadata for language (MoE), CLIP vision, and SAM encoder params |
| out | []*ggml.Tensor | Converted tensors with merged MoE experts |
Usage Examples
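// Sketch of how the converter registered for the DeepSeek-OCR architecture is driven.
// deepseekocr is unexported, so this only compiles inside package convert.
// m := &deepseekocr{}
// if err := json.Unmarshal(configData, m); err != nil { return err }
// kv := m.KV(tokenizer)               // GGUF metadata for language, CLIP, and SAM
// tensors := m.Tensors(sourceTensors) // converted tensors with merged MoE experts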
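The `json.Unmarshal` step can be tried in isolation with a cut-down copy of the struct. This is a sketch under assumptions: `ocrConfig` and `parseConfig` are hypothetical names, only the `language_config` fields listed in the Signature section are reproduced (the vision fields are elided there and are left out here), and the sample values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ocrConfig mirrors only the language_config fields shown in the Signature
// section. It is a hypothetical, reduced copy of the real struct.
type ocrConfig struct {
	LanguageConfig struct {
		HiddenLayers       uint32 `json:"num_hidden_layers"`
		NumRoutedExperts   uint32 `json:"n_routed_experts"`
		FirstKDenseReplace uint32 `json:"first_k_dense_replace"`
	} `json:"language_config"`
}

// parseConfig decodes a config.json payload into ocrConfig.
func parseConfig(data []byte) (ocrConfig, error) {
	var cfg ocrConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	// Sample values are illustrative, not taken from a real checkpoint.
	sample := []byte(`{"language_config": {"num_hidden_layers": 12, "n_routed_experts": 64, "first_k_dense_replace": 1}}`)
	cfg, err := parseConfig(sample)
	if err != nil {
		panic(err)
	}
	lc := cfg.LanguageConfig
	fmt.Printf("layers=%d experts=%d first_k_dense_replace=%d\n",
		lc.HiddenLayers, lc.NumRoutedExperts, lc.FirstKDenseReplace)
}
```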