Principle:Unslothai Unsloth Ollama Deployment
| Knowledge Sources | |
|---|---|
| Domains | Model_Deployment, Inference |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A deployment configuration technique that generates Ollama Modelfile templates matching the correct chat format for 50+ model families to enable local inference via Ollama.
Description
Ollama is a popular tool for running LLMs locally. It requires a Modelfile that specifies the GGUF model path, chat template format, and generation parameters. Each model family (Llama 3, Mistral, ChatML, Gemma, Qwen, Phi, etc.) requires a different template format.
Unsloth maintains a comprehensive registry mapping model names to Ollama-compatible templates, ensuring that exported GGUF models work correctly with Ollama out of the box. The template includes:
- FROM: Path to the GGUF file
- TEMPLATE: Go template string defining the chat format
- PARAMETER: Generation parameters (temperature, stop tokens)
- SYSTEM: Default system prompt
Usage
This principle is automatically applied during GGUF export (save_pretrained_gguf and push_to_hub_gguf). A Modelfile is generated alongside the GGUF file. Can also be used standalone to generate Ollama templates for existing models.
Theoretical Basis
Ollama template generation is a lookup-and-substitution process:
# Abstract Ollama template generation
template_key = MODEL_TO_OLLAMA_TEMPLATE_MAPPER[model_name]
modelfile = OLLAMA_TEMPLATES[template_key]
modelfile = modelfile.replace("{__FILE_LOCATION__}", gguf_path)
modelfile = modelfile.replace("{__EOS_TOKEN__}", eos_token)
The critical constraint is that the Ollama template must exactly match the model's training chat format, otherwise the model will produce degraded output due to template mismatch.