Implementation:Ollama Ollama Llama Model SmolLM3
| Property | Value |
|---|---|
| Knowledge Sources | |
| Domains | LLM Inference, Model Architecture |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the ggml computation graph builder for the SmolLM3 model architecture.
Description
The llm_build_smollm3 constructor builds a decoder-only transformer graph with conditional RoPE (rotary position embedding): some layers skip RoPE entirely, as determined by hparams.n_no_rope_layer_step. The builder also supports configurable attention scaling via hparams.f_attention_scale, and every transformer block combines RMS-normalized self-attention (Q/K/V projections with optional biases) with a SiLU-gated feed-forward layer.
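The per-block arithmetic named above can be illustrated on plain vectors, without ggml. The sketch below is an assumption-laden illustration, not the builder's code: `attention_scale`, `rms_norm`, `silu` and `silu_gate` are hypothetical helpers, and it assumes that an f_attention_scale of 0.0f means "unset", in which case the conventional 1/sqrt(head_dim) is used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper. Assumption: f_attention_scale == 0.0f means "not set",
// so we fall back to the usual 1/sqrt(head_dim) scaling of Q*K^T.
float attention_scale(float f_attention_scale, int n_embd_head) {
    if (f_attention_scale == 0.0f) {
        return 1.0f / std::sqrt(static_cast<float>(n_embd_head));
    }
    return f_attention_scale;
}

// RMS normalization: x / sqrt(mean(x^2) + eps), then an elementwise weight.
// Assumes x and weight are non-empty and the same length.
std::vector<float> rms_norm(const std::vector<float> & x,
                            const std::vector<float> & weight,
                            float eps = 1e-6f) {
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;
    const float inv_rms = 1.0f / std::sqrt(sum_sq / x.size() + eps);
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = x[i] * inv_rms * weight[i];
    }
    return out;
}

// SiLU (swish): v * sigmoid(v).
float silu(float v) { return v / (1.0f + std::exp(-v)); }

// SiLU gating on already-projected activations: silu(gate) * up, elementwise.
// The down projection back to the embedding width is omitted in this sketch.
std::vector<float> silu_gate(const std::vector<float> & gate,
                             const std::vector<float> & up) {
    std::vector<float> out(gate.size());
    for (std::size_t i = 0; i < gate.size(); ++i) {
        out[i] = silu(gate[i]) * up[i];
    }
    return out;
}
```

For example, `attention_scale(0.0f, 64)` yields 0.125 under the stated assumption, while any non-zero f_attention_scale is passed through unchanged.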
Usage
This builder lets Ollama run SmolLM3 models through its bundled llama.cpp inference engine, including the architecture's distinguishing feature: certain layers operate without rotary position embeddings.
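Which layers skip RoPE depends on n_no_rope_layer_step. The sketch below assumes the rule is "layer `il` (0-based) skips RoPE when `il + 1` is a multiple of the step", so every step-th layer is position-free; check the source file before relying on that exact rule. The helpers `layer_uses_rope`, `nope_layers` and `maybe_rope_pair` are hypothetical, and the guard for a non-positive step is a choice made only for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed rule: layer il (0-based) skips RoPE when (il + 1) is a multiple of
// n_no_rope_layer_step. Sketch-only guard: a non-positive step disables skipping
// (and avoids a modulo by zero).
bool layer_uses_rope(int il, int n_no_rope_layer_step) {
    if (n_no_rope_layer_step <= 0) return true;
    return (il + 1) % n_no_rope_layer_step != 0;
}

// Indices of the layers that skip RoPE under the assumed rule.
std::vector<int> nope_layers(int n_layer, int n_no_rope_layer_step) {
    std::vector<int> out;
    for (int il = 0; il < n_layer; ++il) {
        if (!layer_uses_rope(il, n_no_rope_layer_step)) out.push_back(il);
    }
    return out;
}

// RoPE rotates each (x0, x1) dimension pair by pos * theta, where theta is that
// pair's frequency. Layers without RoPE leave the pair untouched.
void maybe_rope_pair(float & x0, float & x1, int pos, float theta, bool use_rope) {
    if (!use_rope) return;
    const float a = pos * theta;
    const float c = std::cos(a);
    const float s = std::sin(a);
    const float r0 = x0 * c - x1 * s;
    const float r1 = x0 * s + x1 * c;
    x0 = r0;
    x1 = r1;
}
```

Under this assumed rule, a model with 8 layers and a step of 4 would apply RoPE everywhere except layers 3 and 7; in those layers attention sees Q and K without any positional rotation.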
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/models/smollm3.cpp
- Lines: 1-128
Signature
```cpp
llm_build_smollm3::llm_build_smollm3(
        const llama_model & model,
        const llm_graph_params & params) : llm_graph_context(params) {
    // constructor body builds the SmolLM3 computation graph
}
```
Import
```cpp
#include "models.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const llama_model & | Yes | Loaded model with SmolLM3 weights |
| params | const llm_graph_params & | Yes | Graph construction parameters |
Outputs
| Name | Type | Description |
|---|---|---|
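| ggml graph | ggml_cgraph * | Complete SmolLM3 computation graph with conditional RoPE, populated on the inherited llm_graph_context during construction (the constructor itself returns no value) |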
Usage Examples
```cpp
// Constructing the builder populates the computation graph.
auto builder = llm_build_smollm3(model, params);
```
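The usage line constructs the builder directly. Inside llama.cpp, a builder is normally chosen per architecture when the graph is requested (upstream this happens in `llama_model::build_graph`, though details may vary between versions). The self-contained mock below illustrates that pattern and the "constructor does the work" design. All types (`graph_params_mock`, `graph_context_mock`, `build_smollm3_mock`, `arch_mock`, `build_graph_mock`) are hypothetical stand-ins, not llama.cpp APIs, and a vector of node names stands in for the ggml graph.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real llama.cpp types carry far more state.
struct graph_params_mock {
    int n_layer;
};

struct graph_context_mock {
    explicit graph_context_mock(const graph_params_mock & p) : n_layer(p.n_layer) {}
    virtual ~graph_context_mock() = default;

    int n_layer;
    std::vector<std::string> nodes;  // stands in for the graph being built
};

// As with llm_build_smollm3, the derived constructor does all the work:
// once it returns, the (mock) graph is fully populated.
struct build_smollm3_mock : graph_context_mock {
    explicit build_smollm3_mock(const graph_params_mock & p) : graph_context_mock(p) {
        nodes.push_back("inp_embd");
        for (int il = 0; il < n_layer; ++il) {
            nodes.push_back("layer_" + std::to_string(il));
        }
        nodes.push_back("result_output");
    }
};

enum class arch_mock { smollm3, other };

// Architecture switch that picks a builder; returns nullptr for unknown archs.
std::unique_ptr<graph_context_mock> build_graph_mock(arch_mock arch,
                                                     const graph_params_mock & p) {
    switch (arch) {
        case arch_mock::smollm3:
            return std::make_unique<build_smollm3_mock>(p);
        default:
            return nullptr;
    }
}
```

Keeping all graph construction in the constructor means each architecture is a single class whose only job is to emit its graph, while shared helpers and state live in the common base context.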