Implementation:Ollama Ollama Llama Model T5 Encoder
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Model Architecture |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the ggml computation graph builder for the T5 encoder architecture. Unlike the decoder-only models that make up most of the codebase, the encoder processes the whole input in one pass and outputs embeddings rather than token logits.
Description
The llm_build_t5_enc constructor builds the T5 encoder graph. It differs from the decoder-only builders in four ways: it uses relative position bias (bucketed position embeddings from build_inp_pos_bucket_enc) instead of RoPE; it runs attention without a KV cache (build_attn_inp_no_cache), since the encoder processes the full input at once; it reads encoder-specific weight matrices (wq_enc, wk_enc, wv_enc, wo_enc); and it applies RMS normalization. The graph produces encoder output embeddings rather than token logits.
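To make the idea of bucketed relative positions concrete, the following is a minimal sketch of the bucketing scheme as it is usually described for T5. It is not the code in t5-enc.cpp: the function name relative_position_bucket, its parameters, the sign convention (key position minus query position), and the max_distance default of 128 are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>

// Illustrative sketch of T5-style relative position bucketing.
// Hypothetical helper, not taken from the Ollama / llama.cpp sources.
// Assumes n_buckets is large enough that max_exact below is at least 1.
int relative_position_bucket(int query_pos, int key_pos, int n_buckets,
                             bool bidirectional, int max_distance = 128) {
    int rel = key_pos - query_pos;  // signed offset from query to key (assumed convention)
    int bucket = 0;

    if (bidirectional) {
        // Encoder case: half of the buckets cover keys after the query,
        // the other half cover keys at or before it.
        n_buckets /= 2;
        if (rel > 0) {
            bucket += n_buckets;
        }
        rel = std::abs(rel);
    } else {
        // Causal case: only keys at or before the query are distinguished.
        rel = -std::min(rel, 0);
    }

    // Small offsets get one bucket each.
    const int max_exact = n_buckets / 2;
    if (rel < max_exact) {
        return bucket + rel;
    }

    // Larger offsets share logarithmically sized buckets up to max_distance;
    // anything farther away is clamped into the last bucket.
    const double scaled = std::log(static_cast<double>(rel) / max_exact) /
                          std::log(static_cast<double>(max_distance) / max_exact);
    const int large = max_exact + static_cast<int>(scaled * (n_buckets - max_exact));
    return bucket + std::min(large, n_buckets - 1);
}
```

Under these assumptions, with 32 buckets in bidirectional mode a key one position before the query maps to bucket 1 and a key one position after it maps to bucket 17. Each bucket index then selects a learned bias that is added to the attention scores, which is the role RoPE would otherwise play in the decoder-only builders.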
Usage
Enables Ollama to run T5 encoder-decoder models by implementing the encoder half of the T5 architecture.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/models/t5-enc.cpp
- Lines: 1-96
Signature
llm_build_t5_enc::llm_build_t5_enc(
const llama_model & model,
const llm_graph_params & params) : llm_graph_context(params);
Import
#include "models.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const llama_model & | Yes | Loaded model with T5 encoder weights |
| params | const llm_graph_params & | Yes | Graph construction parameters |
Outputs
| Name | Type | Description |
|---|---|---|
| ggml graph | ggml_cgraph | T5 encoder computation graph producing embeddings |
Usage Examples
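// model: loaded llama_model with T5 encoder weights; params: graph construction parameters
auto builder = llm_build_t5_enc(model, params);
// The resulting graph produces encoder embeddings, not token logits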