
Principle:AnswerDotAI RAGatouille Model Export

From Leeroopedia
Knowledge Sources
Domains Model_Distribution, Deployment, NLP
Last Updated 2026-02-12 12:00 GMT

Overview

Principle governing the serialization and distribution of trained ColBERT models to external platforms for sharing and production deployment.

Description

Model Export is the process of packaging a trained ColBERT model checkpoint and publishing it to an external platform so that it can be consumed by other users or deployed in production search infrastructure. This involves validating the checkpoint integrity, converting weights to the appropriate format, and uploading artifacts to a model registry or deployment target.

Two primary export targets exist in the RAGatouille ecosystem:

  • Hugging Face Hub: Uploads the full ColBERT checkpoint (weights, config, tokenizer) to a repository on the Hugging Face Hub, enabling other users to load the model via RAGPretrainedModel.from_pretrained().
  • Vespa ONNX: Converts the ColBERT encoder to ONNX format with dynamic axes for variable-length input, enabling deployment in Vespa search clusters without a PyTorch runtime dependency.

Usage

Use this principle after completing model training or fine-tuning, when the trained ColBERT model needs to be shared with the community or deployed to a production search system. It is the final step bridging the training workflow with downstream consumption, whether by other RAGatouille users downloading the model from the Hub or by Vespa serving it as part of a retrieval pipeline.

Theoretical Basis

Model export follows a two-phase process:

Phase 1 — Validation and Serialization:

  • Load the ColBERTConfig from the checkpoint to verify it is a valid ColBERT model
  • Optionally re-save the model to a clean temporary directory to ensure only necessary files are included
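
The Phase 1 steps can be sketched in Python. The required config keys and artifact file names below are assumptions based on common Hugging Face checkpoint layouts, not the exact set RAGatouille writes:

```python
import json
import shutil
import tempfile
from pathlib import Path

# Assumed ColBERTConfig fields and artifact file names (illustrative only).
REQUIRED_KEYS = {"dim", "query_maxlen", "doc_maxlen"}
ARTIFACTS = ["pytorch_model.bin", "artifact.metadata", "tokenizer.json", "config.json"]

def validate_checkpoint(checkpoint_dir: str) -> dict:
    """Load the config and check it looks like a valid ColBERT checkpoint."""
    config_path = Path(checkpoint_dir) / "artifact.metadata"
    config = json.loads(config_path.read_text())
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"not a valid ColBERT checkpoint, missing: {missing}")
    return config

def resave_clean(checkpoint_dir: str) -> str:
    """Copy only the necessary artifact files into a fresh temporary directory."""
    clean_dir = tempfile.mkdtemp(prefix="colbert_export_")
    for name in ARTIFACTS:
        src = Path(checkpoint_dir) / name
        if src.exists():
            shutil.copy(src, Path(clean_dir) / name)
    return clean_dir
```

Re-saving to a clean directory guards against stray files such as optimizer states or training logs being published alongside the model.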

Phase 2 — Platform-Specific Conversion and Upload:

  • Hub Export: Create the target repository, then upload the checkpoint directory as-is
  • ONNX Export: Wrap the BERT encoder with a linear projection layer, trace the forward pass with dummy inputs, and export via PyTorch ONNX with dynamic axes for batch and sequence dimensions
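
The dynamic-axes step can be made concrete. The tensor names below are illustrative assumptions; the mapping format (tensor name → {axis index: symbolic name}) is the one accepted by PyTorch's torch.onnx.export:

```python
# Build the dynamic_axes specification: mark axis 0 (batch) and axis 1
# (sequence length) as variable for every input and output tensor, so the
# exported graph accepts arbitrary-length queries and documents.
def make_dynamic_axes(input_names, output_names):
    axes = {0: "batch", 1: "seq_len"}
    return {name: dict(axes) for name in input_names + output_names}

dynamic_axes = make_dynamic_axes(["input_ids", "attention_mask"], ["contextual"])

# The export call itself (requires torch; shown for context only):
# torch.onnx.export(wrapped_model, (dummy_ids, dummy_mask), "colbert.onnx",
#                   input_names=["input_ids", "attention_mask"],
#                   output_names=["contextual"], dynamic_axes=dynamic_axes)
```

Without dynamic axes, the traced graph would be pinned to the shapes of the dummy inputs used during tracing.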

Pseudo-code:

# Abstract export algorithm
def export_model(checkpoint_path, target, repo_name=None):
    config = load_config(checkpoint_path)
    validate(config)

    if target == "hub":
        create_repo(repo_name)
        upload(checkpoint_path, repo_name)

    elif target == "vespa_onnx":
        model = wrap_with_projection(checkpoint_path, dim=128)
        onnx_export(model, dynamic_axes=["batch", "seq_len"])
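
A minimal runnable rendering of this sketch, with the platform calls replaced by stubs so the control flow can be exercised without network access or a PyTorch runtime; all function and repo names here are illustrative, not RAGatouille's actual API:

```python
import json
from pathlib import Path

# Records of what the stubbed platform calls would have done.
uploads, onnx_exports = [], []

def load_config(checkpoint_path):
    # Assumed config file name; real checkpoints may differ.
    return json.loads((Path(checkpoint_path) / "artifact.metadata").read_text())

def validate(config):
    if "dim" not in config:
        raise ValueError("missing embedding dim; not a ColBERT checkpoint")

def export_model(checkpoint_path, target, repo_name=None):
    config = load_config(checkpoint_path)
    validate(config)
    if target == "hub":
        # Stands in for create_repo + upload on the Hugging Face Hub.
        uploads.append((checkpoint_path, repo_name))
    elif target == "vespa_onnx":
        # Stands in for wrap_with_projection + torch.onnx.export.
        onnx_exports.append((checkpoint_path, config["dim"]))
    else:
        raise ValueError(f"unknown target: {target}")
```

Dispatching on a target string keeps validation shared across both export paths while isolating the platform-specific logic in each branch.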
