Principle:CarperAI Trlx Checkpoint Conversion

From Leeroopedia


Knowledge Sources
Domains: Model_Conversion, NLP, Megatron
Last Updated: 2026-02-07 16:00 GMT

Overview

Technique for transforming model weight checkpoints between different framework formats while correctly handling tensor parallelism sharding.

Description

Training frameworks such as HuggingFace Transformers, NeMo/Megatron, and DeepSpeed store checkpoints in different formats, each with its own parameter naming conventions and tensor layouts. Checkpoint conversion maps weights between these formats while handling tensor model parallelism (TP), which requires slicing weight matrices along specific dimensions so that each GPU rank holds only its own shard. Key challenges include correctly partitioning the attention (Q/K/V), MLP, and embedding layers, and generating the target framework's configuration metadata.
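The Q/K/V challenge is easiest to see on a fused projection. The sketch below is illustrative only: it assumes a hypothetical layout in which Q, K, and V are stacked along axis 0 of a single `(3 * hidden, hidden)` matrix and heads occupy contiguous rows, and the helper name `shard_fused_qkv` is made up for this example. Real checkpoints may interleave heads differently, in which case the split has to follow that layout instead.

```python
import numpy as np

def shard_fused_qkv(qkv, tp_rank, tp_size):
    """Return the slice of a fused [Q; K; V] weight owned by one TP rank.

    Assumption: qkv has shape (3 * hidden, hidden) with Q, K and V stacked
    along axis 0. Real checkpoints may use a different (interleaved) layout.
    """
    q, k, v = np.split(qkv, 3, axis=0)
    # Split Q, K and V separately, then regroup the pieces for this rank.
    parts = [np.split(m, tp_size, axis=0)[tp_rank] for m in (q, k, v)]
    return np.concatenate(parts, axis=0)

hidden, tp_size = 8, 2
qkv = np.arange(3 * hidden * hidden, dtype=np.float32).reshape(3 * hidden, hidden)
shards = [shard_fused_qkv(qkv, r, tp_size) for r in range(tp_size)]

# A naive contiguous slice would hand rank 0 all of Q plus half of K.
naive_rank0 = np.split(qkv, tp_size, axis=0)[0]
```

Under these assumptions, every rank ends up with matching pieces of Q, K, and V, whereas the naive slice has the right shape but the wrong contents.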

Usage

Use this principle when migrating models between training frameworks. Common scenarios include converting HuggingFace models to NeMo format for large-scale distributed training, or converting trained models back to HuggingFace format for inference and deployment.

Theoretical Basis

The conversion follows a weight-mapping protocol:

  1. Name Mapping: Map source parameter names to target naming conventions.
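  2. Tensor Partitioning: For TP rank i of N total ranks, with D the size of the partitioned dimension (assumed divisible by N):

$$W_{\mathrm{TP}_i} = W\left[:,\ \tfrac{iD}{N} : \tfrac{(i+1)D}{N}\right]$$

for column-parallel layers. Row-parallel layers take the same slice along the other dimension, $W\left[\tfrac{iD}{N} : \tfrac{(i+1)D}{N},\ :\right]$.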

  3. Config Generation: Generate the target framework's configuration YAML/JSON from source model attributes.
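The name-mapping and config-generation steps of the protocol above can be sketched with the standard library alone. Everything named here is an assumption for illustration: the regex rules, the target parameter names, the config keys, and the helpers `map_name` and `build_target_config` are hypothetical and do not reproduce any particular converter's tables.

```python
import json
import re

# Illustrative (hypothetical) rules; real converters define their own tables.
NAME_RULES = [
    (r"^transformer\.h\.(\d+)\.attn\.c_proj\.(weight|bias)$",
     r"encoder.layers.\1.self_attention.dense.\2"),
    (r"^transformer\.h\.(\d+)\.mlp\.c_fc\.(weight|bias)$",
     r"encoder.layers.\1.mlp.dense_h_to_4h.\2"),
    (r"^transformer\.wte\.weight$", r"embedding.word_embeddings.weight"),
]

def map_name(source_name):
    """Translate one source parameter name, failing loudly on unknown names."""
    for pattern, template in NAME_RULES:
        if re.match(pattern, source_name):
            return re.sub(pattern, template, source_name)
    raise KeyError(f"no mapping rule for {source_name!r}")

def build_target_config(source_config, tp_size):
    """Derive a target-side config dict from source model attributes."""
    if source_config["n_head"] % tp_size != 0:
        raise ValueError("attention heads must divide evenly across TP ranks")
    return {
        "num_layers": source_config["n_layer"],
        "hidden_size": source_config["n_embd"],
        "num_attention_heads": source_config["n_head"],
        "tensor_model_parallel_size": tp_size,
    }

source_config = {"n_layer": 12, "n_embd": 768, "n_head": 12}
target_config = build_target_config(source_config, tp_size=4)
config_json = json.dumps(target_config, indent=2)
mapped = map_name("transformer.h.3.mlp.c_fc.weight")
```

Raising on an unmapped name is a deliberate choice in this sketch: a parameter that is silently dropped during conversion is much harder to diagnose later than an immediate error.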

Pseudo-code Logic:

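# Abstract algorithm (NOT real implementation)
# name_mapping: dict of source parameter name -> target parameter name
for tp_rank in range(total_tp):
    nemo_state = {}
    for source_name, target_name in name_mapping.items():
        weight = source_model[source_name]
        if is_column_parallel(target_name):
            # keep only this rank's block of columns
            weight = slice_columns(weight, tp_rank, total_tp)
        elif is_row_parallel(target_name):
            # keep only this rank's block of rows
            weight = slice_rows(weight, tp_rank, total_tp)
        # any other parameter (e.g. layer norms) is copied whole to every rank
        nemo_state[target_name] = weight
    save(nemo_state, f"tp_rank_{tp_rank}/model_weights.pt")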
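For readers who want something executable, here is a minimal NumPy sketch of the same loop. It is not the trlx or NeMo implementation. It assumes weights are stored as `(in, out)` so that a layer computes `x @ W`; frameworks that store linear weights as `(out, in)`, as PyTorch's `nn.Linear` does, swap the two axes. The parameter names and the `convert` helper are hypothetical, and shards are kept in memory rather than saved.

```python
import numpy as np

def slice_columns(weight, tp_rank, tp_size):
    """Keep this rank's block of columns: W[:, i*D/N : (i+1)*D/N]."""
    return np.split(weight, tp_size, axis=1)[tp_rank]

def slice_rows(weight, tp_rank, tp_size):
    """Keep this rank's block of rows: W[i*D/N : (i+1)*D/N, :]."""
    return np.split(weight, tp_size, axis=0)[tp_rank]

def convert(source_state, name_mapping, column_parallel, row_parallel, tp_size):
    """Build one target state dict per TP rank (kept in memory, not saved)."""
    shards = []
    for tp_rank in range(tp_size):
        state = {}
        for source_name, target_name in name_mapping.items():
            weight = source_state[source_name]
            if target_name in column_parallel:
                weight = slice_columns(weight, tp_rank, tp_size)
            elif target_name in row_parallel:
                weight = slice_rows(weight, tp_rank, tp_size)
            state[target_name] = weight  # anything else is replicated
        shards.append(state)
    return shards

rng = np.random.default_rng(0)
source_state = {"fc_in": rng.normal(size=(4, 8)),
                "fc_out": rng.normal(size=(8, 4)),
                "norm": np.ones(4)}
name_mapping = {"fc_in": "mlp.h_to_4h", "fc_out": "mlp.4h_to_h", "norm": "layernorm"}
shards = convert(source_state, name_mapping,
                 column_parallel={"mlp.h_to_4h"}, row_parallel={"mlp.4h_to_h"},
                 tp_size=2)

# Each rank computes a partial product; summing them matches the unsharded result.
x = rng.normal(size=(3, 4))
full = x @ source_state["fc_in"] @ source_state["fc_out"]
partial = sum(x @ s["mlp.h_to_4h"] @ s["mlp.4h_to_h"] for s in shards)
```

Comparing `full` with `partial` (for example with `np.allclose`) is a cheap sanity check that the column-parallel and row-parallel slices were taken along compatible dimensions.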
