Principle:CarperAI Trlx Checkpoint Conversion

Knowledge Sources	NeMo Checkpoint Guide
Domains	Model_Conversion, NLP, Megatron
Last Updated	2026-02-07 16:00 GMT

Overview

Technique for transforming model weight checkpoints between different framework formats while correctly handling tensor parallelism sharding.

Description

Training frameworks such as HuggingFace Transformers, NeMo/Megatron, and DeepSpeed each store checkpoints in their own format, with their own parameter naming conventions and tensor layouts. Checkpoint conversion maps weights between these formats while handling tensor model parallelism (TP), which requires slicing weight matrices along specific dimensions so that they can be distributed across GPU ranks. Key challenges include correctly partitioning attention (Q/K/V), MLP, and embedding layers, and generating the target framework's configuration metadata.
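Why attention layers need special care can be seen in a minimal sketch, assuming a target that stores Q, K, and V as one fused projection. Each rank's shard must contain that rank's block of Q, of K, and of V; slicing the already-fused matrix hands rank 0 all of Q and part of K instead. The helper names (`split_rows`, `shard_fused_qkv`) and the block-wise fused layout are assumptions made for illustration. Real frameworks often interleave Q/K/V per attention head, so the exact ordering has to be checked against the target implementation.

```python
# Illustrative sketch only: matrices are nested lists in (output, input) layout,
# so each row is one output feature. Names and layout are hypothetical.

def split_rows(matrix, rank, world_size):
    """Return the rank-th contiguous block of rows."""
    rows = len(matrix)
    if rows % world_size != 0:
        raise ValueError("row count must be divisible by the TP size")
    block = rows // world_size
    return matrix[rank * block:(rank + 1) * block]


def shard_fused_qkv(q, k, v, rank, world_size):
    """Build one rank's fused QKV shard from that rank's Q, K and V blocks."""
    return (split_rows(q, rank, world_size)
            + split_rows(k, rank, world_size)
            + split_rows(v, rank, world_size))


# Toy weights: 4 output features per projection, hidden size 2.
q = [[f"q{r}"] * 2 for r in range(4)]
k = [[f"k{r}"] * 2 for r in range(4)]
v = [[f"v{r}"] * 2 for r in range(4)]

# Correct: slice Q, K, V separately, then fuse the per-rank blocks.
correct = shard_fused_qkv(q, k, v, rank=0, world_size=2)
# Naive: fuse first, then slice. Rank 0 ends up with no V rows at all.
naive = split_rows(q + k + v, 0, 2)

print([row[0] for row in correct])  # ['q0', 'q1', 'k0', 'k1', 'v0', 'v1']
print([row[0] for row in naive])    # ['q0', 'q1', 'q2', 'q3', 'k0', 'k1']
```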

Usage

Use this principle when migrating models between training frameworks. Common scenarios include converting HuggingFace models to NeMo format for large-scale distributed training, or converting trained models back to HuggingFace format for inference and deployment.

Theoretical Basis

The conversion follows a weight-mapping protocol:

Name Mapping: Map source parameter names to target naming conventions.
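Tensor Partitioning: For TP rank $i$ of $N$ total ranks (with $i = 0, \dots, N - 1$), take the $i$-th contiguous block of the partitioned dimension, whose size $D$ must be divisible by $N$:

$W_{\mathrm{TP}_i} = W\left[:,\ \frac{i \cdot D}{N} : \frac{(i + 1) \cdot D}{N}\right]$

for column-parallel layers, or the same slice taken along the first axis, $W\left[\frac{i \cdot D}{N} : \frac{(i + 1) \cdot D}{N},\ :\right]$, for row-parallel layers. Here $W$ is written in the $Y = XW$ convention; frameworks that store linear weights as (output, input), as PyTorch does, swap which tensor axis each case slices.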

Config Generation: Generate the target framework's configuration YAML/JSON from source model attributes.
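The partitioning formula can be checked on a small matrix. The following is a minimal sketch, assuming plain nested lists in place of framework tensors so that it runs without dependencies; `slice_columns` and `slice_rows` are illustrative helpers, not functions from any library. It also verifies that concatenating the shards along the partitioned axis recovers the original weight, which is a useful sanity check for any converter.

```python
def slice_columns(weight, rank, world_size):
    """Column-parallel shard: W[:, i*D/N : (i+1)*D/N], D = number of columns."""
    d = len(weight[0])
    if d % world_size != 0:
        raise ValueError("column count must be divisible by the TP size")
    block = d // world_size
    return [row[rank * block:(rank + 1) * block] for row in weight]


def slice_rows(weight, rank, world_size):
    """Row-parallel shard: W[i*D/N : (i+1)*D/N, :], D = number of rows."""
    d = len(weight)
    if d % world_size != 0:
        raise ValueError("row count must be divisible by the TP size")
    block = d // world_size
    return [list(row) for row in weight[rank * block:(rank + 1) * block]]


# 4 x 6 toy matrix with entries 0..23, split across N = 2 ranks.
weight = [[r * 6 + c for c in range(6)] for r in range(4)]
n = 2

col_shards = [slice_columns(weight, i, n) for i in range(n)]
row_shards = [slice_rows(weight, i, n) for i in range(n)]

print(col_shards[1])  # [[3, 4, 5], [9, 10, 11], [15, 16, 17], [21, 22, 23]]
print(row_shards[0])  # [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]

# Concatenating shards along the partitioned axis recovers the original matrix.
rebuilt_from_cols = [sum((shard[r] for shard in col_shards), []) for r in range(4)]
rebuilt_from_rows = [row for shard in row_shards for row in shard]
assert rebuilt_from_cols == weight
assert rebuilt_from_rows == weight
```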

Pseudo-code Logic:

# Abstract algorithm (NOT real implementation)
# name_mapping: iterable of (source_name, target_name) pairs
for tp_rank in range(total_tp):
    nemo_state = {}
    for source_name, target_name in name_mapping:
        weight = source_model[source_name]
        if is_column_parallel(target_name):
            weight = slice_columns(weight, tp_rank, total_tp)
        elif is_row_parallel(target_name):
            weight = slice_rows(weight, tp_rank, total_tp)
        # Any other parameter (e.g. layer norms) is copied unchanged to every rank
        nemo_state[target_name] = weight
    # One checkpoint file per TP rank
    save(nemo_state, f"tp_rank_{tp_rank}/model_weights.pt")
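For readers who want to run the loop end to end, here is a self-contained sketch of the same three steps (name mapping, partitioning, config generation) on toy data. Everything in it is an assumption made for illustration: the parameter names, the `NAME_MAPPING` table, the config keys, and the use of JSON files in place of real checkpoint files. It is not the mapping or file layout used by any actual converter.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical mapping: source name -> (target name, parallel mode).
NAME_MAPPING = {
    "src.mlp.up.weight": ("tgt.mlp.up.weight", "column"),
    "src.mlp.down.weight": ("tgt.mlp.down.weight", "row"),
    "src.norm.weight": ("tgt.norm.weight", "replicated"),
}


def shard(weight, mode, rank, world_size):
    """Slice a nested-list weight for one TP rank (Y = XW convention)."""
    if mode == "replicated":
        return weight  # copied unchanged to every rank
    if mode == "row":
        block = len(weight) // world_size
        return weight[rank * block:(rank + 1) * block]
    if mode == "column":
        block = len(weight[0]) // world_size
        return [row[rank * block:(rank + 1) * block] for row in weight]
    raise ValueError(f"unknown parallel mode: {mode}")


def convert(source_state, source_config, total_tp, out_dir):
    """Write one weight file per TP rank plus a target config file."""
    out_dir = Path(out_dir)
    for tp_rank in range(total_tp):
        target_state = {}
        for source_name, (target_name, mode) in NAME_MAPPING.items():
            target_state[target_name] = shard(
                source_state[source_name], mode, tp_rank, total_tp
            )
        rank_dir = out_dir / f"tp_rank_{tp_rank}"
        rank_dir.mkdir(parents=True, exist_ok=True)
        (rank_dir / "model_weights.json").write_text(json.dumps(target_state))

    # Config generation: derive target settings from source attributes.
    target_config = {
        "hidden_size": source_config["hidden_size"],
        "ffn_hidden_size": source_config["intermediate_size"],
        "tensor_model_parallel_size": total_tp,
    }
    (out_dir / "model_config.json").write_text(json.dumps(target_config, indent=2))
    return target_config


source_state = {
    "src.mlp.up.weight": [[1, 2, 3, 4], [5, 6, 7, 8]],        # hidden=2, ffn=4
    "src.mlp.down.weight": [[1, 2], [3, 4], [5, 6], [7, 8]],  # ffn=4, hidden=2
    "src.norm.weight": [1.0, 1.0],
}
source_config = {"hidden_size": 2, "intermediate_size": 4}

with tempfile.TemporaryDirectory() as tmp:
    config = convert(source_state, source_config, total_tp=2, out_dir=tmp)
    rank1 = json.loads((Path(tmp) / "tp_rank_1" / "model_weights.json").read_text())

print(rank1["tgt.mlp.up.weight"])    # [[3, 4], [7, 8]]
print(rank1["tgt.mlp.down.weight"])  # [[5, 6], [7, 8]]
```

Tagging each mapping entry with its parallel mode keeps the slicing decision next to the name it applies to, which is one simple alternative to separate `is_column_parallel` / `is_row_parallel` predicates.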

Related Pages

Implementation:CarperAI_Trlx_Convert_LLaMA_To_NeMo
