Implementation:Ggml org Llama cpp Convert PT To HF
| Knowledge Sources | |
|---|---|
| Domains | Text_To_Speech, Model_Conversion |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Converts WavTokenizer PyTorch checkpoint files to HuggingFace-compatible format (safetensors + config.json) for subsequent GGUF conversion.
Description
This Python script loads a PyTorch checkpoint (`.ckpt` or `.pt`) and flattens its state dictionary. It keeps only the tensors needed for inference (keys under `feature_extractor.encodec.quantizer`, `backbone`, and `head.out`) and renames them to match the expected HuggingFace naming conventions: the `state_dict.` prefix is removed and specific key patterns are rewritten (e.g., posnet renaming, backbone layer remapping). The resulting tensors are saved to `model.safetensors` using the safetensors library. The script also generates an `index.json` file mapping tensor names to their shapes and dtypes, and writes a `config.json` with model hyperparameters extracted from the checkpoint.
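The filtering and renaming steps described above can be sketched roughly as follows. This is a simplified illustration on plain string keys, not the script's actual code: the names `KEEP_PREFIXES`, `rename_key`, and `select_and_rename` are hypothetical, the `pos_net` to `posnet` substitution is an assumed spelling of the posnet renaming, and the backbone layer remapping rules are omitted.

```python
# Hypothetical sketch: keep inference-relevant keys and strip the checkpoint prefix.
KEEP_PREFIXES = (
    "state_dict.feature_extractor.encodec.quantizer.",
    "state_dict.backbone.",
    "state_dict.head.out",
)


def rename_key(key):
    """Return the renamed key, or None if the tensor is not needed for inference."""
    if not key.startswith(KEEP_PREFIXES):
        return None
    # Remove only the leading checkpoint prefix.
    new_key = key.replace("state_dict.", "", 1)
    # Assumed spelling of the posnet renaming mentioned in the description.
    new_key = new_key.replace("pos_net", "posnet")
    return new_key


def select_and_rename(flat_state):
    """Apply rename_key to every entry and drop the entries it rejects."""
    renamed = {}
    for key, value in flat_state.items():
        new_key = rename_key(key)
        if new_key is not None:
            renamed[new_key] = value
    return renamed
```

Returning `None` for rejected keys keeps the filter and the rename in one place, so a caller cannot rename a tensor that should have been skipped.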
Usage
Use this script as a prerequisite conversion step in the TTS pipeline to transform WavTokenizer audio decoder weights from PyTorch format into a form that `convert_hf_to_gguf.py` can then process into GGUF format.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: tools/tts/convert_pt_to_hf.py
- Lines: 1-180
Signature
def flatten_state_dict(state_dict, parent_key='', sep='.'):
    """Flatten nested state dict and rename keys for HuggingFace compatibility."""
    ...
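As an illustration of what flattening with `parent_key` and `sep` means, here is a minimal sketch of a recursive flattener. It is not the body of `flatten_state_dict`: the name `flatten_nested` is hypothetical, plain lists stand in for tensors, and the renaming and filtering that the real function also performs are left out.

```python
def flatten_nested(mapping, parent_key="", sep="."):
    """Collapse a nested dict into one level, joining key paths with `sep`."""
    flat = {}
    for key, value in mapping.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the path so far.
            flat.update(flatten_nested(value, full_key, sep))
        else:
            # Leaf value (a tensor in the real checkpoint, a list here).
            flat[full_key] = value
    return flat


nested = {"state_dict": {"backbone": {"norm": {"weight": [1.0], "bias": [0.0]}}}}
flat = flatten_nested(nested)
```

With this input, the nested path `state_dict -> backbone -> norm -> weight` becomes the single key `state_dict.backbone.norm.weight`.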
Import
import torch
import json
import os
import sys
import re
from safetensors.torch import save_file
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | string (first CLI argument) | No | Path to the PyTorch checkpoint file (.ckpt or .pt); defaults to './model.pt' when no argument is given |
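The "CLI argument or default" behavior of `model_path` can be sketched as below. This is an assumed shape for the argument handling, using only `sys.argv`; the helper `resolve_model_path` and the constant `DEFAULT_MODEL_PATH` are hypothetical names introduced for illustration.

```python
DEFAULT_MODEL_PATH = "./model.pt"


def resolve_model_path(argv):
    """Return the first CLI argument if one was given, else the default path."""
    # argv[0] is the script name, so the checkpoint path is argv[1].
    return argv[1] if len(argv) > 1 else DEFAULT_MODEL_PATH
```

A caller would typically pass `sys.argv`, for example `resolve_model_path(sys.argv)`.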
Outputs
| Name | Type | Description |
|---|---|---|
| model.safetensors | file | Flattened and renamed model tensors in safetensors format |
| index.json | file | Mapping of tensor names to shapes and dtypes |
| config.json | file | Model hyperparameters (architecture config) extracted from checkpoint |
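After a conversion run, a quick sanity check is to confirm that the three files listed above exist and that the two JSON files parse. The sketch below uses only the standard library; `summarize_outputs` is a hypothetical helper, and `out_dir` is assumed to be whichever directory the script wrote its outputs to.

```python
import json
import os

OUTPUT_FILES = ("model.safetensors", "index.json", "config.json")


def summarize_outputs(out_dir):
    """Report which output files exist and load the JSON ones that do."""
    present = {name: os.path.exists(os.path.join(out_dir, name)) for name in OUTPUT_FILES}
    loaded = {}
    for name in ("index.json", "config.json"):
        if present[name]:
            with open(os.path.join(out_dir, name), "r", encoding="utf-8") as f:
                loaded[name] = json.load(f)
    return present, loaded
```

The function only reads files, so it is safe to run before handing the directory to `convert_hf_to_gguf.py`.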
Usage Examples
# Convert a WavTokenizer checkpoint to HuggingFace format
python tools/tts/convert_pt_to_hf.py ./wavtokenizer-large-speech-75token.ckpt
# Then convert from HuggingFace to GGUF
python convert_hf_to_gguf.py ./wavtokenizer-large-speech-75token/