# Implementation: ggml-org/llama.cpp convert-llama2c-to-ggml
| Knowledge Sources | |
|---|---|
| Domains | Model_Conversion |
| Last Updated | 2026-02-15 00:00 GMT |
## Overview
Converts Andrej Karpathy's llama2.c checkpoint format to the GGUF format used by llama.cpp's inference engine.
## Description
This C++ program defines the llama2.c model structures (`Config`, `TransformerWeights`) and reads the binary checkpoint format. It maps llama2.c weights to GGUF tensor names (e.g., `token_embd.weight`, `blk.%d.attn_q.weight`). The converter copies the vocabulary from an existing GGUF model or from a llama2.c tokenizer file, then writes all metadata (architecture, context length, head counts, etc.) and tensor data to a new GGUF file through the gguf API. The implementation uses GGUF key constants such as `KV_GENERAL_ARCHITECTURE`, `KV_CONTEXT_LENGTH`, and `KV_EMBEDDING_LENGTH`.
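The binary checkpoint starts with the `Config` header, so reading it amounts to one `fread` of the struct. A minimal sketch of that step (the helper name and error handling are illustrative, not the converter's actual code):

```cpp
#include <cstdio>

// llama2.c checkpoint layout: a Config header of seven ints, followed
// by the raw float32 weight tensors.
typedef struct {
    int dim;        // transformer dimension
    int hidden_dim; // FFN hidden dimension
    int n_layers;   // number of layers
    int n_heads;    // number of query heads
    int n_kv_heads; // number of key/value heads
    int vocab_size; // vocabulary size (upstream llama2.c uses its sign
                    // to flag whether the classifier shares weights)
    int seq_len;    // max sequence length
} Config;

// Read only the header; returns false if the file is missing or short.
static bool read_llama2c_header(const char * path, Config * cfg) {
    FILE * f = std::fopen(path, "rb");
    if (!f) {
        return false;
    }
    const size_t n = std::fread(cfg, sizeof(Config), 1, f);
    std::fclose(f);
    return n == 1;
}
```

The weight tensors that follow the header are laid out back to back, so their offsets can be computed from these seven fields alone.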
## Usage
Use this tool to enable interoperability between the educational llama2.c project and llama.cpp's production inference engine, allowing small llama2.c-trained models to run with full llama.cpp optimizations.
## Code Reference
### Source Location
- Repository: ggml-org/llama.cpp
- File: examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
- Lines: 1-941
### Signature
```cpp
// llama2.c Config structure
typedef struct {
    int dim;        // transformer dimension
    int hidden_dim; // for FFN layers
    int n_layers;   // number of layers
    int n_heads;    // number of query heads
    int n_kv_heads; // number of key/value heads
    int vocab_size; // vocabulary size
    int seq_len;    // max sequence length
} Config;
```
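These header fields translate directly into the `llama.*` metadata values written to the output file. A sketch of that mapping (the struct and function names are illustrative, `Config` is redeclared so the sketch is self-contained, and `rope_dimension_count` is assumed to be the per-head embedding size):

```cpp
#include <cstdint>

// llama2.c header (redeclared here so the sketch compiles on its own).
typedef struct {
    int dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len;
} Config;

// GGUF metadata values derived from the llama2.c header.
struct llama_meta {
    uint32_t context_length;       // llama.context_length          <- seq_len
    uint32_t embedding_length;     // llama.embedding_length        <- dim
    uint32_t block_count;          // llama.block_count             <- n_layers
    uint32_t feed_forward_length;  // llama.feed_forward_length     <- hidden_dim
    uint32_t head_count;           // llama.attention.head_count    <- n_heads
    uint32_t head_count_kv;        // llama.attention.head_count_kv <- n_kv_heads
    uint32_t rope_dimension_count; // llama.rope.dimension_count    <- dim / n_heads
};

static llama_meta meta_from_config(const Config & c) {
    llama_meta m;
    m.context_length       = (uint32_t) c.seq_len;
    m.embedding_length     = (uint32_t) c.dim;
    m.block_count          = (uint32_t) c.n_layers;
    m.feed_forward_length  = (uint32_t) c.hidden_dim;
    m.head_count           = (uint32_t) c.n_heads;
    m.head_count_kv        = (uint32_t) c.n_kv_heads;
    m.rope_dimension_count = (uint32_t) (c.dim / c.n_heads); // per-head dim
    return m;
}
```

For the stories15M checkpoint (`dim = 288`, `n_heads = 6`) this would give a per-head RoPE dimension of 48.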
```cpp
// GGUF key definitions
#define KV_GENERAL_ARCHITECTURE "general.architecture"
#define KV_CONTEXT_LENGTH       "llama.context_length"
#define KV_EMBEDDING_LENGTH     "llama.embedding_length"
#define KV_BLOCK_COUNT          "llama.block_count"

// GGUF tensor name templates
#define TN_TOKEN_EMBD "token_embd.weight"
#define TN_ATTN_Q     "blk.%d.attn_q.weight"
#define TN_FFN_GATE   "blk.%d.ffn_gate.weight"
```
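Per-layer tensor names are produced by instantiating the `%d` templates with the layer index. A small sketch of that expansion (the helper name is illustrative):

```cpp
#include <cstdio>
#include <string>

#define TN_ATTN_Q   "blk.%d.attn_q.weight"
#define TN_FFN_GATE "blk.%d.ffn_gate.weight"

// Expand a per-layer tensor-name template such as TN_ATTN_Q
// for a given layer index.
static std::string tensor_name(const char * tmpl, int layer) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), tmpl, layer);
    return std::string(buf);
}
```

For example, `tensor_name(TN_ATTN_Q, 3)` yields `blk.3.attn_q.weight`, matching the `blk.N.*` naming convention GGUF consumers expect.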
### Import
```cpp
#include "ggml.h"
#include "gguf.h"
#include "llama.h"
#include "common.h"
#include "log.h"

#include <unordered_map>
#include <vector>
#include <cstring>
```
## I/O Contract
### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| checkpoint_file | file path | Yes | Path to the llama2.c binary checkpoint file |
| tokenizer_source | file path | Yes | Path to existing GGUF model (for vocab) or llama2.c tokenizer.bin |
| output_file | file path | Yes | Desired output path for the GGUF file |
### Outputs
| Name | Type | Description |
|---|---|---|
| gguf_file | .gguf file | Converted model with llama architecture metadata, vocabulary, and all transformer weights |
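When `tokenizer_source` is a llama2.c `tokenizer.bin` rather than a GGUF model, the vocabulary records have to be parsed before they can be written into the output file. A minimal reader sketch, assuming the upstream llama2.c layout (one `int` max token length, then `vocab_size` records of `float` score, `int` length, and raw bytes); the function name is illustrative:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Parse a llama2.c tokenizer.bin into token strings and scores.
static bool read_llama2c_tokenizer(const char * path, int vocab_size,
                                   std::vector<std::string> & tokens,
                                   std::vector<float> & scores) {
    FILE * f = std::fopen(path, "rb");
    if (!f) return false;
    int max_token_length = 0;
    bool ok = std::fread(&max_token_length, sizeof(int), 1, f) == 1;
    for (int i = 0; ok && i < vocab_size; i++) {
        float score = 0.0f;
        int   len   = 0;
        ok = std::fread(&score, sizeof(float), 1, f) == 1 &&
             std::fread(&len,   sizeof(int),   1, f) == 1 &&
             len >= 0;
        if (!ok) break;
        std::string s((size_t) len, '\0');
        if (len > 0) {
            ok = std::fread(&s[0], 1, (size_t) len, f) == (size_t) len;
        }
        if (ok) {
            tokens.push_back(s);
            scores.push_back(score);
        }
    }
    std::fclose(f);
    return ok;
}
```

The scores carried along here correspond to the SentencePiece merge scores that the converter stores next to each token in the GGUF vocabulary.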
## Usage Examples
```sh
./convert-llama2c-to-ggml \
    --copy-vocab-from-model llama-2-7b.gguf \
    --llama2c-model stories15M.bin \
    --llama2c-output-model stories15M.gguf
```