Implementation:Ggml org Llama cpp Get Evaluation Dataset

Aspect	Detail
Implementation Name	Get Evaluation Dataset
Doc Type	External Tool Doc
Domain	Model Perplexity Evaluation
Purpose	Downloading standard evaluation datasets for perplexity and benchmark scoring
Related Workflow	Model_Perplexity_Evaluation

Overview

Description

llama.cpp provides three shell scripts for downloading evaluation datasets:

scripts/get-wikitext-2.sh: Downloads the WikiText-2 raw dataset for perplexity computation
scripts/get-hellaswag.sh: Downloads the HellaSwag validation dataset for commonsense reasoning evaluation
scripts/get-winogrande.sh: Downloads the Winogrande debiased evaluation dataset for commonsense pronoun-resolution evaluation

Each script uses wget to fetch the dataset from a hosted URL and then prints the matching llama-perplexity usage line. get-wikitext-2.sh additionally runs unzip on the downloaded archive.
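The scripts call wget unconditionally, so running one twice downloads the file again (by default wget saves the second copy under a numbered name such as hellaswag_val_full.txt.1). A minimal sketch of a download-once wrapper follows. The function name fetch_once is hypothetical and not part of llama.cpp.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of llama.cpp): download URL to DEST only
# when DEST is missing or empty, so repeated runs do not fetch it again.
fetch_once() {
    local url="$1" dest="$2"
    if [ -s "$dest" ]; then
        echo "Already present: $dest"
        return 0
    fi
    # -O writes the response body to the given file name.
    wget -O "$dest" "$url"
}
```

Example call, using the URL from get-hellaswag.sh: fetch_once https://raw.githubusercontent.com/klosax/hellaswag_text_data/main/hellaswag_val_full.txt hellaswag_val_full.txt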

Usage

Run the appropriate script before beginning evaluation. wget and unzip write into the current working directory, so run the script from the directory where the data should live, then pass the resulting file path to llama-perplexity with the -f flag.
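Because the output location depends on where the script was run, it can help to confirm the file is present before starting an evaluation. The following is a minimal sketch of such a check. The function name require_dataset is hypothetical and not part of llama.cpp.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of llama.cpp): succeed only if the dataset
# file exists and is non-empty; otherwise name the script that provides it.
require_dataset() {
    local path="$1" script="$2"
    if [ -s "$path" ]; then
        return 0
    fi
    echo "Missing dataset: $path (run: bash $script)" >&2
    return 1
}
```

Example call: require_dataset wikitext-2-raw/wiki.test.raw scripts/get-wikitext-2.sh && ./llama-perplexity -m model.gguf -f wikitext-2-raw/wiki.test.raw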

Code Reference

Aspect	Detail
Source Location (WikiText-2)	`scripts/get-wikitext-2.sh:1-12`
Source Location (HellaSwag)	`scripts/get-hellaswag.sh:1-11`
Source Location (Winogrande)	`scripts/get-winogrande.sh:1-11`
Signature	Shell scripts (no function signature)
Import	N/A (standalone scripts)

scripts/get-wikitext-2.sh:

#!/usr/bin/env bash

wget https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip

echo "Usage:"
echo ""
echo "  ./llama-perplexity -m model.gguf -f wikitext-2-raw/wiki.test.raw [other params]"
echo ""

exit 0

scripts/get-hellaswag.sh:

#!/usr/bin/env bash

wget https://raw.githubusercontent.com/klosax/hellaswag_text_data/main/hellaswag_val_full.txt

echo "Usage:"
echo ""
echo "  ./llama-perplexity -m model.gguf -f hellaswag_val_full.txt --hellaswag [--hellaswag-tasks N] [other params]"
echo ""

exit 0

scripts/get-winogrande.sh:

#!/usr/bin/env bash

wget https://huggingface.co/datasets/ikawrakow/winogrande-eval-for-llama.cpp/raw/main/winogrande-debiased-eval.csv

echo "Usage:"
echo ""
echo "  ./llama-perplexity -m model.gguf -f winogrande-debiased-eval.csv --winogrande [--winogrande-tasks N] [other params]"
echo ""

exit 0

I/O Contract

get-wikitext-2.sh:

Direction	Name	Type	Description
Input	(none)		Script takes no arguments
Output	`wikitext-2-raw-v1.zip`	File	Downloaded ZIP archive
Output	`wikitext-2-raw/wiki.test.raw`	File	Extracted raw test data (used with `-f` flag)
Output	`wikitext-2-raw/wiki.train.raw`	File	Extracted raw training data (optional use)
Output	`wikitext-2-raw/wiki.valid.raw`	File	Extracted raw validation data (optional use)

get-hellaswag.sh:

Direction	Name	Type	Description
Input	(none)		Script takes no arguments
Output	`hellaswag_val_full.txt`	File	HellaSwag validation data (6 lines per task, 10042 tasks)
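The 6-lines-per-task layout gives a cheap integrity check on the download: the line count should be a multiple of 6, and dividing by 6 gives the number of tasks. A minimal sketch, assuming that layout. The function name hellaswag_task_count is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the task count from the line count,
# assuming every task occupies exactly 6 lines.
hellaswag_task_count() {
    local file="$1" lines
    # wc -l counts newline characters, so a file without a trailing
    # newline reports one line fewer and fails the check below.
    lines=$(wc -l < "$file") || return 1
    if [ $((lines % 6)) -ne 0 ]; then
        echo "Unexpected line count: $lines is not a multiple of 6" >&2
        return 1
    fi
    echo $((lines / 6))
}
```

Example call: hellaswag_task_count hellaswag_val_full.txt. If the file matches the description in the table, the result should equal the task count listed there.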

get-winogrande.sh:

Direction	Name	Type	Description
Input	(none)		Script takes no arguments
Output	`winogrande-debiased-eval.csv`	File	Winogrande debiased evaluation data in CSV format

Usage Examples

Example 1: Download and run WikiText-2 perplexity evaluation

# Download the dataset
bash scripts/get-wikitext-2.sh

# Run perplexity evaluation
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
    -f wikitext-2-raw/wiki.test.raw \
    --ctx-size 512 \
    --batch-size 512

Example 2: Download and run HellaSwag evaluation

# Download the dataset
bash scripts/get-hellaswag.sh

# Run HellaSwag evaluation with 400 tasks
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
    -f hellaswag_val_full.txt \
    --hellaswag \
    --hellaswag-tasks 400

Example 3: Download and run Winogrande evaluation

# Download the dataset
bash scripts/get-winogrande.sh

# Run Winogrande evaluation with 1267 tasks
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
    -f winogrande-debiased-eval.csv \
    --winogrande \
    --winogrande-tasks 1267
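The three examples differ only in the file passed to -f and in the mode flag. A minimal sketch that captures this mapping in one place follows. The function name dataset_args and the short dataset names are hypothetical; the file names and flags are the ones used in the examples above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the dataset-specific llama-perplexity
# arguments for a short dataset name.
dataset_args() {
    case "$1" in
        wikitext)   echo "-f wikitext-2-raw/wiki.test.raw" ;;
        hellaswag)  echo "-f hellaswag_val_full.txt --hellaswag" ;;
        winogrande) echo "-f winogrande-debiased-eval.csv --winogrande" ;;
        *)          echo "unknown dataset: $1" >&2; return 1 ;;
    esac
}
```

Example call: ./llama-perplexity -m models/llama-7b-Q4_K_M.gguf $(dataset_args hellaswag). The command substitution is left unquoted on purpose so that the shell splits the output into separate arguments.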
