Implementation:Ggml org Llama cpp Get Evaluation Dataset
| Aspect | Detail |
|---|---|
| Implementation Name | Get Evaluation Dataset |
| Doc Type | External Tool Doc |
| Domain | Model Perplexity Evaluation |
| Purpose | Downloading standard evaluation datasets for perplexity and benchmark scoring |
| Related Workflow | Model_Perplexity_Evaluation |
Overview
Description
llama.cpp provides three shell scripts for downloading evaluation datasets:
- scripts/get-wikitext-2.sh: Downloads the WikiText-2 raw dataset for perplexity computation
- scripts/get-hellaswag.sh: Downloads the HellaSwag validation dataset for commonsense reasoning evaluation
- scripts/get-winogrande.sh: Downloads the Winogrande debiased evaluation dataset for coreference resolution
Each script uses wget to fetch the dataset from a hosted URL and prints usage instructions for the llama-perplexity tool. get-wikitext-2.sh additionally extracts the downloaded archive with unzip.
Usage
Run the appropriate script before beginning evaluation. The downloaded files are placed in the current working directory and must be referenced via the -f flag when running llama-perplexity.
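Because the scripts call wget (and, for WikiText-2, unzip) without checking that they are installed, it can help to verify the tools first. The following is a minimal sketch, not part of llama.cpp; the helper name missing_tools is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of llama.cpp): print the names of any
# tools from the argument list that are not found on PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Trim the leading space before printing.
    echo "${missing# }"
}

# wget is used by all three scripts; unzip only by get-wikitext-2.sh.
missing=$(missing_tools wget unzip)
if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
else
    echo "All download prerequisites found."
fi
```

If the check reports a missing tool, install it with the system package manager before running the download scripts.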
Code Reference
| Aspect | Detail |
|---|---|
| Source Location (WikiText-2) | scripts/get-wikitext-2.sh:1-12 |
| Source Location (HellaSwag) | scripts/get-hellaswag.sh:1-11 |
| Source Location (Winogrande) | scripts/get-winogrande.sh:1-11 |
| Signature | Shell scripts (no function signature) |
| Import | N/A (standalone scripts) |
scripts/get-wikitext-2.sh:
#!/usr/bin/env bash
wget https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip
echo "Usage:"
echo ""
echo " ./llama-perplexity -m model.gguf -f wikitext-2-raw/wiki.test.raw [other params]"
echo ""
exit 0
scripts/get-hellaswag.sh:
#!/usr/bin/env bash
wget https://raw.githubusercontent.com/klosax/hellaswag_text_data/main/hellaswag_val_full.txt
echo "Usage:"
echo ""
echo " ./llama-perplexity -m model.gguf -f hellaswag_val_full.txt --hellaswag [--hellaswag-tasks N] [other params]"
echo ""
exit 0
scripts/get-winogrande.sh:
#!/usr/bin/env bash
wget https://huggingface.co/datasets/ikawrakow/winogrande-eval-for-llama.cpp/raw/main/winogrande-debiased-eval.csv
echo "Usage:"
echo ""
echo " ./llama-perplexity -m model.gguf -f winogrande-debiased-eval.csv --winogrande [--winogrande-tasks N] [other params]"
echo ""
exit 0
I/O Contract
get-wikitext-2.sh:
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | (none) | N/A | Script takes no arguments |
| Output | wikitext-2-raw-v1.zip | File | Downloaded ZIP archive |
| Output | wikitext-2-raw/wiki.test.raw | File | Extracted raw test data (used with -f flag) |
| Output | wikitext-2-raw/wiki.train.raw | File | Extracted raw training data (optional use) |
| Output | wikitext-2-raw/wiki.valid.raw | File | Extracted raw validation data (optional use) |
get-hellaswag.sh:
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | (none) | N/A | Script takes no arguments |
| Output | hellaswag_val_full.txt | File | HellaSwag validation data (6 lines per task, 10042 tasks) |
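The 6-lines-per-task layout stated above suggests a quick sanity check on the downloaded file. The sketch below is an illustration only; the function name hellaswag_task_count is hypothetical, and it assumes the file ends with a trailing newline so that wc -l counts every line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of llama.cpp): derive the task count
# from the line count, assuming 6 lines per task as documented above.
hellaswag_task_count() {
    lines=$(wc -l < "$1") || return 1
    # Normalize the value (some wc implementations pad with spaces).
    lines=$((lines + 0))
    if [ $((lines % 6)) -ne 0 ]; then
        echo "unexpected line count: $lines (not a multiple of 6)" >&2
        return 1
    fi
    echo $((lines / 6))
}

# Only run the check when the dataset has already been downloaded.
if [ -f hellaswag_val_full.txt ]; then
    hellaswag_task_count hellaswag_val_full.txt
fi
```

A line count that is not a multiple of 6 would point to a truncated or interrupted download.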
get-winogrande.sh:
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | (none) | N/A | Script takes no arguments |
| Output | winogrande-debiased-eval.csv | File | Winogrande debiased evaluation data in CSV format |
Usage Examples
Example 1: Download and run WikiText-2 perplexity evaluation
# Download the dataset
bash scripts/get-wikitext-2.sh
# Run perplexity evaluation
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
-f wikitext-2-raw/wiki.test.raw \
--ctx-size 512 \
--batch-size 512
Example 2: Download and run HellaSwag evaluation
# Download the dataset
bash scripts/get-hellaswag.sh
# Run HellaSwag evaluation with 400 tasks
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
-f hellaswag_val_full.txt \
--hellaswag \
--hellaswag-tasks 400
Example 3: Download and run Winogrande evaluation
# Download the dataset
bash scripts/get-winogrande.sh
# Run Winogrande evaluation
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
-f winogrande-debiased-eval.csv \
--winogrande \
--winogrande-tasks 1267
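The scripts do not check whether a dataset is already present. By default, wget saves a repeated download under a numbered name such as hellaswag_val_full.txt.1, so rerunning a script leaves duplicates behind. One way to avoid that is a small wrapper that skips the download when the expected output exists. This is a sketch under that assumption; fetch_if_missing is a hypothetical name, not something llama.cpp provides.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of llama.cpp): run a download script
# only when its expected output file is not already present.
fetch_if_missing() {
    target=$1   # file the script is expected to produce
    script=$2   # download script to run if the file is absent
    if [ -e "$target" ]; then
        echo "skip: $target already present"
    else
        bash "$script"
    fi
}

# Run from the llama.cpp repository root, where scripts/ lives.
if [ -d scripts ]; then
    fetch_if_missing wikitext-2-raw/wiki.test.raw scripts/get-wikitext-2.sh
    fetch_if_missing hellaswag_val_full.txt       scripts/get-hellaswag.sh
    fetch_if_missing winogrande-debiased-eval.csv scripts/get-winogrande.sh
fi
```

The target paths are the output file names listed in the I/O Contract tables above.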