Implementation:Ggml org Llama cpp Get Evaluation Dataset

From Leeroopedia
Revision as of 12:39, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Get_Evaluation_Dataset.md)
Aspect | Detail
Implementation Name | Get Evaluation Dataset
Doc Type | External Tool Doc
Domain | Model Perplexity Evaluation
Purpose | Downloading standard evaluation datasets for perplexity and benchmark scoring
Related Workflow | Model_Perplexity_Evaluation

Overview

Description

llama.cpp provides three shell scripts for downloading evaluation datasets:

  • scripts/get-wikitext-2.sh: Downloads the WikiText-2 raw dataset for perplexity computation
  • scripts/get-hellaswag.sh: Downloads the HellaSwag validation dataset for commonsense reasoning evaluation
  • scripts/get-winogrande.sh: Downloads the Winogrande debiased evaluation dataset for coreference resolution

Each script uses wget to fetch its dataset from a hosted URL, then prints the llama-perplexity command line to use with it. The WikiText-2 script additionally unpacks the downloaded archive with unzip.

Usage

Run the script for the dataset you need before starting an evaluation. Each script writes its files to the current working directory, so invoke it from the directory where the data should live. Pass the resulting file to llama-perplexity with the -f flag.
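Because the scripts call wget unconditionally, running one twice downloads the data again. The following is a minimal sketch of a guard around the download step. The helper name fetch_if_missing is hypothetical and not part of llama.cpp; it assumes the file name is the last path component of the URL, which holds for the three URLs listed on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of llama.cpp: download a file only if a
# non-empty copy is not already present in the current working directory.
fetch_if_missing() {
    local url="$1"
    local file="${url##*/}"   # file name = last path component of the URL

    if [ -s "$file" ]; then
        echo "skip: $file already exists"
        return 0
    fi
    # Same tool the upstream scripts use; -O names the output file explicitly.
    wget -O "$file" "$url"
}
```

For example, fetch_if_missing followed by the HellaSwag URL from scripts/get-hellaswag.sh would download hellaswag_val_full.txt on the first call and print a skip message on later calls. The -s test treats an empty file (such as one left by a failed download) as missing.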

Code Reference

Aspect | Detail
Source Location (WikiText-2) | scripts/get-wikitext-2.sh:1-12
Source Location (HellaSwag) | scripts/get-hellaswag.sh:1-11
Source Location (Winogrande) | scripts/get-winogrande.sh:1-11
Signature | Shell scripts (no function signature)
Import | N/A (standalone scripts)

scripts/get-wikitext-2.sh:

#!/usr/bin/env bash

wget https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip

echo "Usage:"
echo ""
echo "  ./llama-perplexity -m model.gguf -f wikitext-2-raw/wiki.test.raw [other params]"
echo ""

exit 0

scripts/get-hellaswag.sh:

#!/usr/bin/env bash

wget https://raw.githubusercontent.com/klosax/hellaswag_text_data/main/hellaswag_val_full.txt

echo "Usage:"
echo ""
echo "  ./llama-perplexity -m model.gguf -f hellaswag_val_full.txt --hellaswag [--hellaswag-tasks N] [other params]"
echo ""

exit 0

scripts/get-winogrande.sh:

#!/usr/bin/env bash

wget https://huggingface.co/datasets/ikawrakow/winogrande-eval-for-llama.cpp/raw/main/winogrande-debiased-eval.csv

echo "Usage:"
echo ""
echo "  ./llama-perplexity -m model.gguf -f winogrande-debiased-eval.csv --winogrande [--winogrande-tasks N] [other params]"
echo ""

exit 0

I/O Contract

get-wikitext-2.sh:

Direction | Name | Type | Description
Input | (none) | - | Script takes no arguments
Output | wikitext-2-raw-v1.zip | File | Downloaded ZIP archive
Output | wikitext-2-raw/wiki.test.raw | File | Extracted raw test data (used with -f flag)
Output | wikitext-2-raw/wiki.train.raw | File | Extracted raw training data (optional use)
Output | wikitext-2-raw/wiki.valid.raw | File | Extracted raw validation data (optional use)
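Before starting a long perplexity run, it can help to confirm that the archive was unpacked as the table describes. A minimal sketch, assuming a hypothetical helper named check_wikitext (not part of llama.cpp) that looks for the three extracted files:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of llama.cpp: report which of the expected
# WikiText-2 files exist under a directory (default: ./wikitext-2-raw).
check_wikitext() {
    local dir="${1:-wikitext-2-raw}"
    local split
    local missing=0

    for split in test train valid; do
        if [ -f "$dir/wiki.$split.raw" ]; then
            echo "ok: $dir/wiki.$split.raw"
        else
            echo "missing: $dir/wiki.$split.raw"
            missing=1
        fi
    done
    # Exit status is 0 only when all three files were found.
    return "$missing"
}
```

Calling check_wikitext with no argument inspects wikitext-2-raw in the current directory, which is where the script extracts the data.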

get-hellaswag.sh:

Direction | Name | Type | Description
Input | (none) | - | Script takes no arguments
Output | hellaswag_val_full.txt | File | HellaSwag validation data (6 lines per task, 10042 tasks)
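The 6-lines-per-task layout gives a cheap integrity check: the line count should be a multiple of 6, and by the figures in the table a complete file would hold 10042 × 6 = 60252 lines. A minimal sketch, assuming a hypothetical helper named hellaswag_task_count (not part of llama.cpp):

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of llama.cpp: derive the number of tasks in
# a HellaSwag text file from its line count (6 lines per task).
hellaswag_task_count() {
    local file="$1"
    local lines

    # wc -l counts newline characters, so a final line without a trailing
    # newline is not counted.
    lines=$(wc -l < "$file") || return 1
    lines=$((lines))   # normalize padding that some wc implementations add

    if [ $((lines % 6)) -ne 0 ]; then
        echo "error: $file has $lines lines, not a multiple of 6" >&2
        return 1
    fi
    echo $((lines / 6))
}
```

The printed count can also serve as an upper bound when choosing a value for --hellaswag-tasks.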

get-winogrande.sh:

Direction | Name | Type | Description
Input | (none) | - | Script takes no arguments
Output | winogrande-debiased-eval.csv | File | Winogrande debiased evaluation data in CSV format

Usage Examples

Example 1: Download and run WikiText-2 perplexity evaluation

# Download the dataset
bash scripts/get-wikitext-2.sh

# Run perplexity evaluation
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
    -f wikitext-2-raw/wiki.test.raw \
    --ctx-size 512 \
    --batch-size 512

Example 2: Download and run HellaSwag evaluation

# Download the dataset
bash scripts/get-hellaswag.sh

# Run HellaSwag evaluation with 400 tasks
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
    -f hellaswag_val_full.txt \
    --hellaswag \
    --hellaswag-tasks 400

Example 3: Download and run Winogrande evaluation

# Download the dataset
bash scripts/get-winogrande.sh

# Run Winogrande evaluation
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
    -f winogrande-debiased-eval.csv \
    --winogrande \
    --winogrande-tasks 1267
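The three examples differ only in the data file and the task flag, so a wrapper script can select them by name. A minimal sketch, assuming a hypothetical helper named dataset_args (not part of llama.cpp) and assuming the datasets were downloaded into the current working directory; the file names and flags are the ones used in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of llama.cpp: map a dataset name to the
# llama-perplexity arguments shown in the examples on this page.
dataset_args() {
    case "$1" in
        wikitext)   echo "-f wikitext-2-raw/wiki.test.raw" ;;
        hellaswag)  echo "-f hellaswag_val_full.txt --hellaswag" ;;
        winogrande) echo "-f winogrande-debiased-eval.csv --winogrande" ;;
        *)
            echo "unknown dataset: $1" >&2
            return 1
            ;;
    esac
}
```

A call such as ./llama-perplexity -m model.gguf $(dataset_args hellaswag) leaves the substitution unquoted on purpose so the shell splits it into separate arguments; this is safe here only because none of the file names contain spaces.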
