Implementation:Ggml org Llama cpp Get Evaluation Dataset

From Leeroopedia
| Aspect | Detail |
|---|---|
| Implementation Name | Get Evaluation Dataset |
| Doc Type | External Tool Doc |
| Domain | Model Perplexity Evaluation |
| Purpose | Downloading standard evaluation datasets for perplexity and benchmark scoring |
| Related Workflow | Model_Perplexity_Evaluation |

Overview

Description

llama.cpp provides three shell scripts for downloading evaluation datasets:

  • scripts/get-wikitext-2.sh: Downloads the WikiText-2 raw dataset for perplexity computation
  • scripts/get-hellaswag.sh: Downloads the HellaSwag validation dataset for commonsense reasoning evaluation
  • scripts/get-winogrande.sh: Downloads the Winogrande debiased evaluation dataset for coreference resolution

Each script uses wget to fetch its dataset from a hosted URL and then prints the matching llama-perplexity command line. The WikiText-2 script additionally unpacks the downloaded archive with unzip.

Usage

Run the appropriate script before starting an evaluation. The scripts take no arguments and write their files into the current working directory; pass the resulting data file to llama-perplexity with the -f flag.
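The scripts call wget (and, for WikiText-2, unzip) without first checking that these tools are installed. A pre-flight check along the following lines may help; this is a minimal sketch and not part of llama.cpp, and the function name require_tools is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of llama.cpp): succeed only if every
# named command is available on PATH.
require_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: guard the WikiText-2 download, which needs both tools.
# require_tools wget unzip && bash scripts/get-wikitext-2.sh
```

The helper reports every missing tool before returning, so a single run shows everything that needs to be installed.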

Code Reference

| Aspect | Detail |
|---|---|
| Source Location (WikiText-2) | scripts/get-wikitext-2.sh:1-12 |
| Source Location (HellaSwag) | scripts/get-hellaswag.sh:1-11 |
| Source Location (Winogrande) | scripts/get-winogrande.sh:1-11 |
| Signature | Shell scripts (no function signature) |
| Import | N/A (standalone scripts) |

scripts/get-wikitext-2.sh:

#!/usr/bin/env bash

wget https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip

echo "Usage:"
echo ""
echo "  ./llama-perplexity -m model.gguf -f wikitext-2-raw/wiki.test.raw [other params]"
echo ""

exit 0

scripts/get-hellaswag.sh:

#!/usr/bin/env bash

wget https://raw.githubusercontent.com/klosax/hellaswag_text_data/main/hellaswag_val_full.txt

echo "Usage:"
echo ""
echo "  ./llama-perplexity -m model.gguf -f hellaswag_val_full.txt --hellaswag [--hellaswag-tasks N] [other params]"
echo ""

exit 0

scripts/get-winogrande.sh:

#!/usr/bin/env bash

wget https://huggingface.co/datasets/ikawrakow/winogrande-eval-for-llama.cpp/raw/main/winogrande-debiased-eval.csv

echo "Usage:"
echo ""
echo "  ./llama-perplexity -m model.gguf -f winogrande-debiased-eval.csv --winogrande [--winogrande-tasks N] [other params]"
echo ""

exit 0

I/O Contract

get-wikitext-2.sh:

| Direction | Name | Type | Description |
|---|---|---|---|
| Input | (none) |  | Script takes no arguments |
| Output | wikitext-2-raw-v1.zip | File | Downloaded ZIP archive |
| Output | wikitext-2-raw/wiki.test.raw | File | Extracted raw test data (used with -f flag) |
| Output | wikitext-2-raw/wiki.train.raw | File | Extracted raw training data (optional use) |
| Output | wikitext-2-raw/wiki.valid.raw | File | Extracted raw validation data (optional use) |
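The script as listed ends with exit 0 whether or not wget and unzip succeeded, so its exit status does not confirm that the files above exist. A check such as the following could be run afterwards; it is a sketch only, and check_wikitext_files is a hypothetical name, not something shipped with llama.cpp.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of llama.cpp): confirm that the three
# extracted files exist and are non-empty under the given directory.
check_wikitext_files() {
    local dir="${1:-wikitext-2-raw}"
    local name
    for name in wiki.test.raw wiki.train.raw wiki.valid.raw; do
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $dir/$name" >&2
            return 1
        fi
    done
    return 0
}

# Example, run from the directory where the download script was executed:
# check_wikitext_files wikitext-2-raw && echo "WikiText-2 is ready"
```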

get-hellaswag.sh:

| Direction | Name | Type | Description |
|---|---|---|---|
| Input | (none) |  | Script takes no arguments |
| Output | hellaswag_val_full.txt | File | HellaSwag validation data (6 lines per task, 10042 tasks) |
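Given the stated layout of 6 lines per task, the number of tasks in a downloaded file can be estimated from its line count. The sketch below assumes that layout holds for the whole file; hellaswag_task_count is a hypothetical helper name. If the file matches the description above, the result should agree with the 10042 tasks quoted there.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of llama.cpp): estimate the number of tasks
# in a HellaSwag text file, assuming 6 lines per task.
hellaswag_task_count() {
    # awk counts a final line even if it lacks a trailing newline.
    awk 'END { print int(NR / 6) }' "$1"
}

# Example:
# hellaswag_task_count hellaswag_val_full.txt
```

A count well below the expected value would suggest a truncated download, which is worth ruling out before choosing a value for --hellaswag-tasks.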

get-winogrande.sh:

| Direction | Name | Type | Description |
|---|---|---|---|
| Input | (none) |  | Script takes no arguments |
| Output | winogrande-debiased-eval.csv | File | Winogrande debiased evaluation data in CSV format |

Usage Examples

Example 1: Download and run WikiText-2 perplexity evaluation

# Download the dataset
bash scripts/get-wikitext-2.sh

# Run perplexity evaluation
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
    -f wikitext-2-raw/wiki.test.raw \
    --ctx-size 512 \
    --batch-size 512

Example 2: Download and run HellaSwag evaluation

# Download the dataset
bash scripts/get-hellaswag.sh

# Run HellaSwag evaluation with 400 tasks
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
    -f hellaswag_val_full.txt \
    --hellaswag \
    --hellaswag-tasks 400

Example 3: Download and run Winogrande evaluation

# Download the dataset
bash scripts/get-winogrande.sh

# Run Winogrande evaluation
./llama-perplexity -m models/llama-7b-Q4_K_M.gguf \
    -f winogrande-debiased-eval.csv \
    --winogrande \
    --winogrande-tasks 1267
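The three examples differ only in the data file and the benchmark flag. When scripting several runs, that mapping can be kept in one place, as in the sketch below. The function name eval_args and the dataset keys are hypothetical; the file names and flags are the ones used in the examples above.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of llama.cpp): print the dataset-specific
# llama-perplexity arguments used in the three examples.
eval_args() {
    case "$1" in
        wikitext)   echo "-f wikitext-2-raw/wiki.test.raw" ;;
        hellaswag)  echo "-f hellaswag_val_full.txt --hellaswag" ;;
        winogrande) echo "-f winogrande-debiased-eval.csv --winogrande" ;;
        *)          echo "unknown dataset: $1" >&2; return 1 ;;
    esac
}

# Example (the unquoted expansion is intentional so the arguments split):
# ./llama-perplexity -m models/llama-7b-Q4_K_M.gguf $(eval_args hellaswag)
```

The unquoted expansion relies on the paths containing no spaces, which holds for the default file names produced by the download scripts.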
