Implementation:Turboderp org Exllamav2 Get Dataset

Knowledge Sources	ExLlamaV2
Domains	Data_Loading, NLP, Utilities
Last Updated	2026-02-15 00:00 GMT

Overview

Concrete tools, provided by exllamav2's example utilities, for loading HuggingFace datasets through a local JSONL cache and for formatting prompts with model-specific chat templates.

Description

The get_dataset() function wraps HuggingFace's datasets.load_dataset() with a local caching layer. It first checks for a cached JSONL file at a conventional path; if found, it reads from cache. Otherwise, it downloads the dataset from HuggingFace, converts it to a list of dicts, and writes a JSONL cache file for future use.
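The check-cache-then-download flow described above can be sketched as follows. This is a minimal illustration, not the code in examples/util.py: the helper name load_with_jsonl_cache and its fetch_rows callback are hypothetical, and the cache is assumed to hold one JSON object per line.

```python
import json
import os
from typing import Callable, Iterable


def load_with_jsonl_cache(cache_path: str, fetch_rows: Callable[[], Iterable]) -> list:
    """Return rows from cache_path if it exists; otherwise fetch and cache them.

    Hypothetical helper: fetch_rows is any zero-argument callable that yields
    dict-like rows (for example, a wrapper around a dataset download).
    """
    if os.path.exists(cache_path):
        # Cache hit: parse one JSON object per non-empty line
        with open(cache_path, "r", encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    # Cache miss: materialize the rows as plain dicts
    rows = [dict(row) for row in fetch_rows()]

    # Create the cache directory if needed, then write one row per line
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return rows
```

With the HuggingFace datasets library, the callback could be something like lambda: load_dataset(ds_name, category, split=split). Passing the download step in as a callable keeps the caching logic testable without network access.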

The format_prompt() function formats a system prompt and user prompt into a model-specific chat template string. It supports several common formats:

"llama" - LLaMA/LLaMA 2 instruction format with [INST] tags
"llama3" - LLaMA 3 format with role-based headers
"granite" - IBM Granite format with <|start_of_role|> tags
"chatml" - ChatML format with <|im_start|>/<|im_end|> tags
"gemma" - Google Gemma format with <start_of_turn> tags

These functions are part of the example utilities and are not part of the installable exllamav2 package. They are designed to be copied or adapted into user code.
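When adapting the formatting logic, one option is a lookup table of templates keyed by format name. The sketch below covers only two formats, using the commonly published LLaMA 2 and ChatML layouts; the strings are illustrative and are not copied from examples/util.py, and the names TEMPLATES and sketch_format_prompt are hypothetical.

```python
# Illustrative templates only; the exact strings in examples/util.py may differ.
TEMPLATES = {
    "llama": "[INST] <<SYS>>\n{sp}\n<</SYS>>\n\n{p} [/INST]",
    "chatml": (
        "<|im_start|>system\n{sp}<|im_end|>\n"
        "<|im_start|>user\n{p}<|im_end|>\n"
        "<|im_start|>assistant\n"
    ),
}


def sketch_format_prompt(prompt_format: str, sp: str, p: str) -> str:
    """Insert a system prompt (sp) and user prompt (p) into a named template."""
    try:
        template = TEMPLATES[prompt_format]
    except KeyError:
        # Fail loudly on an unsupported format rather than returning None
        raise ValueError(f"Unknown prompt format: {prompt_format!r}") from None
    return template.format(sp=sp, p=p)
```

Raising on an unknown format name is a design choice of this sketch: it surfaces a mistyped format before any tokens are generated.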

Usage

Use these utilities when running bulk inference benchmarks or evaluation scripts from the exllamav2 examples directory. For production use, adapt the caching and formatting patterns into your own codebase.

Code Reference

Source Location

Repository: exllamav2
File: examples/util.py
Lines: L51-72 (get_dataset), L4-37 (format_prompt)

Signature

def get_dataset(
    ds_name: str,
    category: str,
    split: str
) -> list:
    ...

def format_prompt(
    prompt_format: str,
    sp: str,
    p: str
) -> str:
    ...

Import

# examples/util.py is not part of the installed package: run your script from
# the examples directory, or copy the functions into your own code
from util import get_dataset, format_prompt

I/O Contract

Inputs (get_dataset)

Name	Type	Required	Description
ds_name	str	Yes	HuggingFace dataset name (e.g., "cais/mmlu", "gsm8k")
category	str	Yes	Dataset configuration/subset name, or None for datasets without subsets
split	str	Yes	Dataset split to load (e.g., "test", "train", "validation")

Outputs (get_dataset)

Name	Type	Description
dataset	list	List of dictionaries, each representing one row from the dataset. Cached locally as a JSONL file at data/{ds_name}_{category}_{split}.jsonl

Inputs (format_prompt)

Name	Type	Required	Description
prompt_format	str	Yes	One of "llama", "llama3", "granite", "chatml", "gemma" specifying the chat template to use
sp	str	Yes	System prompt text
p	str	Yes	User prompt text

Outputs (format_prompt)

Name	Type	Description
formatted_prompt	str	Fully formatted prompt string ready for tokenization, with system and user content inserted into the appropriate chat template

Dependencies

datasets (HuggingFace) - For downloading datasets from the HuggingFace hub
json - For JSONL serialization and deserialization
os - For file path operations and cache file existence checks

Usage Examples

Basic Dataset Loading

from util import get_dataset, format_prompt

# Load MMLU test set (anatomy subset)
dataset = get_dataset("cais/mmlu", "anatomy", "test")
print(f"Loaded {len(dataset)} examples")
print(dataset[0])  # First example as a dict

Formatting Prompts for Bulk Inference

# Format each dataset entry for a ChatML-compatible model
system_prompt = "You are a helpful assistant. Answer the question concisely."

formatted_prompts = []
for row in dataset:
    question = row["question"]
    prompt = format_prompt("chatml", system_prompt, question)
    formatted_prompts.append(prompt)
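The loop above passes only the question text. For a multiple-choice benchmark the answer options usually need to be part of the user prompt as well. The sketch below assumes each row carries a "question" string and a "choices" list of four options, as in the cais/mmlu rows; the helper name mmlu_question_text and the closing instruction line are hypothetical.

```python
def mmlu_question_text(row: dict) -> str:
    """Build a user prompt from a multiple-choice row.

    Assumes row["question"] is a string and row["choices"] is a list of
    up to four answer options (an assumption about the dataset schema).
    """
    lines = [row["question"].strip()]
    # Label the options A-D in order
    for letter, choice in zip("ABCD", row["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)
```

The returned string can be passed as the p argument of format_prompt in place of row["question"].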

Complete Bulk Inference Pipeline

from util import get_dataset, format_prompt
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Tokenizer, ExLlamaV2Cache
from exllamav2.generator import ExLlamaV2DynamicGenerator, ExLlamaV2DynamicJob, ExLlamaV2Sampler

# Load model (model_dir is a placeholder; point it at a local model directory)
model_dir = "/path/to/model"
config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
model.load()
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model, max_seq_len=4096)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Load dataset
dataset = get_dataset("cais/mmlu", "anatomy", "test")

# Enqueue all prompts as jobs
gen_settings = ExLlamaV2Sampler.Settings(temperature=0.1)
for i, row in enumerate(dataset):
    prompt = format_prompt("chatml", "Answer concisely.", row["question"])
    input_ids = tokenizer.encode(prompt)
    job = ExLlamaV2DynamicJob(
        input_ids=input_ids,
        max_new_tokens=200,
        gen_settings=gen_settings,
        stop_conditions=[tokenizer.eos_token_id],
        identifier=i
    )
    generator.enqueue(job)

# Collect results
results = {}
while generator.num_remaining_jobs() > 0:
    for result in generator.iterate():
        if result["eos"]:
            results[result["identifier"]] = result["full_completion"]
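Because jobs can finish in any order, the results dict is keyed by the identifier given at enqueue time, which here is the row index. A small sketch of joining completions back onto their source rows follows; the helper name attach_completions and the "completion" key are hypothetical.

```python
def attach_completions(dataset: list, results: dict) -> list:
    """Return copies of the dataset rows with a "completion" field added.

    results maps row index -> completion text; rows without a finished
    job get None.
    """
    merged = []
    for i, row in enumerate(dataset):
        # Copy the row so the original dataset list is left unmodified
        merged.append({**row, "completion": results.get(i)})
    return merged
```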

Related Pages

Implements Principle

Principle:Turboderp_org_Exllamav2_Dataset_Loading
