Principle:Turboderp org Exllamav2 Dataset Loading

From Leeroopedia
Knowledge Sources
Domains: Data_Loading, NLP, Utilities
Last Updated: 2026-02-15 00:00 GMT

Overview

Dataset loading for bulk inference involves fetching structured data from external sources, caching it locally, and formatting prompts according to model-specific chat templates.

Description

Bulk inference workflows require processing large numbers of prompts through a language model. The dataset loading pattern addresses two concerns:

Data Acquisition and Caching: Datasets are fetched from the Hugging Face Hub using the datasets library. To avoid repeated downloads, the rows are cached locally as a JSONL (JSON Lines) file. On each call, the loading function first checks for the cached file and downloads from Hugging Face only if the cache is missing.

This caching strategy provides:

  • Reproducibility - The same cached data is used across runs
  • Offline capability - Once cached, no network access is needed
  • Speed - Reading a local file avoids the network round-trips of a fresh download
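
The cache-first behavior described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the names load_cached, cache_path, and fetch_from_hub are hypothetical, the fetch step is passed in as a callable so the cache logic can be exercised without network access, and replacing "/" in the dataset name is an assumption made here to keep the cache a single flat file.

```python
import json
from pathlib import Path


def cache_path(ds_name, category, split, cache_dir="data"):
    """Build the cache file path: {cache_dir}/{ds_name}_{category}_{split}.jsonl."""
    # Assumption: hub names such as "org/name" contain "/", which is replaced
    # here so the cache file does not end up in a nested directory.
    safe_name = ds_name.replace("/", "_")
    return Path(cache_dir) / f"{safe_name}_{category}_{split}.jsonl"


def fetch_from_hub(ds_name, category, split):
    """Download one split with the datasets library and return a list of dicts."""
    # Imported lazily so that cached runs need neither the library nor a network.
    from datasets import load_dataset

    ds = load_dataset(ds_name, category, split=split)
    return [dict(row) for row in ds]


def load_cached(ds_name, category, split, fetch=fetch_from_hub, cache_dir="data"):
    """Return dataset rows, reading the JSONL cache if present, else fetching."""
    path = cache_path(ds_name, category, split, cache_dir)

    # 1. Check cache: load and return without touching the network.
    if path.exists():
        with path.open("r", encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    # 2. Download, then write one JSON object per line.
    rows = fetch(ds_name, category, split)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

    # 3. Return the list of dicts.
    return rows
```

Passing the fetch step as a parameter is a choice made for this sketch; it keeps the cache check testable in isolation and makes it easy to swap in a different data source.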

Prompt Formatting: Raw dataset entries (typically containing a question or instruction) must be formatted into the chat template expected by the target model. Different model families use different prompt formats:

  • LLaMA format: Uses [INST] and [/INST] delimiters
  • LLaMA 3 format: Uses <|begin_of_text|> and role-based headers (<|start_header_id|> and <|end_header_id|>)
  • Granite format: Uses <|start_of_role|> and <|end_of_role|> delimiters
  • ChatML format: Uses <|im_start|> and <|im_end|> delimiters
  • Gemma format: Uses <start_of_turn> and <end_of_turn> delimiters

The separation of data loading from prompt formatting is a key design decision. It allows the same dataset to be reused across different model configurations simply by changing the format parameter.
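
A minimal sketch of how a format parameter might select a template function is shown below. The function names and the dictionary keys ("chatml", "llama") are hypothetical and chosen for illustration. The ChatML template follows the delimiters listed above; the LLaMA template is a simplified Llama-2-style layout and is an assumption, not necessarily the exact string the project produces.

```python
def format_chatml(sp, p):
    """ChatML: each turn is wrapped in <|im_start|>role ... <|im_end|>."""
    return (
        f"<|im_start|>system\n{sp}<|im_end|>\n"
        f"<|im_start|>user\n{p}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def format_llama(sp, p):
    """Simplified Llama-2-style prompt (assumption; BOS token left to the tokenizer)."""
    return f"[INST] <<SYS>>\n{sp}\n<</SYS>>\n\n{p} [/INST]"


# Hypothetical registry: maps a prompt_format string to its template function.
PROMPT_FORMATS = {
    "chatml": format_chatml,
    "llama": format_llama,
}


def format_prompt(prompt_format, sp, p):
    """Insert system prompt sp and user prompt p into the selected template."""
    try:
        template = PROMPT_FORMATS[prompt_format]
    except KeyError:
        raise ValueError(f"Unknown prompt format: {prompt_format!r}") from None
    return template(sp, p)


# The same dataset row can be formatted for different model families.
row = {"question": "What is 2+2?"}
for name in PROMPT_FORMATS:
    prompt = format_prompt(name, "You are helpful.", row["question"])
```

Because the rows themselves carry no template markup, supporting another model family only requires registering one more template function.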

Usage

Use dataset loading when performing bulk inference evaluations, benchmarks, or batch processing tasks. The pattern is especially useful for running the same set of prompts through different models or model configurations for comparison.

Theoretical Basis

Dataset Loading Pipeline:

1. CHECK CACHE:
   - Look for local JSONL file at: data/{ds_name}_{category}_{split}.jsonl
   - If exists: load and return

2. DOWNLOAD:
   - Call datasets.load_dataset(ds_name, category, split=split)
   - Convert to list of dicts
   - Write to JSONL cache file

3. RETURN:
   - List of dicts, each representing one dataset row

Prompt Formatting Pipeline:

1. SELECT FORMAT:
   - Match prompt_format string to template function

2. APPLY TEMPLATE:
   - Insert system prompt (sp) and user prompt (p) into format
   - Return formatted string ready for tokenization

Example (ChatML format):
   Input:  sp="You are helpful.", p="What is 2+2?"
   Output: "<|im_start|>system\nYou are helpful.<|im_end|>\n"
           "<|im_start|>user\nWhat is 2+2?<|im_end|>\n"
           "<|im_start|>assistant\n"

The JSONL caching format stores one JSON object per line, making it efficient for both sequential reading and appending. Each line represents a complete dataset row that can be parsed independently.
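
The two properties named above, appending and independent per-line parsing, can be illustrated with a short sketch. The helper names append_row and iter_rows are hypothetical and not part of any library.

```python
import json
from pathlib import Path


def append_row(path, row):
    """Append one row as a single JSON line; existing lines are not rewritten."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")


def iter_rows(path):
    """Yield rows one at a time; each line is parsed on its own."""
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate blank lines
                yield json.loads(line)
```

Since iter_rows is a generator, a large cache file can be streamed row by row instead of being loaded into memory at once.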
