Principle:EvolvingLMMs Lab Lmms eval Few Shot Sampling

Knowledge Sources	EvolvingLMMs_Lab_Lmms_eval
Domains	Evaluation, In-Context Learning
Last Updated	2026-02-14 00:00 GMT

Overview

Few-Shot Sampling defines how the evaluation framework selects and formats the example demonstrations that are prepended as context when evaluating language models. This principle covers strategies for sampling few-shot examples from a pool of documents, constructing labeled demonstrations, and keeping the resulting context within the model's limits.

Theoretical Basis

Context Sampling

Few-shot sampling selects example input-output pairs from a dataset and prepends them to the actual test instance. The sampled context:

Provides the model with in-context learning examples
Demonstrates the expected input format and output style
Can improve model performance on challenging tasks
Must avoid including the test instance itself in the few-shot examples

Sampling Strategies

Different strategies for selecting few-shot examples:

Random Sampling: Randomly select N examples from the pool
First-N Sampling: Take the first N examples in deterministic order
Balanced Sampling: Select examples to balance across classes/categories
Manual Sampling: User-specified examples
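
The first two strategies can be sketched in a few lines. This is a minimal illustration, not the framework's code: the function names sample_random and sample_first_n and the toy pool are hypothetical, and documents are assumed to be held in a plain Python list.

```python
import random


def sample_random(docs, n, rnd):
    """Random sampling: draw n distinct examples using the supplied RNG."""
    return rnd.sample(docs, n)


def sample_first_n(docs, n):
    """First-N sampling: take the first n examples in dataset order."""
    if n > len(docs):
        raise ValueError(f"requested {n} examples but only {len(docs)} are available")
    return docs[:n]


# Toy pool of documents (hypothetical data for illustration only)
pool = [{"question": f"Q{i}", "answer": f"A{i}"} for i in range(10)]

first_three = sample_first_n(pool, 3)
random_three = sample_random(pool, 3, random.Random(1234))
```

First-N always returns the same examples in the same order, while random sampling depends only on the state of the RNG it is given.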

Context Construction

Building the few-shot prompt string:

Convert document to text representation using task's doc_to_text
Convert document to target/answer using task's doc_to_target
Join with delimiters (fewshot_delimiter, target_delimiter)
Handle both string and choice-based formats
Ensure consistent formatting across examples
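
The construction steps above can be sketched as follows. The function build_fewshot_context is a hypothetical helper written for this page; doc_to_text and doc_to_target are passed in as plain callables, and the trailing delimiter after the last example is an assumption chosen to match the sample output shown under Usage Examples.

```python
def build_fewshot_context(
    examples,
    doc_to_text,
    doc_to_target,
    fewshot_delimiter="\n\n",
    target_delimiter="\nAnswer: ",
):
    """Format each example as text + target_delimiter + target, then join them."""
    parts = [
        doc_to_text(doc) + target_delimiter + doc_to_target(doc)
        for doc in examples
    ]
    if not parts:
        # Zero-shot: no demonstrations, so no context string at all
        return ""
    # Trailing delimiter separates the last demonstration from the test instance
    return fewshot_delimiter.join(parts) + fewshot_delimiter


# Hypothetical documents and accessors for illustration
examples = [
    {"question": "Question 1", "answer": "A"},
    {"question": "Question 2", "answer": "B"},
]
context = build_fewshot_context(
    examples,
    doc_to_text=lambda d: d["question"],
    doc_to_target=lambda d: d["answer"],
)
```

Because every example passes through the same two callables and the same delimiters, the demonstrations are formatted uniformly by construction.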

Split Management

Handling few-shot examples from different data splits:

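Few-shot examples can come from a dedicated fewshot_split that is separate from the test split
If both use the same split, the test document must be filtered out of the sampled examples
When sampling from the same split, draw N+1 examples so that N remain after the test document is filtered out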
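
A minimal sketch of the same-split case, assuming the documents are unique and compared by equality. The helper name sample_excluding is hypothetical and the pool is toy data.

```python
import random


def sample_excluding(docs, test_doc, num_fewshot, rnd):
    """Draw num_fewshot examples from docs, never returning test_doc."""
    # Draw one extra so that N examples remain even if test_doc is drawn
    drawn = rnd.sample(docs, num_fewshot + 1)
    kept = [doc for doc in drawn if doc != test_doc]
    # If test_doc was not drawn, the extra example is simply dropped
    return kept[:num_fewshot]


pool = [{"id": i, "question": f"Q{i}"} for i in range(10)]
test_doc = pool[3]
shots = sample_excluding(pool, test_doc, num_fewshot=5, rnd=random.Random(0))
```

Since at most one drawn document can equal the test document, the filtered list always holds at least N examples, provided the split contains at least N+1 documents.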

Design Patterns

Sampler Configuration

num_fewshot: Number of few-shot examples to include
fewshot_split: Dataset split to sample from (train, validation, etc.)
fewshot_delimiter: String separator between examples (e.g., "\n\n")
target_delimiter: String between question and answer (e.g., "\nAnswer: ")
fewshot_indices: Optional explicit list of example indices

Sampler Types

ContextSampler: Base sampler with random selection
FirstNSampler: Deterministic ordering for canonical examples
BalancedSampler: Class-balanced selection (TODO: not implemented)
ManualSampler: User-specified examples (TODO: not implemented)
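
One way to expose several sampler types behind a configuration string is a small registry. The sketch below is an assumption about how such a lookup could work, not the framework's implementation: the class names RandomSketchSampler and FirstNSketchSampler and the get_sampler function are hypothetical, and only the keys "default" and "first_n" are taken from the task YAML example on this page.

```python
import random


class RandomSketchSampler:
    """Base sampler: random selection using an injected RNG."""

    def __init__(self, docs, rnd):
        self.docs = docs
        self.rnd = rnd

    def sample(self, n):
        return self.rnd.sample(self.docs, n)


class FirstNSketchSampler(RandomSketchSampler):
    """Deterministic sampler: always the first n documents."""

    def sample(self, n):
        return self.docs[:n]


# Registry keyed by the sampler name used in configuration
SAMPLERS = {
    "default": RandomSketchSampler,
    "first_n": FirstNSketchSampler,
}


def get_sampler(name):
    """Look up a sampler class by name, failing loudly on unknown names."""
    try:
        return SAMPLERS[name]
    except KeyError:
        raise ValueError(f"unknown sampler {name!r}; choose from {sorted(SAMPLERS)}")


docs = ["d0", "d1", "d2", "d3"]
sampler = get_sampler("first_n")(docs, random.Random(42))
picked = sampler.sample(2)
```

Unimplemented strategies are simply absent from the registry, so requesting them raises an error instead of silently falling back to random selection.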

Task Integration

Samplers use task methods: doc_to_text, doc_to_target, doc_to_choice
Access task configuration for delimiters and split settings
Receive random number generator for reproducibility

Determinism

Use seeded random number generator for reproducibility
FirstNSampler provides deterministic ordering for benchmarks
Consistent sampling across evaluation runs
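
The reason for passing a dedicated RNG instance, instead of relying on the global random module, can be shown with a short experiment. The pick function is a hypothetical stand-in for a sampler.

```python
import random


def pick(docs, n, rnd):
    """Sample n documents using only the RNG instance passed in."""
    return rnd.sample(docs, n)


docs = list(range(100))

first_run = pick(docs, 5, random.Random(42))

# Unrelated code touching the global RNG between runs
random.random()
random.shuffle(list(docs))

second_run = pick(docs, 5, random.Random(42))
```

Both runs return the same examples because each uses its own random.Random(42) instance, whose state is unaffected by calls on the module-level generator.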

Context Length

Balance number of examples with model context window
Consider token limits when setting num_fewshot
Longer examples may require fewer shots
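
A rough way to pick a shot count under a token budget is sketched below. It assumes the caller has already measured token lengths with whatever tokenizer the model uses; the function name max_shots_within_budget and the numbers are hypothetical.

```python
def max_shots_within_budget(example_lengths, test_length, max_tokens, requested_shots):
    """Return how many leading examples fit alongside the test instance.

    example_lengths: token counts of candidate examples, in prompt order
    test_length: token count of the test instance itself
    max_tokens: total prompt budget
    requested_shots: the configured num_fewshot (upper bound)
    """
    budget = max_tokens - test_length
    used = 0
    shots = 0
    for length in example_lengths[:requested_shots]:
        if used + length > budget:
            break
        used += length
        shots += 1
    return shots


lengths = [100, 120, 90, 200, 150]
shots = max_shots_within_budget(lengths, test_length=80, max_tokens=400, requested_shots=5)
```

Here the first three examples use 310 of the 320 tokens left after the test instance, so the fourth example (200 tokens) no longer fits and the function returns 3.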

Quality of Examples

First-N strategy useful for curated, canonical examples (MMLU, CMMLU)
Balanced sampling helps with imbalanced datasets
Manual selection allows task-specific expert curation
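
Since balanced sampling is listed as not yet implemented, the following is only a sketch of what class-balanced selection could look like. The round-robin approach, the function name sample_balanced, and the label_of accessor are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_balanced(docs, n, label_of, rnd):
    """Pick up to n docs, cycling through labels so classes are evenly represented."""
    by_label = defaultdict(list)
    for doc in docs:
        by_label[label_of(doc)].append(doc)

    # Fixed label order plus a seeded shuffle keeps the result reproducible
    labels = sorted(by_label)
    for label in labels:
        rnd.shuffle(by_label[label])

    picked = []
    while len(picked) < n and any(by_label[label] for label in labels):
        for label in labels:
            if by_label[label] and len(picked) < n:
                picked.append(by_label[label].pop())
    return picked


# Imbalanced toy pool: eight "A" documents and two "B" documents
pool = [{"id": i, "label": "A"} for i in range(8)] + [
    {"id": i, "label": "B"} for i in range(8, 10)
]
shots = sample_balanced(pool, 4, lambda d: d["label"], random.Random(0))
```

With four shots requested, the round-robin yields two examples per class even though class "A" dominates the pool; once a class runs out, the remaining slots are filled from the classes that still have documents.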

Format Consistency

Examples must match test instance format exactly
Handle both string and multiple-choice formats
Preserve spacing and delimiter conventions
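
Handling both formats usually comes down to resolving the target into text before it is joined to the question. The sketch below assumes a choice-based task returns an integer index from doc_to_target and a list of options from doc_to_choice; the helper name target_text and the sample documents are hypothetical.

```python
def target_text(doc, doc_to_target, doc_to_choice=None):
    """Return the answer string for a document in either format."""
    target = doc_to_target(doc)
    if doc_to_choice is not None and isinstance(target, int):
        # Choice-based format: the target is an index into the choice list
        return str(doc_to_choice(doc)[target])
    # String format: the target is already the answer text
    return str(target)


mc_doc = {"question": "2 + 2 = ?", "choices": ["A", "B", "C", "D"], "label": 2}
gen_doc = {"question": "Capital of France?", "answer": "Paris"}

mc_answer = target_text(
    mc_doc,
    doc_to_target=lambda d: d["label"],
    doc_to_choice=lambda d: d["choices"],
)
gen_answer = target_text(gen_doc, doc_to_target=lambda d: d["answer"])
```

Resolving targets in one place means few-shot examples and the test instance share the same answer format, which is what keeps spacing and delimiter conventions aligned.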

Usage Examples

Initializing a Sampler

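import random

# Import path is an assumption; adjust it to where the samplers live in your checkout
from lmms_eval.api.samplers import FirstNSampler

# Initialize sampler (fewshot_docs and task_instance must already exist)
sampler = FirstNSampler(
    docs=fewshot_docs,  # HF dataset split
    task=task_instance,
    fewshot_indices=None,  # Use all docs
    rnd=random.Random(42)  # Seeded RNG
)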

# Get context for a test document
context = sampler.get_context(
    doc=test_doc,
    num_fewshot=5
)

# Result is formatted string like:
# "Question 1\nAnswer: A\n\nQuestion 2\nAnswer: B\n\n..."

Task YAML Configuration

# In task YAML
num_fewshot: 5
fewshot_split: train
fewshot_delimiter: "\n\n"
target_delimiter: "\nAnswer: "
fewshot_config:
  sampler: first_n  # or "default" for random

Best Practices

Use FirstNSampler for benchmarks with canonical examples
Set num_fewshot based on model context limits
Ensure fewshot_split has sufficient examples
Use separate train split when available for few-shot
Test formatting with edge cases (long text, special chars)
Seed random generator for reproducible results
Document expected format in task configuration
Consider task difficulty when setting shot count

Common Patterns

MMLU-Style Tasks

Use FirstNSampler for deterministic ordering
5-shot is standard for MMLU
Examples from development/validation set
Multiple choice format with letter answers

Generative Tasks

Random sampling often sufficient
Fewer shots may be needed (2-3)
Target can be free-form text
Consider example diversity

Zero-Shot Evaluation

Set num_fewshot=0
No demonstrations are prepended; the model sees only the test instance plus any system prompt
Tests the model's knowledge without in-context examples
Useful for instruction-tuned models

Related Pages

Implementations

Implementation:EvolvingLMMs_Lab_Lmms_eval_Context_Samplers (implementation of the sampling strategies)
