
Principle:EvolvingLMMs Lab Lmms eval Few Shot Sampling

From Leeroopedia
Knowledge Sources
Domains: Evaluation, In-Context Learning
Last Updated: 2026-02-14 00:00 GMT

Overview

Few-Shot Sampling defines how the evaluation framework selects and formats example demonstrations to provide as context when evaluating language models. This principle establishes strategies for sampling few-shot examples from a pool of documents, constructing labeled demonstrations, and managing context to guide model performance on evaluation tasks.

Theoretical Basis

Context Sampling

Few-shot sampling involves selecting example input-output pairs from a dataset to prepend to the actual test instance:

  • Provides the model with in-context learning examples
  • Demonstrates the expected input format and output style
  • Can improve model performance on challenging tasks
  • Must avoid including the test instance itself in the few-shot examples

Sampling Strategies

Different strategies for selecting few-shot examples:

  • Random Sampling: Randomly select N examples from the pool
  • First-N Sampling: Take the first N examples in deterministic order
  • Balanced Sampling: Select examples to balance across classes/categories
  • Manual Sampling: User-specified examples
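
The four strategies above can be sketched in plain Python. This is an illustrative sketch, not the framework's implementation: the function names, the dict-based documents, and the "label" key are all assumptions made for the example.

```python
import random
from collections import defaultdict

def sample_random(pool, n, rnd):
    """Random Sampling: draw n distinct examples using the supplied RNG."""
    return rnd.sample(pool, n)

def sample_first_n(pool, n):
    """First-N Sampling: take the first n examples in stored order."""
    return pool[:n]

def sample_balanced(pool, n, label_key, rnd):
    """Balanced Sampling: round-robin over classes until n examples are picked."""
    by_label = defaultdict(list)
    for doc in pool:
        by_label[doc[label_key]].append(doc)
    for docs in by_label.values():
        rnd.shuffle(docs)  # randomize the choice within each class
    labels = sorted(by_label)
    picked = []
    while len(picked) < n and any(by_label[label] for label in labels):
        for label in labels:
            if by_label[label] and len(picked) < n:
                picked.append(by_label[label].pop())
    return picked

def sample_manual(pool, indices):
    """Manual Sampling: return exactly the user-specified examples."""
    return [pool[i] for i in indices]

# Hypothetical pool: 3 documents labeled "B" and 6 labeled "A"
pool = [{"q": f"Q{i}", "label": "A" if i % 3 else "B"} for i in range(9)]
balanced = sample_balanced(pool, 4, "label", random.Random(0))
```

With an imbalanced pool like this one, the round-robin pass returns two examples per class for N = 4, whereas plain random sampling would tend to favor the majority class.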

Context Construction

Building the few-shot prompt string:

  • Convert document to text representation using task's doc_to_text
  • Convert document to target/answer using task's doc_to_target
  • Join with delimiters (fewshot_delimiter, target_delimiter)
  • Handle both string and choice-based formats
  • Ensure consistent formatting across examples
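
The construction steps above can be sketched as follows. The helper names and the dict-based documents are assumptions for illustration; doc_to_text, doc_to_target, and doc_to_choice are passed in as plain callables standing in for the task methods, and appending a trailing fewshot_delimiter after the last example is a choice made for this sketch.

```python
def render_target(doc, doc_to_target, doc_to_choice=None):
    """Return the answer text, resolving an integer index into a choice list."""
    target = doc_to_target(doc)
    if doc_to_choice is not None and isinstance(target, int):
        return str(doc_to_choice(doc)[target])
    return str(target)

def build_fewshot_context(examples, doc_to_text, doc_to_target, doc_to_choice=None,
                          fewshot_delimiter="\n\n", target_delimiter="\nAnswer: "):
    """Format each example as text + target_delimiter + answer, then join them."""
    rendered = [
        doc_to_text(ex) + target_delimiter + render_target(ex, doc_to_target, doc_to_choice)
        for ex in examples
    ]
    # Every example, including the last, is followed by the few-shot delimiter
    return "".join(block + fewshot_delimiter for block in rendered)

# String-target format
qa_docs = [
    {"question": "2+2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]
qa_context = build_fewshot_context(
    qa_docs, lambda d: d["question"], lambda d: d["answer"]
)

# Choice-based format: the target is an index into the choices
mc_docs = [{"question": "Sky color?", "choices": ["red", "blue"], "answer": 1}]
mc_context = build_fewshot_context(
    mc_docs, lambda d: d["question"], lambda d: d["answer"],
    doc_to_choice=lambda d: d["choices"],
)
```

Routing both formats through one rendering function keeps the delimiters identical across all demonstrations.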

Split Management

Handling few-shot examples from different data splits:

  • A dedicated fewshot_split can be used, separate from test_split
  • If both come from the same split, the test document must be filtered out of the sampled examples
  • When sharing a split, sample N + 1 examples so that N remain after filtering
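
A minimal sketch of the same-split case, assuming documents can be compared for equality and that the pool holds at least N + 1 documents. The function name and the same_split flag are hypothetical.

```python
import random

def sample_excluding_test_doc(pool, test_doc, n, rnd, same_split):
    """Return n few-shot examples that never include the test document."""
    if not same_split:
        # Separate split: the test document cannot appear in the pool
        return rnd.sample(pool, n)
    # Same split: draw one extra so n examples remain after filtering
    candidates = rnd.sample(pool, n + 1)
    filtered = [doc for doc in candidates if doc != test_doc]
    return filtered[:n]

pool = [{"id": i} for i in range(10)]
test_doc = pool[3]
shots = sample_excluding_test_doc(pool, test_doc, 3, random.Random(0), same_split=True)
```

Because the draw is without replacement, the test document can appear at most once among the N + 1 candidates, so at least N always survive the filter.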

Design Patterns

Sampler Configuration

  • num_fewshot: Number of few-shot examples to include
  • fewshot_split: Dataset split to sample from (train, validation, etc.)
  • fewshot_delimiter: String separator between examples (e.g., "\n\n")
  • target_delimiter: String between question and answer (e.g., "\nAnswer: ")
  • fewshot_indices: Optional explicit list of example indices

Sampler Types

  • ContextSampler: Base sampler with random selection
  • FirstNSampler: Deterministic ordering for canonical examples
  • BalancedSampler: Class-balanced selection (TODO: not implemented)
  • ManualSampler: User-specified examples (TODO: not implemented)
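
One way to organize such sampler types is a base class whose sample method subclasses override, plus a name-to-class registry. The classes below are simplified stand-ins written for this page (hence the Sketch prefix), not the framework's code. The registry keys mirror the "default" and "first_n" names used in task YAML.

```python
import random

class SketchContextSampler:
    """Base sampler: random selection from the pool."""
    def __init__(self, docs, rnd):
        self.docs = list(docs)
        self.rnd = rnd

    def sample(self, n):
        return self.rnd.sample(self.docs, n)

class SketchFirstNSampler(SketchContextSampler):
    """Deterministic sampler: always the first n documents."""
    def sample(self, n):
        if n > len(self.docs):
            raise ValueError(f"requested {n} examples but only {len(self.docs)} available")
        return self.docs[:n]

# Hypothetical registry mapping config names to sampler classes
SAMPLER_REGISTRY = {
    "default": SketchContextSampler,
    "first_n": SketchFirstNSampler,
}

def get_sampler(name):
    """Look up a sampler class by its config name."""
    try:
        return SAMPLER_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown sampler {name!r}; expected one of {sorted(SAMPLER_REGISTRY)}")

docs = ["doc-a", "doc-b", "doc-c", "doc-d"]
first_two = get_sampler("first_n")(docs, random.Random(0)).sample(2)
```

Only the selection step differs between sampler types, so formatting logic can live once in the base class.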

Task Integration

  • Samplers use task methods: doc_to_text, doc_to_target, doc_to_choice
  • Access task configuration for delimiters and split settings
  • Receive random number generator for reproducibility

Determinism

  • Use seeded random number generator for reproducibility
  • FirstNSampler provides deterministic ordering for benchmarks
  • Consistent sampling across evaluation runs
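
The effect of a seeded generator can be illustrated with Python's standard random module. The pick_fewshot helper is hypothetical. The point of the sketch is that a dedicated random.Random instance is unaffected by other code touching the global random state.

```python
import random

def pick_fewshot(pool, n, seed):
    """Draw n examples with a dedicated, seeded generator."""
    rnd = random.Random(seed)  # independent of the module-level RNG
    return rnd.sample(pool, n)

pool = [f"example-{i}" for i in range(100)]

run_a = pick_fewshot(pool, 5, seed=1234)

# Unrelated code disturbing the global RNG between runs
random.seed(999)
random.random()

run_b = pick_fewshot(pool, 5, seed=1234)
```

Here run_a and run_b are identical because both draws start from the same seed.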

Context Length

  • Balance number of examples with model context window
  • Consider token limits when setting num_fewshot
  • Longer examples may require fewer shots
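
A rough way to act on these points is to reduce the shot count until the prompt fits a token budget. The sketch below counts whitespace-separated words as a stand-in for tokens, which is an assumption; a real setup would count with the model's own tokenizer. The function names are hypothetical.

```python
def count_tokens(text):
    """Crude token estimate: whitespace-separated words (assumption for this sketch)."""
    return len(text.split())

def max_shots_within_budget(examples, test_prompt, budget, requested, delimiter="\n\n"):
    """Return the largest shot count <= requested whose prompt fits the budget."""
    for k in range(min(requested, len(examples)), -1, -1):
        prompt = delimiter.join(examples[:k] + [test_prompt])
        if count_tokens(prompt) <= budget:
            return k
    # Even the bare test prompt exceeds the budget; fall back to zero shots
    return 0

examples = ["a b c d e"] * 5   # five formatted demonstrations, 5 "tokens" each
test_prompt = "x y z"          # 3 "tokens"

shots_tight = max_shots_within_budget(examples, test_prompt, budget=14, requested=5)
shots_loose = max_shots_within_budget(examples, test_prompt, budget=100, requested=5)
```

With a budget of 14, two demonstrations (10 tokens) plus the test prompt (3 tokens) fit, while a third would not.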

Quality of Examples

  • First-N strategy useful for curated, canonical examples (MMLU, CMMLU)
  • Balanced sampling helps with imbalanced datasets
  • Manual selection allows task-specific expert curation

Format Consistency

  • Examples must match test instance format exactly
  • Handle both string and multiple-choice formats
  • Preserve spacing and delimiter conventions

Usage Examples

Initializing a Sampler

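import random

# Import path assumed from the lmms_eval package layout; adjust if it differs
from lmms_eval.api.samplers import FirstNSampler

# Initialize the sampler
sampler = FirstNSampler(
    docs=fewshot_docs,      # Few-shot pool, e.g. an HF dataset split
    task=task_instance,     # Task object supplying doc_to_text / doc_to_target
    fewshot_indices=None,   # None: use all docs in the pool
    rnd=random.Random(42),  # Seeded RNG for reproducibility
)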

# Get context for a test document
context = sampler.get_context(
    doc=test_doc,
    num_fewshot=5
)

# Result is formatted string like:
# "Question 1\nAnswer: A\n\nQuestion 2\nAnswer: B\n\n..."

Task YAML Configuration

# In task YAML
num_fewshot: 5
fewshot_split: train
fewshot_delimiter: "\n\n"
target_delimiter: "\nAnswer: "
fewshot_config:
  sampler: first_n  # or "default" for random

Best Practices

  • Use FirstNSampler for benchmarks with canonical examples
  • Set num_fewshot based on model context limits
  • Ensure fewshot_split has sufficient examples
  • Use a separate train split for few-shot examples when one is available
  • Test formatting with edge cases (long text, special chars)
  • Seed random generator for reproducible results
  • Document expected format in task configuration
  • Consider task difficulty when setting shot count

Common Patterns

MMLU-Style Tasks

  • Use FirstNSampler for deterministic ordering
  • 5-shot is standard for MMLU
  • Examples from development/validation set
  • Multiple choice format with letter answers

Generative Tasks

  • Random sampling often sufficient
  • Fewer shots may be needed (2-3)
  • Target can be free-form text
  • Consider example diversity

Zero-Shot Evaluation

  • Set num_fewshot=0
  • No demonstrations are prepended; only the system prompt (if any) accompanies the test instance
  • Tests model's base knowledge
  • Useful for instruction-tuned models
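
The difference between zero-shot and few-shot prompts can be sketched with a small assembly function. The function name, the description string, and the pre-formatted demonstrations are illustrative assumptions.

```python
def assemble_prompt(description, demonstrations, test_question, num_fewshot,
                    fewshot_delimiter="\n\n"):
    """Concatenate description, the first num_fewshot demonstrations, and the question."""
    shots = demonstrations[:num_fewshot]
    context = "".join(shot + fewshot_delimiter for shot in shots)
    return description + context + test_question

demos = ["Q: 1+1?\nA: 2", "Q: 2+2?\nA: 4"]
description = "Answer the question.\n\n"
question = "Q: 3+3?\nA:"

zero_shot = assemble_prompt(description, demos, question, num_fewshot=0)
one_shot = assemble_prompt(description, demos, question, num_fewshot=1)
```

With num_fewshot=0 the context is empty, so the model sees only the description and the test question.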
