Principle:EvolvingLMMs Lab Lmms eval Few Shot Sampling
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, In-Context Learning |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Few-Shot Sampling defines how the evaluation framework selects and formats example demonstrations that are provided as context when evaluating language models. This principle covers three things: strategies for sampling few-shot examples from a pool of documents, construction of labeled demonstrations from those examples, and assembly of the demonstrations into the context that precedes each test instance.
Theoretical Basis
Context Sampling
Few-shot sampling involves selecting example input-output pairs from a dataset to prepend to the actual test instance:
- Provides the model with in-context learning examples
- Demonstrates the expected input format and output style
- Can improve model performance on challenging tasks
- Must avoid including the test instance itself in the few-shot examples
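As a minimal illustration, assume documents are plain dicts with hypothetical `question` and `answer` fields. The `build_prompt` helper below is not part of lmms-eval. It removes the test instance from the pool, samples `k` demonstrations, and prepends them to the test question in the same format.

```python
import random


def build_prompt(pool: list, test_doc: dict, k: int, rng: random.Random) -> str:
    """Prepend k demonstrations (never the test doc itself) to the test question.

    Hypothetical helper: docs are dicts with 'question' and 'answer' keys.
    """
    # Exclude the test instance so its answer cannot leak into the context
    candidates = [d for d in pool if d != test_doc]
    demos = rng.sample(candidates, k)
    # Each demonstration shows the input format followed by its answer
    shots = [f"{d['question']}\nAnswer: {d['answer']}" for d in demos]
    # The test instance uses the same format, with the answer left for the model
    return "\n\n".join(shots + [f"{test_doc['question']}\nAnswer:"])


pool = [{"question": f"Q{i}", "answer": f"A{i}"} for i in range(5)]
prompt = build_prompt(pool, test_doc=pool[0], k=2, rng=random.Random(0))
print(prompt)
```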
Sampling Strategies
Different strategies for selecting few-shot examples:
- Random Sampling: Randomly select N examples from the pool
- First-N Sampling: Take the first N examples in deterministic order
- Balanced Sampling: Select examples to balance across classes/categories
- Manual Sampling: User-specified examples
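The strategies that need no label information can be sketched as plain functions over a list-like pool. The function names are illustrative assumptions, not lmms-eval APIs. Balanced Sampling additionally needs class labels and is left out of this sketch.

```python
import random
from typing import Sequence


def random_sample(pool: Sequence, n: int, rng: random.Random) -> list:
    # Random Sampling: n distinct examples chosen uniformly without replacement
    return rng.sample(list(pool), n)


def first_n_sample(pool: Sequence, n: int) -> list:
    # First-N Sampling: the first n examples in the pool's fixed order
    if n > len(pool):
        raise ValueError(f"requested {n} examples but only {len(pool)} are available")
    return list(pool[:n])


def manual_sample(pool: Sequence, indices: Sequence[int]) -> list:
    # Manual Sampling: exactly the user-specified positions, in the given order
    return [pool[i] for i in indices]


pool = ["ex0", "ex1", "ex2", "ex3", "ex4"]
print(random_sample(pool, 2, random.Random(1)))
print(first_n_sample(pool, 2))      # ['ex0', 'ex1']
print(manual_sample(pool, [4, 1]))  # ['ex4', 'ex1']
```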
Context Construction
Building the few-shot prompt string:
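- Convert each document to its text representation using the task's `doc_to_text`
- Convert each document to its target/answer using the task's `doc_to_target`
- Join text and target with `target_delimiter`, and join consecutive examples with `fewshot_delimiter`
- Handle both string and choice-based formats: when `doc_to_choice` is defined and `doc_to_text` or `doc_to_target` returns an index, resolve it to the corresponding choice
- Ensure consistent formatting across examples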
Split Management
Handling few-shot examples from different data splits:
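- Few-shot examples can come from a `fewshot_split` that is separate from the `test_split`
- If `fewshot_split` and `test_split` are the same, the test document must be filtered out of the sampled examples
- In that case, one extra example is sampled so that N examples remain after filtering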
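The extra-sample rule can be sketched as follows; `sample_without_test_doc` is a hypothetical helper. Drawing `num_fewshot + 1` distinct positions guarantees that at least `num_fewshot` documents remain after the test document is removed, provided the pool holds at least that many documents.

```python
import random


def sample_without_test_doc(pool, test_doc, num_fewshot, same_split, rng):
    """Draw num_fewshot examples, excluding test_doc when both come from one split."""
    # One extra draw compensates for the test doc possibly being sampled
    n_samples = num_fewshot + 1 if same_split else num_fewshot
    drawn = rng.sample(pool, n_samples)
    # Drop the test doc if it was drawn, then keep exactly num_fewshot examples
    return [d for d in drawn if d != test_doc][:num_fewshot]


pool = list(range(10))
shots = sample_without_test_doc(pool, test_doc=3, num_fewshot=3, same_split=True, rng=random.Random(7))
print(shots)
```

Documents are compared by value here, so an exact duplicate of the test document elsewhere in the pool would also be dropped, which could leave fewer than `num_fewshot` examples.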
Design Patterns
Sampler Configuration
- `num_fewshot`: Number of few-shot examples to include
- `fewshot_split`: Dataset split to sample from (train, validation, etc.)
- `fewshot_delimiter`: String separator between examples (e.g., `"\n\n"`)
- `target_delimiter`: String between question and answer (e.g., `"\nAnswer: "`)
- `fewshot_indices`: Optional list of document indices that restricts the few-shot pool to that subset
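One way to picture these options is a small settings object. `FewShotSettings` is a hypothetical container, not the framework's configuration class, and its default delimiter values are simply the examples from the list above, not lmms-eval's documented defaults. `select_pool` shows `fewshot_indices` restricting the pool to a subset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FewShotSettings:
    """Illustrative container for the few-shot options (hypothetical)."""

    num_fewshot: int = 0
    fewshot_split: Optional[str] = None
    fewshot_delimiter: str = "\n\n"
    target_delimiter: str = "\nAnswer: "
    fewshot_indices: Optional[list[int]] = None

    def select_pool(self, docs: list) -> list:
        # Without fewshot_indices the whole split is the pool
        if self.fewshot_indices is None:
            return list(docs)
        # Otherwise only the listed documents remain available for sampling
        return [docs[i] for i in self.fewshot_indices]


settings = FewShotSettings(num_fewshot=2, fewshot_split="train", fewshot_indices=[3, 0])
pool = settings.select_pool(["d0", "d1", "d2", "d3"])
print(pool)  # ['d3', 'd0']
```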
Sampler Types
- ContextSampler: Base sampler with random selection
- FirstNSampler: Deterministic ordering for canonical examples
- BalancedSampler: Class-balanced selection (TODO: not implemented)
- ManualSampler: User-specified examples (TODO: not implemented)
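A simplified sketch of how such a hierarchy can be organized: subclasses only override `sample`, and a registry maps the `sampler` name from a task's `fewshot_config` to a class. This is not the lmms-eval source; the constructor signature, `SAMPLER_REGISTRY`, and `get_sampler` are assumptions for illustration, and the unimplemented samplers are omitted.

```python
import random


class ContextSampler:
    """Base sampler: random selection from the few-shot pool (simplified sketch)."""

    def __init__(self, docs, rnd: random.Random):
        self.docs = list(docs)
        self.rnd = rnd

    def sample(self, n):
        # Random draw without replacement, driven by the injected RNG
        return self.rnd.sample(self.docs, n)


class FirstNSampler(ContextSampler):
    def sample(self, n):
        # Deterministic: always the first n docs, in pool order
        if n > len(self.docs):
            raise ValueError(f"requested {n} examples but only {len(self.docs)} exist")
        return self.docs[:n]


# Maps the `sampler` value of a task's fewshot_config to a class (assumed layout)
SAMPLER_REGISTRY = {"default": ContextSampler, "first_n": FirstNSampler}


def get_sampler(name: str):
    try:
        return SAMPLER_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown sampler {name!r}; choose from {sorted(SAMPLER_REGISTRY)}") from None


docs = ["d0", "d1", "d2", "d3"]
sampler = get_sampler("first_n")(docs, rnd=random.Random(0))
print(sampler.sample(2))  # ['d0', 'd1']
```

Keeping the selection logic in a single overridable method lets new strategies reuse the shared formatting code unchanged.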
Task Integration
- Samplers use the task methods `doc_to_text`, `doc_to_target`, and `doc_to_choice`
- Samplers read delimiters and split settings from the task configuration
- Samplers receive a random number generator for reproducibility
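The integration points can be sketched with a hypothetical task stand-in (`ToyTask`) and sampler (`BoundSampler`); neither is an lmms-eval class, and the attribute names are assumptions. The sampler copies the formatting hooks, delimiters, and split settings from the task, and refuses to run without an explicit random number generator.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task exposing the hooks a sampler needs."""

    doc_to_text: Callable[[dict], Any]
    doc_to_target: Callable[[dict], Any]
    doc_to_choice: Optional[Callable[[dict], list]] = None
    fewshot_delimiter: str = "\n\n"
    target_delimiter: str = " "
    fewshot_split: str = "train"
    test_split: str = "test"


class BoundSampler:
    """Sketch: a sampler that reads its hooks and settings from the task."""

    def __init__(self, docs, task: ToyTask, rnd: Optional[random.Random] = None):
        if rnd is None:
            # Requiring an explicit RNG keeps sampling reproducible
            raise ValueError("a random.Random instance must be passed as rnd")
        self.docs, self.rnd = list(docs), rnd
        # Formatting hooks come from the task
        self.doc_to_text = task.doc_to_text
        self.doc_to_target = task.doc_to_target
        self.doc_to_choice = task.doc_to_choice
        # Delimiters and split settings also come from the task
        self.fewshot_delimiter = task.fewshot_delimiter
        self.target_delimiter = task.target_delimiter
        self.same_split = task.fewshot_split == task.test_split


task = ToyTask(doc_to_text=lambda d: d["q"], doc_to_target=lambda d: d["a"])
sampler = BoundSampler([{"q": "Q", "a": "A"}], task, rnd=random.Random(0))
print(sampler.doc_to_text({"q": "Q", "a": "A"}), sampler.same_split)  # Q False
```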
Determinism
- Use a seeded random number generator for reproducibility
- `FirstNSampler` provides deterministic ordering for benchmarks
- Together, these give consistent sampling across evaluation runs
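The reproducibility property can be checked with Python's `random.Random`: two generators created with the same seed produce the same draws, while First-N selection needs no generator at all. This sketch creates a fresh generator per draw; if a single seeded generator is instead shared across all test documents, reproducing a given document's examples also requires processing the documents in the same order.

```python
import random


def draw(pool, n, seed):
    # A fresh RNG seeded the same way replays the same sequence of choices
    return random.Random(seed).sample(pool, n)


pool = [f"doc{i}" for i in range(20)]
run_a = draw(pool, 5, seed=1234)
run_b = draw(pool, 5, seed=1234)
print(run_a == run_b)  # True

# First-N ordering needs no RNG: it is fixed by the pool's order
print(pool[:5])  # ['doc0', 'doc1', 'doc2', 'doc3', 'doc4']
```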
Context Length
- Balance the number of examples against the model's context window
- Consider token limits when setting `num_fewshot`
- Longer examples may require fewer shots
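One way to choose `num_fewshot` under a token budget is to drop examples until the prompt fits. `fit_num_fewshot` is a hypothetical helper, not lmms-eval behavior, and `whitespace_tokens` is a crude stand-in assumption for a real tokenizer; actual counts depend on the model's tokenizer.

```python
def fit_num_fewshot(example_blocks, query, max_tokens, count_tokens, delimiter="\n\n"):
    """Largest k such that the first k example blocks plus the query fit in max_tokens.

    count_tokens is any callable returning a token count for a string; a real
    setup would use the model's tokenizer.
    """
    for k in range(len(example_blocks), -1, -1):
        prompt = delimiter.join(list(example_blocks[:k]) + [query])
        if count_tokens(prompt) <= max_tokens:
            return k
    # Even the bare query is too long
    raise ValueError("the test query alone exceeds the token budget")


def whitespace_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: counts whitespace-separated words
    return len(text.split())


blocks = ["word " * 40, "word " * 40, "word " * 40]
print(fit_num_fewshot(blocks, "final question", max_tokens=100, count_tokens=whitespace_tokens))  # 2
```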
Quality of Examples
- The First-N strategy is useful for curated, canonical examples (e.g., MMLU, CMMLU)
- Balanced sampling helps with imbalanced datasets
- Manual selection allows task-specific expert curation
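`BalancedSampler` is marked as not implemented, so the following is only one possible approach, not the framework's: group the pool by label, pick round-robin across labels, then shuffle the picks. `balanced_sample` and the `label_of` callback are hypothetical names.

```python
import random
from collections import defaultdict


def balanced_sample(pool, n, label_of, rng):
    """Round-robin over classes so each label appears as evenly as possible."""
    by_label = defaultdict(list)
    for doc in pool:
        by_label[label_of(doc)].append(doc)
    # Shuffle within each class so the pick inside a class is random
    for docs in by_label.values():
        rng.shuffle(docs)
    labels = sorted(by_label)
    picked = []
    while len(picked) < n and any(by_label.values()):
        for label in labels:
            if by_label[label] and len(picked) < n:
                picked.append(by_label[label].pop())
    # Shuffle the final order so demonstrations are not grouped by class
    rng.shuffle(picked)
    return picked


pool = [{"id": i, "label": "pos"} for i in range(8)] + [{"id": i, "label": "neg"} for i in range(8, 10)]
shots = balanced_sample(pool, 4, label_of=lambda d: d["label"], rng=random.Random(0))
print(sorted(d["label"] for d in shots))  # ['neg', 'neg', 'pos', 'pos']
```

With an 8:2 class ratio, uniform random sampling would usually return mostly `pos` examples; the round-robin pass gives both classes equal representation until the minority class runs out.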
Format Consistency
- Examples must match test instance format exactly
- Handle both string and multiple-choice formats
- Preserve spacing and delimiter conventions
Usage Examples
Initializing a Sampler
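```python
import random

# Assumed module path for the framework's samplers
from lmms_eval.api.samplers import FirstNSampler

# fewshot_docs, task_instance, and test_doc are supplied by the task being evaluated
# Initialize sampler
sampler = FirstNSampler(
    docs=fewshot_docs,      # HF dataset split used as the few-shot pool
    task=task_instance,     # supplies doc_to_text/doc_to_target and delimiters
    fewshot_indices=None,   # None: use every doc in the pool
    rnd=random.Random(42),  # seeded RNG for reproducibility
)

# Get context for a test document
context = sampler.get_context(
    doc=test_doc,
    num_fewshot=5,
)
# The result is a formatted string such as:
# "Question 1\nAnswer: A\n\nQuestion 2\nAnswer: B\n\n..."
```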
Task YAML Configuration
```yaml
# In task YAML
num_fewshot: 5
fewshot_split: train
fewshot_delimiter: "\n\n"
target_delimiter: "\nAnswer: "
fewshot_config:
  sampler: first_n  # or "default" for random
```
Best Practices
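- Use `FirstNSampler` for benchmarks with canonical examples
- Set `num_fewshot` based on model context limits
- Ensure `fewshot_split` has sufficient examples (at least `num_fewshot`, or `num_fewshot + 1` when it is also the test split)
- Use a separate train split for few-shot examples when one is available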
- Test formatting with edge cases (long text, special chars)
- Seed random generator for reproducible results
- Document expected format in task configuration
- Consider task difficulty when setting shot count
Common Patterns
MMLU-Style Tasks
- Use `FirstNSampler` for deterministic ordering
- 5-shot is standard for MMLU
- Examples are drawn from the development/validation set
- Multiple-choice format with letter answers
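A sketch of an MMLU-style formatter, assuming hypothetical document fields `question`, `choices` (four options), and `answer` (the index of the correct option). The template approximates a common lettered multiple-choice prompt and is an illustrative assumption, not the exact template used by lmms-eval's MMLU tasks.

```python
LETTERS = "ABCD"


def mmlu_style_text(doc: dict) -> str:
    # Question, then one lettered line per option, then an answer cue
    options = "\n".join(f"{letter}. {choice}" for letter, choice in zip(LETTERS, doc["choices"]))
    return f"{doc['question'].strip()}\n{options}\nAnswer:"


def mmlu_style_target(doc: dict) -> str:
    # The gold answer is stored as an index and rendered as its letter
    return LETTERS[doc["answer"]]


dev_doc = {"question": "What is 3 * 3?", "choices": ["6", "9", "12", "33"], "answer": 1}
print(mmlu_style_text(dev_doc) + " " + mmlu_style_target(dev_doc))
```

With a `target_delimiter` of `" "`, each demonstration ends in a line such as `Answer: B`, and the test instance ends at `Answer:` for the model to complete.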
Generative Tasks
- Random sampling often sufficient
- Fewer shots may be needed (2-3)
- Target can be free-form text
- Consider example diversity
Zero-Shot Evaluation
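- Set `num_fewshot=0`
- No demonstrations are prepended; the prompt contains only the system prompt (if any) and the test instance
- Tests the model without any in-context examples
- Useful for instruction-tuned models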
Related Pages
Implementations
- EvolvingLMMs_Lab_Lmms_eval_Context_Samplers — implementation of sampling strategies
See Also
- EvolvingLMMs_Lab_Lmms_eval_Request_Construction — few-shot context is prepended to model requests
- EvolvingLMMs_Lab_Lmms_eval_Task_Directory_Structure — tasks define few-shot configuration
- EvolvingLMMs_Lab_Lmms_eval_YAML_Task_Configuration — few-shot settings in task YAML
- EvolvingLMMs_Lab_Lmms_eval_Dataset_Preparation — few-shot pool comes from dataset splits