Implementation:EvolvingLMMs Lab Lmms eval Context Samplers
- **File**: `/tmp/kapso_repo_sslb_59s/lmms_eval/api/samplers.py`
- **Principle**: Few_Shot_Sampling
- Overview
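Context samplers select few-shot examples and format them into the context string that precedes a test document during model evaluation. The module provides a base `ContextSampler` class (random selection), a `FirstNSampler` subclass (deterministic selection), two unimplemented placeholders (`BalancedSampler` and `ManualSampler`), and a registry for looking up sampler classes by name.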
- Key Components
- 1. ContextSampler (Base Class)
```python
class ContextSampler:
    def __init__(self, docs, task, fewshot_indices=None, rnd=None) -> None:
```
- **Purpose**: Base sampler that implements random selection and context formatting
- **Initialization Parameters**:
  - `docs`: HuggingFace dataset split containing potential few-shot examples
  - `task`: Task instance providing doc conversion methods
  - `fewshot_indices` (optional): Specific indices to sample from (subsets the docs)
  - `rnd`: Random number generator. It defaults to `None` in the signature, but the constructor asserts that it is set, so it must always be passed (this keeps sampling reproducible)
- **Attributes Set During Init**:
```python
self.rnd = rnd
assert self.rnd, "must pass rnd to FewShotSampler!"

self.task = task
self.config = task._config

self.target_delimiter = self.config.target_delimiter
self.fewshot_delimiter = self.config.fewshot_delimiter

self.doc_to_text = self.task.doc_to_text
self.doc_to_target = self.task.doc_to_target
self.doc_to_choice = self.task.doc_to_choice

self.docs = docs
if fewshot_indices:
    self.docs = self.docs.select(fewshot_indices)
```
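The effect of `fewshot_indices` can be illustrated without a HuggingFace dataset. The sketch below assumes only that the dataset object exposes a `select(indices)` method, as the constructor above does; `ListDataset` and `subset_docs` are hypothetical names for illustration and are not part of lmms_eval.

```python
class ListDataset:
    """Hypothetical stand-in for a dataset split that supports select()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def select(self, indices):
        # Return a new dataset containing only the rows at the given positions.
        return ListDataset(self.rows[i] for i in indices)

    def __len__(self):
        return len(self.rows)


def subset_docs(docs, fewshot_indices=None):
    # Mirrors the truthiness check in the constructor: None and an empty
    # list both mean "use every document".
    if fewshot_indices:
        docs = docs.select(fewshot_indices)
    return docs


docs = ListDataset({"id": i} for i in range(25))
subset = subset_docs(docs, fewshot_indices=[0, 5, 10, 15, 20])
unfiltered = subset_docs(docs, fewshot_indices=[])
```

Because the check is `if fewshot_indices:` rather than `is not None`, passing an empty list does not produce an empty pool; it leaves the full set of docs in place.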
- **Key Methods**:
- get_context Method
```python
def get_context(self, doc, num_fewshot):
```
- **Purpose**: Generate formatted few-shot context string for a test document
- **Implementation Steps**:
1. **Determine Sample Size**: Draw one extra sample when the few-shot split is also the test split, so the test document can be filtered out.
   ```python
   n_samples = num_fewshot + 1 if self.config.fewshot_split == self.config.test_split else num_fewshot
   ```
2. **Sample Documents**: Call the sampling method (overridable by subclasses).
   ```python
   fewshotex = self.sample(n_samples)
   ```
3. **Filter Out Test Document**: Remove the test doc if present, then keep the first `num_fewshot` documents.
   ```python
   selected_docs = [x for x in fewshotex if x != doc][:num_fewshot]
   ```
4. **Format Examples**:
   ```python
   labeled_examples = (
       self.fewshot_delimiter.join(
           [
               (self.doc_to_text(doc) if (self.config.doc_to_choice is None or type(self.doc_to_text(doc)) is str) else self.doc_to_choice(doc)[self.doc_to_text(doc)])
               + self.target_delimiter
               + (
                   str(self.doc_to_target(doc)[0])
                   if type(self.doc_to_target(doc)) is list
                   else self.doc_to_target(doc) if (self.config.doc_to_choice is None or type(self.doc_to_target(doc)) is str) else str(self.doc_to_choice(doc)[self.doc_to_target(doc)])
               )
               for doc in selected_docs
           ]
       )
       + self.fewshot_delimiter
   )
   ```
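Steps 1 to 3 can be tried in isolation. The following is a minimal sketch with plain dictionaries standing in for dataset rows; `pick_fewshot` and its `same_split` flag are hypothetical names that replace the `self.config.fewshot_split == self.config.test_split` comparison.

```python
import random


def pick_fewshot(docs, test_doc, num_fewshot, same_split, rnd):
    # Step 1: oversample by one when the test doc may be among the candidates.
    n_samples = num_fewshot + 1 if same_split else num_fewshot
    # Step 2: draw candidates (random here; subclasses may choose differently).
    candidates = rnd.sample(docs, n_samples)
    # Step 3: drop the test doc if it was drawn, then truncate.
    return [d for d in candidates if d != test_doc][:num_fewshot]


docs = [{"q": f"question {i}", "a": str(i)} for i in range(10)]
test_doc = docs[3]
selected = pick_fewshot(
    docs, test_doc, num_fewshot=4, same_split=True, rnd=random.Random(0)
)
```

The filter compares documents by value (`!=` on dictionaries here), so a few-shot candidate with exactly the same fields as the test document is removed even if it is a different row.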
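- **Formatting Logic**:
  - For each selected document:
    - Text: use the `doc_to_text` result directly if the task defines no `doc_to_choice` or the result is a string; otherwise treat it as an index into `doc_to_choice(doc)`
    - Append `target_delimiter`
    - Target: take the first element if the `doc_to_target` result is a list; use it directly if the task defines no `doc_to_choice` or the result is a string; otherwise treat it as an index into `doc_to_choice(doc)`
  - Join all examples with `fewshot_delimiter`
  - Append a final `fewshot_delimiter` so the test instance can follow
- **Returns**: Formatted string of labeled examples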
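The branching above is easier to follow when split into small functions. This is a simplified sketch, not the library code: `resolve_text`, `resolve_target`, and `format_examples` are hypothetical helpers, the examples are passed as already-extracted `(text, target, choices)` triples, `choices is None` stands in for `self.config.doc_to_choice is None`, and the delimiter defaults are assumptions chosen for the demonstration.

```python
def resolve_text(text, choices):
    # A string is used as-is; otherwise the value indexes into the choices.
    if choices is None or isinstance(text, str):
        return text
    return choices[text]


def resolve_target(target, choices):
    # A list contributes its first element; otherwise same rule as the text.
    if isinstance(target, list):
        return str(target[0])
    if choices is None or isinstance(target, str):
        return target
    return str(choices[target])


def format_examples(examples, target_delimiter=" ", fewshot_delimiter="\n\n"):
    # examples: list of (text, target, choices) triples.
    parts = [
        resolve_text(text, choices)
        + target_delimiter
        + resolve_target(target, choices)
        for text, target, choices in examples
    ]
    # The trailing delimiter separates the last example from the test instance.
    return fewshot_delimiter.join(parts) + fewshot_delimiter


examples = [
    ("Question: What is 2+2?\nAnswer:", 1, ["3", "4", "5", "6"]),  # index target
    ("Input: Hello\nOutput:", "Bonjour", None),                     # string target
    ("Q: Name a prime.\nA:", ["2", "3"], None),                     # list target
]
context = format_examples(examples)
```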
- sample Method
```python
def sample(self, n):
    """
    Draw `n` samples from our fewshot docs. This method should be overridden by subclasses.
    """
    return self.rnd.sample(self.docs, n)
```
- **Purpose**: Base implementation uses random sampling (overridden by subclasses)
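Since the base `sample` delegates to the generator's `sample` method, reproducibility depends entirely on how the generator is seeded and shared. A small sketch using the standard library's `random.Random` on a plain list (the list is a stand-in for the dataset split):

```python
import random

docs = [f"doc-{i}" for i in range(20)]

# Two generators created with the same seed draw the same examples
# in the same order.
first = random.Random(42).sample(docs, 5)
second = random.Random(42).sample(docs, 5)

# A shared generator keeps its state between calls, so the second draw
# continues from where the first one stopped instead of replaying it.
rng = random.Random(42)
draw_a = rng.sample(docs, 5)
draw_b = rng.sample(docs, 5)
```

Here `first`, `second`, and `draw_a` are identical, while `draw_b` comes from the advanced generator state. Reusing one generator across many test documents therefore gives each document its own draw while the whole run stays repeatable.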
- 2. FirstNSampler
```python
class FirstNSampler(ContextSampler):
    def sample(self, n) -> None:
        """
        Draw the first `n` samples in order from the specified split.
        Used for tasks with "canonical" ordered fewshot examples, such as MMLU and CMMLU.
        """
        assert n <= len(self.docs), f"Error: number of fewshot samples requested exceeds the {len(self.docs)} that are available."
        return self.docs[:n]
```
- **Purpose**: Deterministic sampling for benchmarks with canonical example ordering
- **Use Cases**:
  - MMLU: Uses first 5 examples from dev set
  - CMMLU: Uses first N examples in prescribed order
  - Any benchmark with curated example sets
- **Validation**: Asserts sufficient examples are available
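The guard and the slice can be exercised on a plain list. This is a sketch under that assumption: `first_n` is a hypothetical free function that mirrors the body of `FirstNSampler.sample`, and `dev_docs` is made-up data.

```python
def first_n(docs, n):
    # Same guard as FirstNSampler: fail loudly if too few examples exist.
    assert n <= len(docs), (
        f"Error: number of fewshot samples requested exceeds "
        f"the {len(docs)} that are available."
    )
    return docs[:n]


dev_docs = [{"question": f"q{i}", "answer": "A"} for i in range(5)]
shots = first_n(dev_docs, 3)

# Asking for more examples than exist trips the assertion.
try:
    first_n(dev_docs, 6)
    too_many_failed = False
except AssertionError:
    too_many_failed = True
```

Two consequences follow from the code shown earlier. When the few-shot split is also the test split, `get_context` requests `num_fewshot + 1` documents, so the split needs one more example than the shot count. And because the check is an `assert` statement, it is skipped when Python runs with the `-O` flag.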
- 3. BalancedSampler
```python
class BalancedSampler(ContextSampler):
    def sample(self, n) -> None:
        """
        TODO: this should return approximately class-balanced samples from our fewshot examples.
        TODO: what order should they be in? maybe random?
        """
        pass
```
- **Status**: Not yet implemented
- **Intended Purpose**: Sample examples to balance class distribution
- **Design Questions**:
  - How to determine class labels?
  - What if perfect balance isn't possible?
  - Should order be randomized or stratified?
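One possible answer to these questions is sketched below. It is purely illustrative and not how lmms_eval implements (or plans to implement) `BalancedSampler`: the function `balanced_sample` and its `label_of` callback are hypothetical, labels are assumed to be sortable, classes are served round-robin so that an exhausted class simply stops contributing, and the final order is shuffled with the supplied generator.

```python
import random
from collections import defaultdict


def balanced_sample(docs, n, label_of, rnd):
    # Group documents by class label.
    by_label = defaultdict(list)
    for doc in docs:
        by_label[label_of(doc)].append(doc)

    # Shuffle within each class so picks are random but seed-controlled.
    pools = []
    for label in sorted(by_label):
        pool = list(by_label[label])
        rnd.shuffle(pool)
        pools.append(pool)

    # Round-robin across classes until n documents are collected
    # or every pool is empty.
    picked = []
    while len(picked) < n and any(pools):
        for pool in pools:
            if pool and len(picked) < n:
                picked.append(pool.pop())

    # Randomize the final order so classes do not appear in a fixed cycle.
    rnd.shuffle(picked)
    return picked


# 4 "neg" documents (i = 0, 3, 6, 9) and 8 "pos" documents.
docs = [{"text": f"t{i}", "label": "pos" if i % 3 else "neg"} for i in range(12)]
sample = balanced_sample(
    docs, 6, label_of=lambda d: d["label"], rnd=random.Random(0)
)
```

With two classes and `n=6`, the round-robin yields three documents per class even though the pool itself is skewed two to one.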
- 4. ManualSampler
```python
class ManualSampler(ContextSampler):
    def sample(self, n) -> None:
        """ """
        pass
```
- **Status**: Not yet implemented
- **Intended Purpose**: Allow user to specify exact examples to use
- 5. Sampler Registry
```python
SAMPLER_REGISTRY = {
    "default": ContextSampler,
    "first_n": FirstNSampler,
}


def get_sampler(name):
    try:
        return SAMPLER_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Attempted to use contextsampler '{name}', but no sampling strategy for this name found! Supported model names: {', '.join(SAMPLER_REGISTRY.keys())}")
```
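- **Purpose**: Central registry for looking up sampler classes by name
- **Registered Samplers**:
  - "default": ContextSampler (random sampling)
  - "first_n": FirstNSampler (deterministic ordering)
- **Not Registered**: `BalancedSampler` and `ManualSampler` have no registry entry, so any name other than the two above raises `ValueError`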
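The lookup behavior can be reproduced with stand-in classes. The sketch below does not import lmms_eval; the two classes are empty placeholders, the error message is shortened, and `lookup_or_error` is a hypothetical helper added only to capture the outcome of each lookup.

```python
class ContextSampler:
    """Stand-in for the base (random) sampler."""


class FirstNSampler(ContextSampler):
    """Stand-in for the first-n sampler."""


SAMPLER_REGISTRY = {
    "default": ContextSampler,
    "first_n": FirstNSampler,
}


def get_sampler(name):
    try:
        return SAMPLER_REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"no sampling strategy named '{name}'; "
            f"supported names: {', '.join(SAMPLER_REGISTRY)}"
        ) from None


def lookup_or_error(name):
    # Return the class name on success, or the error text on failure.
    try:
        return get_sampler(name).__name__
    except ValueError as err:
        return str(err)


found = lookup_or_error("first_n")
missing = lookup_or_error("balanced")
```

Note that `get_sampler` returns the class, not an instance; the caller still has to construct it with `docs`, `task`, and `rnd`.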
- Usage Examples
- Basic Random Sampling
```python
from lmms_eval.api.samplers import ContextSampler
import random

# Initialize sampler
sampler = ContextSampler(
    docs=train_dataset,
    task=task_instance,
    rnd=random.Random(42),
)

# Get 5-shot context
context = sampler.get_context(doc=test_doc, num_fewshot=5)
```
- First-N Sampling for MMLU
```python
from lmms_eval.api.samplers import FirstNSampler
import random

sampler = FirstNSampler(
    docs=dev_dataset,
    task=mmlu_task,
    rnd=random.Random(1234),  # Still required even though unused
)

# Always gets same first 5 examples
context = sampler.get_context(doc=test_doc, num_fewshot=5)
```
- Using Registry
```python
from lmms_eval.api.samplers import get_sampler

# Get sampler class from registry
SamplerClass = get_sampler("first_n")

# Instantiate
sampler = SamplerClass(docs=docs, task=task, rnd=rng)
```
- With Subset of Examples
```python
from lmms_eval.api.samplers import ContextSampler
import random

# Only sample from specific indices
sampler = ContextSampler(
    docs=full_dataset,
    task=task,
    fewshot_indices=[0, 5, 10, 15, 20],  # Only use these
    rnd=random.Random(42),
)
```
- Format Examples
- Multiple Choice Format
```
Question: What is 2+2?
A. 3
B. 4
C. 5
D. 6
Answer: B

Question: What is the capital of France?
A. London
B. Paris
C. Berlin
D. Madrid
Answer: B

[Test question follows]
```
- Generative Format
```
Input: Translate to French: Hello
Output: Bonjour

Input: Translate to French: Goodbye
Output: Au revoir

Input: Translate to French: Thank you
[Model generates answer]
```
- Design Decisions
1. **Random Generator Required**: Forces reproducibility by requiring explicit RNG
2. **Task Method Delegation**: Uses task's doc_to_text/target/choice for format consistency
3. **Same-Split Handling**: Automatically handles case where fewshot and test splits are identical
4. **Complex Formatting Logic**: Handles both string and choice-based formats in one method
5. **Delimiter Configuration**: Uses task config for delimiters to maintain consistency
6. **Document Equality**: Relies on dataset's equality comparison to filter test doc
- Common Issues and Solutions
- Issue: Test Doc in Few-Shot Examples
- **Solution**: The test doc is filtered out automatically in `get_context`; when the few-shot and test splits are the same, one extra sample is drawn to compensate
- Issue: Insufficient Examples
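- **Solution**: `FirstNSampler` fails its `assert` with a descriptive message (an `AssertionError`); the base sampler surfaces the `ValueError` that `random.sample` raises when `n` exceeds the number of docs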
- Issue: Format Mismatch
- **Solution**: Always use the task's `doc_to_text` and `doc_to_target` methods, and check the task config for delimiters
- Issue: Non-Reproducible Results
- **Solution**: Always pass a seeded RNG; use `FirstNSampler` when the examples must be fully deterministic
- Related Components
- Few_Shot_Sampling: Principle this implements
- Request_Construction: Uses context strings in requests
- YAML_Task_Configuration: Task configs specify sampler type
- Task_Directory_Structure: Tasks provide doc conversion methods
- Extension Points
1. Implement BalancedSampler for class-balanced selection
2. Implement ManualSampler for user-specified examples
3. Add SimilaritySampler for semantic similarity-based selection
4. Add DiversitySampler for maximum diversity
5. Support custom ordering functions in FirstNSampler
6. Add validation for delimiter consistency
- Best Practices
1. Always use seeded RNG for reproducibility
2. Choose sampler based on benchmark requirements
3. Verify sufficient examples available in fewshot_split
4. Test formatting with edge cases
5. Use FirstNSampler for benchmarks with canonical examples
6. Document sampler choice in task configuration
7. Consider context length when setting num_fewshot