# Implementation: EvolvingLMMs Lab Lmms eval Context Samplers

**File**: `lmms_eval/api/samplers.py`

**Principle**: Few_Shot_Sampling

## Overview

Context samplers select and format the few-shot examples that are prepended as context during model evaluation. The module provides a base `ContextSampler` class (random selection), a deterministic `FirstNSampler`, two unimplemented stubs (`BalancedSampler`, `ManualSampler`), and a name-based registry for looking samplers up.

## Key Components

### ContextSampler (Base Class)

```python
class ContextSampler:
    def __init__(self, docs, task, fewshot_indices=None, rnd=None) -> None:
        ...
```

**Purpose**: Base sampler that implements random selection and context formatting

**Initialization Parameters**:

- `docs`: HuggingFace dataset split containing the candidate few-shot examples
- `task`: Task instance providing the config and the doc conversion methods
- `fewshot_indices` (optional): Indices to restrict sampling to (subsets `docs`)
- `rnd`: Random number generator; it defaults to `None`, but the constructor asserts that one is passed, which keeps sampling reproducible

**Attributes Set During Init**:

```python
self.rnd = rnd
assert self.rnd, "must pass rnd to FewShotSampler!"  # rnd defaults to None but is mandatory

self.task = task
self.config = task._config

# Delimiters come from the task config
self.target_delimiter = self.config.target_delimiter
self.fewshot_delimiter = self.config.fewshot_delimiter

# Doc conversion methods are borrowed from the task
self.doc_to_text = self.task.doc_to_text
self.doc_to_target = self.task.doc_to_target
self.doc_to_choice = self.task.doc_to_choice

self.docs = docs
if fewshot_indices:
    # Restrict the pool to the given indices (HuggingFace Dataset.select)
    self.docs = self.docs.select(fewshot_indices)
```
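The constructor reads only a handful of attributes from the task. The stub below is a hypothetical sketch of that interface for unit-testing a sampler in isolation; `StubConfig`, `StubTask`, the default delimiter values, and the `question`/`answer` fields are all assumptions, not lmms_eval classes or defaults.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StubConfig:
    # Hypothetical stand-in for task._config: only the fields the sampler reads
    target_delimiter: str = " "
    fewshot_delimiter: str = "\n\n"
    doc_to_choice: Optional[Any] = None
    fewshot_split: str = "validation"
    test_split: str = "test"


class StubTask:
    """Hypothetical task exposing the attributes ContextSampler.__init__ uses."""

    def __init__(self, config: StubConfig) -> None:
        self._config = config

    def doc_to_text(self, doc: dict) -> str:
        return f"Question: {doc['question']}\nAnswer:"

    def doc_to_target(self, doc: dict) -> str:
        return doc["answer"]

    def doc_to_choice(self, doc: dict) -> list:
        return doc.get("choices", [])


task = StubTask(StubConfig())

# One labeled example, built the way get_context combines text and target
doc = {"question": "What is 2+2?", "answer": "4"}
example = task.doc_to_text(doc) + task._config.target_delimiter + task.doc_to_target(doc)
print(example)
```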

**Key Methods**:

#### get_context Method

```python
def get_context(self, doc, num_fewshot):
    ...
```

**Purpose**: Generate the formatted few-shot context string for a test document

**Implementation Steps**:

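1. **Determine Sample Size**: draw one extra sample when the few-shot split is the same as the test split, so that the test doc can be filtered out without leaving fewer than `num_fewshot` examples.

   ```python
   n_samples = num_fewshot + 1 if self.config.fewshot_split == self.config.test_split else num_fewshot
   ```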

2. **Sample Documents**: call the sampling method, which subclasses override.

   ```python
   fewshotex = self.sample(n_samples)
   ```

3. **Filter Out Test Document**: remove the test doc if it was drawn, then keep the first `num_fewshot` remaining examples.

   ```python
   selected_docs = [x for x in fewshotex if x != doc][:num_fewshot]
   ```

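4. **Format Examples**: render each selected doc as text, `target_delimiter`, and target; join the results with `fewshot_delimiter`; append one trailing `fewshot_delimiter`.

   ```python
   labeled_examples = (
       self.fewshot_delimiter.join(
           [
               # Text: used as-is unless it is an index into doc_to_choice
               (
                   self.doc_to_text(doc)
                   if (self.config.doc_to_choice is None or type(self.doc_to_text(doc)) is str)
                   else self.doc_to_choice(doc)[self.doc_to_text(doc)]
               )
               + self.target_delimiter
               # Target: first element of a list, a string as-is, or an index into doc_to_choice
               + (
                   str(self.doc_to_target(doc)[0])
                   if type(self.doc_to_target(doc)) is list
                   else self.doc_to_target(doc)
                   if (self.config.doc_to_choice is None or type(self.doc_to_target(doc)) is str)
                   else str(self.doc_to_choice(doc)[self.doc_to_target(doc)])
               )
               # `doc` here is the loop variable (a few-shot example), not the test doc
               for doc in selected_docs
           ]
       )
       + self.fewshot_delimiter
   )
   ```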
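Steps 1 to 4 can be reproduced without lmms_eval. The sketch below is a simplified stand-in, not the library method: documents are plain dicts, `same_split` replaces the config comparison, and formatting uses only `doc_to_text`/`doc_to_target` callables (the `doc_to_choice` branches are omitted). `simple_get_context` is a hypothetical helper.

```python
import random
from typing import Callable, List


def simple_get_context(
    docs: List[dict],
    doc: dict,
    num_fewshot: int,
    rnd: random.Random,
    doc_to_text: Callable[[dict], str],
    doc_to_target: Callable[[dict], str],
    same_split: bool,
    target_delimiter: str = " ",
    fewshot_delimiter: str = "\n\n",
) -> str:
    # Step 1: draw one extra example when few-shot and test docs share a split
    n_samples = num_fewshot + 1 if same_split else num_fewshot
    # Step 2: random sampling, as in the base ContextSampler.sample
    fewshotex = rnd.sample(docs, n_samples)
    # Step 3: drop the test doc (dict equality) and keep num_fewshot examples
    selected_docs = [x for x in fewshotex if x != doc][:num_fewshot]
    # Step 4 (simplified): text + target_delimiter + target, joined, plus a trailing delimiter
    return (
        fewshot_delimiter.join(
            doc_to_text(d) + target_delimiter + doc_to_target(d) for d in selected_docs
        )
        + fewshot_delimiter
    )


docs = [{"q": f"{i}+{i}?", "a": str(2 * i)} for i in range(6)]
test_doc = docs[0]  # the test doc also lives in the few-shot pool
context = simple_get_context(
    docs,
    test_doc,
    num_fewshot=3,
    rnd=random.Random(42),
    doc_to_text=lambda d: "Q: " + d["q"] + "\nA:",
    doc_to_target=lambda d: d["a"],
    same_split=True,
)
print(context)
```

Because one extra doc is drawn, the context always holds exactly three examples, whether or not the test doc was among the four sampled.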

**Formatting Logic**:

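- For each selected document:
  - Get text: use `doc_to_text(doc)` directly if the config has no `doc_to_choice` or the text is already a string; otherwise treat it as an index into `doc_to_choice(doc)`
  - Add `target_delimiter`
  - Get target: if `doc_to_target(doc)` is a list, use its first element; else if the config has no `doc_to_choice` or the target is a string, use it directly; otherwise treat it as an index into `doc_to_choice(doc)`
- Join all examples with `fewshot_delimiter`
- Append a final `fewshot_delimiter` so the test instance can follow directly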
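The target rule is easiest to see in isolation. `resolve_target` below is a hypothetical restatement of the target branch of the formatting expression, not a function exported by lmms_eval; passing `choices=None` stands in for `self.config.doc_to_choice is None`.

```python
from typing import Any, List, Optional


def resolve_target(target: Any, choices: Optional[List[str]]) -> Any:
    """Hypothetical restatement of the target branch in get_context's formatting."""
    if type(target) is list:
        # List targets: the first element is used as the gold answer
        return str(target[0])
    if choices is None or type(target) is str:
        # No doc_to_choice configured, or the target is already text: use as-is
        return target
    # Otherwise the target is an index into the choice list
    return str(choices[target])


print(resolve_target(["Paris", "paris"], None))   # Paris
print(resolve_target("B", ["A", "B", "C", "D"]))  # B
print(resolve_target(1, ["3", "4", "5", "6"]))    # 4
```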

**Returns**: Formatted string of labeled examples, ending with `fewshot_delimiter`

#### sample Method

```python
def sample(self, n):
    """
    Draw `n` samples from our fewshot docs. This method should be overridden by subclasses.
    """
    return self.rnd.sample(self.docs, n)
```

**Purpose**: Base implementation draws `n` docs at random without replacement via `self.rnd.sample`; subclasses override it
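Because `sample` delegates to `self.rnd.sample`, two samplers built with identically seeded `random.Random` instances draw the same examples, and the module-level `random` state is never touched. A minimal standard-library check:

```python
import random

docs = [{"id": i} for i in range(10)]

# Two independent generators with the same seed yield identical draws
first = random.Random(42).sample(docs, 3)
second = random.Random(42).sample(docs, 3)

# A single reused generator keeps advancing between calls
rng = random.Random(42)
draw_a = rng.sample(docs, 3)
draw_b = rng.sample(docs, 3)

print(first == second)  # True
print(first == draw_a)  # True
```

Since `get_context` calls `sample` once per test doc, a shared `rnd` gives each test doc its own examples while the whole sequence stays reproducible for a fixed seed and evaluation order.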

### FirstNSampler

```python
class FirstNSampler(ContextSampler):
    def sample(self, n) -> None:
        """
        Draw the first `n` samples in order from the specified split.
        Used for tasks with "canonical" ordered fewshot examples, such as MMLU and CMMLU.
        """
        assert n <= len(self.docs), f"Error: number of fewshot samples requested exceeds the {len(self.docs)} that are available."
        return self.docs[:n]
```

**Purpose**: Deterministic sampling for benchmarks with canonical example ordering

**Use Cases**:

- MMLU: uses the first 5 examples of the dev set
- CMMLU: uses the first N examples in their prescribed order
- Any benchmark with a curated example set

**Validation**: Asserts that at least `n` examples are available
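The first-N contract can be checked on a plain list of docs. `first_n` below is a hypothetical standalone copy of `FirstNSampler.sample`. It assumes list-like docs: slicing a Python list returns a list of rows, whereas slicing a HuggingFace `Dataset` returns a dict of columns.

```python
from typing import List


def first_n(docs: List[dict], n: int) -> List[dict]:
    # Same check as FirstNSampler.sample: refuse to return fewer than n examples
    assert n <= len(docs), (
        f"Error: number of fewshot samples requested exceeds the {len(docs)} that are available."
    )
    # Deterministic: always the first n docs, in stored order
    return docs[:n]


dev = [{"q": f"dev question {i}"} for i in range(5)]
print([d["q"] for d in first_n(dev, 2)])  # ['dev question 0', 'dev question 1']

try:
    first_n(dev, 6)
except AssertionError as err:
    print(err)
```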

### BalancedSampler

```python
class BalancedSampler(ContextSampler):
    def sample(self, n) -> None:
        """
        TODO: this should return approximately class-balanced samples from our fewshot examples.
        TODO: what order should they be in? maybe random?
        """
        pass
```

**Status**: Not yet implemented; `sample` is a stub that returns `None`

**Intended Purpose**: Sample examples so that the class distribution is approximately balanced

**Design Questions**:

- How are class labels determined?
- What if perfect balance isn't possible?
- Should the order be randomized or stratified?
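One possible answer to these design questions, offered only as a hypothetical sketch (`balanced_sample` and its `label_key` parameter are not part of lmms_eval): group docs by label, shuffle each group with the provided RNG, take examples round-robin across labels until `n` are collected, then shuffle the final order. While every class still has examples left, class counts differ by at most one.

```python
import random
from collections import defaultdict
from typing import Dict, List


def balanced_sample(
    docs: List[dict], n: int, rnd: random.Random, label_key: str = "label"
) -> List[dict]:
    """Hypothetical class-balanced sampler: round-robin over shuffled label groups."""
    assert n <= len(docs), f"requested {n} examples but only {len(docs)} are available"

    # Group docs by label, then shuffle within each group
    groups: Dict[object, List[dict]] = defaultdict(list)
    for doc in docs:
        groups[doc[label_key]].append(doc)
    labels = sorted(groups, key=str)  # deterministic starting order
    rnd.shuffle(labels)
    for label in labels:
        rnd.shuffle(groups[label])

    # One example per label per round until n are collected
    picked: List[dict] = []
    while len(picked) < n:
        for label in labels:
            if groups[label] and len(picked) < n:
                picked.append(groups[label].pop())

    # Shuffle the final order so one class does not always come first
    rnd.shuffle(picked)
    return picked


pool = [{"text": f"t{i}", "label": "pos" if i % 4 == 0 else "neg"} for i in range(20)]
chosen = balanced_sample(pool, 6, random.Random(0))
print(sorted(d["label"] for d in chosen))  # ['neg', 'neg', 'neg', 'pos', 'pos', 'pos']
```

The pool is skewed (5 `pos`, 15 `neg`), yet the draw is split evenly because the minority class still has enough examples.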

### ManualSampler

```python
class ManualSampler(ContextSampler):
    def sample(self, n) -> None:
        """ """
        pass
```

**Status**: Not yet implemented

**Intended Purpose**: Allow the user to specify the exact examples to use

### Sampler Registry

```python
SAMPLER_REGISTRY = {
    "default": ContextSampler,
    "first_n": FirstNSampler,
}


def get_sampler(name):
    try:
        return SAMPLER_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Attempted to use contextsampler '{name}', but no sampling strategy for this name found! Supported model names: {', '.join(SAMPLER_REGISTRY.keys())}")
```

**Purpose**: Central registry for looking up sampler classes by name

**Registered Samplers**:

- `"default"`: `ContextSampler` (random sampling)
- `"first_n"`: `FirstNSampler` (deterministic ordering)

`BalancedSampler` and `ManualSampler` are not registered.
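The registry is a plain dict, so the lookup-or-fail pattern is easy to replicate. This sketch uses stand-in classes and a hypothetical `register_sampler` decorator; it is not the lmms_eval API. In the module excerpt above, adding a strategy simply means adding an entry to `SAMPLER_REGISTRY`.

```python
from typing import Callable, Dict, Type


class ContextSampler:  # stand-in for the real base class
    pass


class FirstNSampler(ContextSampler):  # stand-in
    pass


SAMPLER_REGISTRY: Dict[str, Type[ContextSampler]] = {
    "default": ContextSampler,
    "first_n": FirstNSampler,
}


def register_sampler(name: str) -> Callable[[Type[ContextSampler]], Type[ContextSampler]]:
    """Hypothetical decorator that adds a sampler class under `name`."""
    def decorator(cls: Type[ContextSampler]) -> Type[ContextSampler]:
        if name in SAMPLER_REGISTRY:
            raise ValueError(f"sampler '{name}' is already registered")
        SAMPLER_REGISTRY[name] = cls
        return cls
    return decorator


def get_sampler(name: str) -> Type[ContextSampler]:
    try:
        return SAMPLER_REGISTRY[name]
    except KeyError:
        supported = ", ".join(SAMPLER_REGISTRY.keys())
        raise ValueError(f"No sampling strategy named '{name}'. Supported: {supported}") from None


@register_sampler("balanced")
class BalancedSampler(ContextSampler):
    pass


print(get_sampler("balanced").__name__)  # BalancedSampler
```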

## Usage Examples

### Basic Random Sampling

```python
import random

from lmms_eval.api.samplers import ContextSampler

# train_dataset, task_instance and test_doc are placeholders for a loaded task's data
# Initialize sampler
sampler = ContextSampler(
    docs=train_dataset,
    task=task_instance,
    rnd=random.Random(42),
)

# Get 5-shot context
context = sampler.get_context(doc=test_doc, num_fewshot=5)
```

### First-N Sampling for MMLU

```python
import random

from lmms_eval.api.samplers import FirstNSampler

# dev_dataset, mmlu_task and test_doc are placeholders for a loaded task's data
sampler = FirstNSampler(
    docs=dev_dataset,
    task=mmlu_task,
    rnd=random.Random(1234),  # Still required (asserted in __init__) even though unused
)

# Always gets the same first 5 examples
context = sampler.get_context(doc=test_doc, num_fewshot=5)
```

### Using Registry

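```python
import random

from lmms_eval.api.samplers import get_sampler

# Get sampler class from registry
SamplerClass = get_sampler("first_n")

# Instantiate (docs and task are placeholders for a loaded task's data)
rng = random.Random(1234)
sampler = SamplerClass(docs=docs, task=task, rnd=rng)
```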

### With Subset of Examples

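```python
import random

from lmms_eval.api.samplers import ContextSampler

# Only sample from specific indices; full_dataset must support .select (e.g. a HuggingFace Dataset)
sampler = ContextSampler(
    docs=full_dataset,
    task=task,
    fewshot_indices=[0, 5, 10, 15, 20],  # Only use these
    rnd=random.Random(42),
)
```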

## Format Examples

### Multiple Choice Format

```
Question: What is 2+2?
A. 3
B. 4
C. 5
D. 6
Answer: B

Question: What is the capital of France?
A. London
B. Paris
C. Berlin
D. Madrid
Answer: B

[Test question follows]
```

### Generative Format

```
Input: Translate to French: Hello
Output: Bonjour

Input: Translate to French: Goodbye
Output: Au revoir

Input: Translate to French: Thank you
[Model generates answer]
```
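Both layouts come from the same join. Assuming, for illustration, a hypothetical `doc_to_text` that renders `Input: ...\nOutput:`, a `target_delimiter` of `" "` and a `fewshot_delimiter` of `"\n\n"` (common choices, not values read from any particular task config), a prompt in the generative layout is assembled like this:

```python
examples = [
    {"src": "Hello", "tgt": "Bonjour"},
    {"src": "Goodbye", "tgt": "Au revoir"},
]
test_doc = {"src": "Thank you"}

target_delimiter = " "      # assumed; set per task in its config
fewshot_delimiter = "\n\n"  # assumed; set per task in its config


def doc_to_text(doc: dict) -> str:
    # Hypothetical prompt template
    return f"Input: Translate to French: {doc['src']}\nOutput:"


# Labeled examples plus a trailing fewshot_delimiter, as get_context returns them
context = fewshot_delimiter.join(
    doc_to_text(d) + target_delimiter + d["tgt"] for d in examples
) + fewshot_delimiter

# The caller then appends the unlabeled test instance for the model to complete
prompt = context + doc_to_text(test_doc)
print(prompt)
```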

## Design Decisions

1. **Random Generator Required**: Forces reproducibility by requiring an explicit RNG (`__init__` asserts that `rnd` is set)

2. **Task Method Delegation**: Uses the task's `doc_to_text`, `doc_to_target`, and `doc_to_choice` for format consistency

3. **Same-Split Handling**: Automatically handles the case where the few-shot and test splits are identical by drawing one extra sample and filtering out the test doc

4. **Complex Formatting Logic**: Handles both string and choice-index formats in one expression

5. **Delimiter Configuration**: Reads delimiters from the task config to keep formatting consistent

6. **Document Equality**: Filters the test doc with Python's `!=`, so it relies on the documents' equality comparison (content equality for dict rows)
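Point 6 has a practical consequence: the filter in `get_context` uses `!=`, which for dict documents compares contents, not identity. A duplicate row whose fields match the test doc is therefore dropped too, while a row that differs in any field (even by a trailing space) is kept. A standard-library illustration:

```python
test_doc = {"question": "What is 2+2?", "answer": "4"}

fewshotex = [
    {"question": "What is 2+2?", "answer": "4"},   # duplicate content, different object
    {"question": "What is 2+2? ", "answer": "4"},  # trailing space: not equal
    {"question": "What is 3+3?", "answer": "6"},
]

# Same comprehension as get_context step 3
selected = [x for x in fewshotex if x != test_doc][:2]
print(len(selected))  # 2
```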

## Common Issues and Solutions

### Issue: Test Doc in Few-Shot Examples

**Solution**: Filtered out automatically in `get_context`; one extra sample is drawn when `fewshot_split` equals `test_split`

### Issue: Insufficient Examples

**Solution**: `FirstNSampler` fails its assertion with a clear error message; the base sampler raises `ValueError` from `random.sample`

### Issue: Format Mismatch

**Solution**: Always use the task's `doc_to_text`/`doc_to_target` methods, and check the task config for delimiters

### Issue: Non-Reproducible Results

**Solution**: Always pass a seeded RNG; use `FirstNSampler` when the examples must be fixed

## Related Components

- Few_Shot_Sampling: the principle this implements
- Request_Construction: uses the context strings in requests
- YAML_Task_Configuration: task configs specify the sampler type
- Task_Directory_Structure: tasks provide the doc conversion methods

## Extension Points

1. Implement `BalancedSampler` for class-balanced selection
2. Implement `ManualSampler` for user-specified examples
3. Add a `SimilaritySampler` for semantic-similarity-based selection
4. Add a `DiversitySampler` for maximum diversity
5. Support custom ordering functions in `FirstNSampler`
6. Add validation for delimiter consistency
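As an illustration of extension point 3, here is a hypothetical similarity-based selector that ranks candidates by Jaccard overlap of lowercase word sets with the test document. Jaccard overlap is a deliberately simple stand-in for the embedding similarity a real sampler would more likely use, and `similarity_select` and `text_key` are invented for this sketch. Such a sampler needs the test doc, which the current `sample(n)` signature does not receive, so a real subclass would also have to change how `get_context` calls it.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric word set
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    # len(a & b) / len(a | b), defined as 0.0 when both sets are empty
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_select(
    docs: List[dict], test_doc: dict, n: int, text_key: str = "question"
) -> List[dict]:
    """Hypothetical: return the n docs most similar to test_doc."""
    query = tokens(test_doc[text_key])
    candidates = [d for d in docs if d != test_doc]  # never include the test doc itself
    # sorted() is stable (also with reverse=True), so ties keep their pool order
    ranked = sorted(candidates, key=lambda d: jaccard(query, tokens(d[text_key])), reverse=True)
    return ranked[:n]


pool = [
    {"question": "What is the capital of France?"},
    {"question": "What is 2 plus 2?"},
    {"question": "What is the capital of Spain?"},
    {"question": "Name a prime number."},
]
test = {"question": "What is the capital of Italy?"}
picked = similarity_select(pool, test, 2)
print([d["question"] for d in picked])
# ['What is the capital of France?', 'What is the capital of Spain?']
```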

## Best Practices

1. Always use a seeded RNG for reproducibility
2. Choose the sampler based on benchmark requirements
3. Verify that enough examples are available in `fewshot_split` (one more than `num_fewshot` when it equals `test_split`)
4. Test formatting with edge cases such as list targets and choice indices
5. Use `FirstNSampler` for benchmarks with canonical examples
6. Document the sampler choice in the task configuration
7. Consider context length when setting `num_fewshot`
