Implementation:Explodinggradients_Ragas_GeneticOptimizer_Class
GeneticOptimizer Class
GeneticOptimizer is the concrete implementation of Genetic Prompt Optimization in the Ragas evaluation toolkit. It evolves evaluation metric prompts through reverse-engineering, feedback mutation, crossover mutation, and fitness evaluation stages.
Source Location
- File: src/ragas/optimizers/genetic.py
- Class definition: Lines 129-737
- optimize method: Lines 139-251
Import
from ragas.optimizers import GeneticOptimizer
Or directly:
from ragas.optimizers.genetic import GeneticOptimizer
Class Hierarchy
Optimizer (ABC, dataclass)
└── GeneticOptimizer
The base class Optimizer (defined in src/ragas/optimizers/base.py) declares:
@dataclass
class Optimizer(ABC):
    metric: t.Optional[MetricWithLLM] = None
    llm: t.Optional[BaseRagasLLM] = None

    @abstractmethod
    def optimize(self, dataset, loss, config, ...) -> Dict[str, str]: ...
Constructor
GeneticOptimizer inherits the constructor from Optimizer:
optimizer = GeneticOptimizer(
    metric=my_metric,  # Optional[MetricWithLLM], default None
    llm=my_llm,        # Optional[BaseRagasLLM], default None
)
Both metric and llm must be set before calling optimize(); otherwise a ValueError is raised.
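Both fields can also be assigned after construction. The guard has roughly the following shape (a hedged sketch, not a quote from the source; the actual message may differ):

```python
# Sketch of the validation described above; the real check runs inside optimize().
if optimizer.metric is None or optimizer.llm is None:
    raise ValueError("Both 'metric' and 'llm' must be set before calling optimize().")
```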
optimize() Method
Signature
def optimize(
    self,
    dataset: SingleMetricAnnotation,
    loss: Loss,
    config: Dict[Any, Any],
    run_config: Optional[RunConfig] = None,
    batch_size: Optional[int] = None,
    callbacks: Optional[Callbacks] = None,
    with_debugging_logs: bool = False,
    raise_exceptions: bool = True,
) -> Dict[str, str]
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| dataset | SingleMetricAnnotation | required | Human-annotated dataset with ground-truth labels. |
| loss | Loss | required | Loss function used to evaluate fitness (e.g., BinaryMetricLoss, MSELoss). |
| config | Dict[Any, Any] | required | Configuration dictionary with keys such as "population_size", "num_demonstrations", and "sample_size". |
| run_config | Optional[RunConfig] | None | Runtime configuration for timeouts and retries. |
| batch_size | Optional[int] | None | Batch size for parallel execution. |
| callbacks | Optional[Callbacks] | None | LangChain callbacks for tracing/logging. |
| with_debugging_logs | bool | False | Enable debugging log output. |
| raise_exceptions | bool | True | Whether to raise exceptions during evaluation. |
Return Value
Returns Dict[str, str] mapping prompt names to optimized instruction strings. These can be applied to a metric via metric.set_prompts() or saved with metric.save_prompts().
Config Dictionary Keys
| Key | Type | Default | Description |
|---|---|---|---|
"population_size" |
int |
3 | Number of candidate prompts in the population. |
"num_demonstrations" |
int |
3 | Number of annotated examples per batch for reverse engineering. |
"sample_size" |
int |
12 | Number of samples used for feedback mutation. |
Internal Stages
The optimize method executes four stages in sequence, tracked by a progress bar:
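At a glance, the end-to-end flow resembles the following simplified sketch (keyword arguments are limited to those visible in the stage signatures below; batching, callbacks, and the parameters elided with `...` are omitted):

```python
import numpy as np

# Simplified four-stage flow; 'optimizer', 'annotations', and 'loss' are the
# objects from the usage example later on this page.
candidates = optimizer.initialize_population(
    dataset=annotations, population_size=3, num_demonstrations=3
)
candidates = optimizer.feedback_mutation(candidates, annotations, sample_size=12)
candidates = optimizer.cross_over_mutation(candidates=candidates, dataset=annotations)
fitness = optimizer.evaluate_fitness(
    candidates=candidates, dataset=annotations, loss_fn=loss
)
best = candidates[int(np.argmax(fitness))]  # highest fitness wins
```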
Stage 1: Initialize Population (Lines 253-305)
def initialize_population(
    self,
    *,
    dataset: SingleMetricAnnotation,
    population_size: int,
    num_demonstrations: int = 3,
    ...
) -> List[Dict[str, str]]
- Filters the dataset to accepted annotations.
- Creates stratified batches of num_demonstrations examples each.
- For each batch, calls _reverse_engineer_instruction(), which uses the ReverseEngineerPrompt to infer what instruction an annotator might have been following.
- Appends the metric's default prompt as a seed candidate.
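A hedged sketch of this stage's logic, assuming annotations are dicts with input, label, and is_accepted keys and that llm_generate is a hypothetical text-generation helper (the real code uses stratified batching and the ReverseEngineerPrompt class):

```python
import random

def sketch_initialize_population(annotations, population_size, num_demonstrations,
                                 default_instruction, llm_generate):
    accepted = [a for a in annotations if a["is_accepted"]]
    candidates = []
    for _ in range(population_size - 1):
        # Stand-in for stratified batching over the accepted annotations
        batch = random.sample(accepted, num_demonstrations)
        examples = "\n".join(f"Input: {a['input']} -> Label: {a['label']}" for a in batch)
        # Reverse engineering: infer the instruction the annotator might have followed
        instruction = llm_generate(
            "Infer the instruction that produced these labels:\n" + examples
        )
        candidates.append({"instruction": instruction})
    # The metric's default prompt joins the population as a seed candidate
    candidates.append({"instruction": default_instruction})
    return candidates
```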
Stage 2: Feedback Mutation (Lines 362-417)
def feedback_mutation(
    self,
    candidates: List[Dict[str, str]],
    dataset: SingleMetricAnnotation,
    sample_size: int,
    ...
) -> List[Dict[str, str]]
- For each candidate, evaluates it on a stratified sample.
- Identifies samples where the metric prediction disagrees with the human label.
- Uses FeedbackMutationPrompt to generate concrete improvement suggestions.
- Uses FeedbackMutationPromptGeneration to incorporate the feedback into a revised instruction.
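In spirit, the stage behaves like the following sketch (run_metric and llm_generate are hypothetical stand-ins for the metric evaluation and the two feedback prompts):

```python
def sketch_feedback_mutation(candidates, sample, run_metric, llm_generate):
    improved = []
    for cand in candidates:
        # Samples where the candidate's prediction disagrees with the human label
        failures = [ex for ex in sample
                    if run_metric(cand["instruction"], ex) != ex["label"]]
        if not failures:
            improved.append(cand)  # nothing to fix; keep the candidate as-is
            continue
        feedback = llm_generate(
            f"Instruction: {cand['instruction']}\nFailure cases: {failures}\n"
            "Suggest concrete improvements."
        )
        revised = llm_generate(
            "Rewrite the instruction applying this feedback:\n"
            f"Instruction: {cand['instruction']}\nFeedback: {feedback}"
        )
        improved.append({"instruction": revised})
    return improved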
Stage 3: Crossover Mutation (Lines 662-736)
def cross_over_mutation(
    self,
    *,
    candidates: List[Dict[str, str]],
    dataset: SingleMetricAnnotation,
    ...
) -> List[Dict[str, str]]
- Evaluates all candidates and builds prediction vectors.
- Computes a Hamming distance matrix between prediction vectors.
- Pairs each candidate with its most dissimilar partner.
- Uses CrossOverPrompt to combine parents into offspring.
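The pairing step can be illustrated as follows (a sketch, assuming one integer prediction per sample per candidate):

```python
import numpy as np

def most_dissimilar_pairs(prediction_vectors):
    preds = np.asarray(prediction_vectors)  # shape: (n_candidates, n_samples)
    n = preds.shape[0]
    # Hamming distance: fraction of positions where two prediction vectors differ
    dist = (preds[:, None, :] != preds[None, :, :]).mean(axis=2)
    np.fill_diagonal(dist, -1.0)  # exclude self-pairing
    return [(i, int(np.argmax(dist[i]))) for i in range(n)]

pairs = most_dissimilar_pairs([[1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]])
```

The idea behind choosing maximally dissimilar parents is that they contribute complementary behaviors, so the offspring merges strengths rather than duplicating one parent.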
Stage 4: Fitness Evaluation (Lines 597-642)
def evaluate_fitness(
    self,
    *,
    candidates: List[Dict[str, str]],
    dataset: SingleMetricAnnotation,
    loss_fn: Loss,
    ...
) -> List[float]
- Evaluates each candidate on the full accepted dataset.
- Computes the loss between predicted and ground-truth labels.
- Returns one fitness score per candidate; the optimizer then selects the best candidate via np.argmax.
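A simplified rendering of this stage (run_metric is a hypothetical stand-in for metric evaluation, and the fitness-as-negated-loss convention is an assumption, not a quote from the source):

```python
import numpy as np

def sketch_evaluate_fitness(candidates, dataset, loss_fn, run_metric):
    # Assumption: fitness is the negated loss, so np.argmax picks the
    # lowest-loss candidate; the real code may define fitness differently.
    fitness = []
    for cand in candidates:
        predictions = [run_metric(cand["instruction"], ex) for ex in dataset]
        labels = [ex["label"] for ex in dataset]
        fitness.append(-loss_fn(predictions, labels))
    return fitness

def select_best(candidates, fitness):
    return candidates[int(np.argmax(fitness))]  # highest fitness wins
```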
Helper Prompts
The optimizer uses four internal LLM prompt classes (defined in the same file):
| Prompt Class | Lines | Purpose |
|---|---|---|
| ReverseEngineerPrompt | 53-57 | Infers an instruction from annotated input-output pairs. |
| CrossOverPrompt | 65-85 | Combines two parent prompts into a semantic offspring. |
| FeedbackMutationPrompt | 103-112 | Generates improvement feedback from failure cases. |
| FeedbackMutationPromptGeneration | 120-126 | Applies feedback to generate a revised instruction. |
Usage Example
from ragas.optimizers import GeneticOptimizer
from ragas.losses import BinaryMetricLoss
from ragas.dataset_schema import SingleMetricAnnotation
# Load annotated data
annotations = SingleMetricAnnotation.from_json("annotations.json")
# Create optimizer
optimizer = GeneticOptimizer(metric=my_metric, llm=my_llm)
# Run optimization
config = {
    "population_size": 3,
    "num_demonstrations": 3,
    "sample_size": 12,
}

best_prompts = optimizer.optimize(
    dataset=annotations,
    loss=BinaryMetricLoss(metric="accuracy"),
    config=config,
)
# Apply optimized prompts to the metric
prompts = my_metric.get_prompts()
for name, instruction in best_prompts.items():
    prompts[name].instruction = instruction
my_metric.set_prompts(**prompts)
Minimum Data Requirement
The optimizer enforces a minimum of 10 annotations (constant MIN_ANNOTATIONS = 10 at line 28). If the dataset contains fewer samples, a ValueError is raised with a message indicating how many more samples are needed.
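The guard amounts to a check of the following shape (illustrative; the exact error wording differs):

```python
MIN_ANNOTATIONS = 10  # module-level constant (line 28 of genetic.py)

def check_annotations(dataset):
    n = len(dataset)
    if n < MIN_ANNOTATIONS:
        raise ValueError(
            f"At least {MIN_ANNOTATIONS} annotations are required; "
            f"got {n}, so {MIN_ANNOTATIONS - n} more are needed."
        )
```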
Implements
- Principle:Explodinggradients_Ragas_Genetic_Prompt_Optimization
See Also
- DSPyOptimizer Class -- Alternative optimizer using DSPy's MIPROv2.
- Loss Classes -- Fitness functions used during optimization.
- MetricAnnotation Class -- Annotation data format.
- PromptMixin Save/Load -- Persisting optimized prompts.
- Environment:Explodinggradients_Ragas_LLM_Provider_Environment