
Implementation:Explodinggradients Ragas GeneticOptimizer Class

From Leeroopedia


GeneticOptimizer Class

GeneticOptimizer is the concrete implementation of Genetic Prompt Optimization in the Ragas evaluation toolkit. It evolves evaluation metric prompts through reverse-engineering, feedback mutation, crossover mutation, and fitness evaluation stages.

Source Location

  • File: src/ragas/optimizers/genetic.py
  • Class definition: Lines 129-737
  • optimize method: Lines 139-251

Import

from ragas.optimizers import GeneticOptimizer

Or directly:

from ragas.optimizers.genetic import GeneticOptimizer

Class Hierarchy

Optimizer (ABC, dataclass)
  └── GeneticOptimizer

The base class Optimizer (defined in src/ragas/optimizers/base.py) declares:

@dataclass
class Optimizer(ABC):
    metric: t.Optional[MetricWithLLM] = None
    llm: t.Optional[BaseRagasLLM] = None

    @abstractmethod
    def optimize(self, dataset, loss, config, ...) -> Dict[str, str]: ...

Constructor

GeneticOptimizer inherits the constructor from Optimizer:

optimizer = GeneticOptimizer(
    metric=my_metric,   # Optional[MetricWithLLM], default None
    llm=my_llm,         # Optional[BaseRagasLLM], default None
)

Both metric and llm must be set before calling optimize(); otherwise a ValueError is raised.

optimize() Method

Signature

def optimize(
    self,
    dataset: SingleMetricAnnotation,
    loss: Loss,
    config: Dict[Any, Any],
    run_config: Optional[RunConfig] = None,
    batch_size: Optional[int] = None,
    callbacks: Optional[Callbacks] = None,
    with_debugging_logs: bool = False,
    raise_exceptions: bool = True,
) -> Dict[str, str]

Parameters

  • dataset (SingleMetricAnnotation, required): Human-annotated dataset with ground-truth labels.
  • loss (Loss, required): Loss function used to evaluate fitness (e.g., BinaryMetricLoss, MSELoss).
  • config (Dict[Any, Any], required): Configuration dictionary with keys such as "population_size", "num_demonstrations", and "sample_size".
  • run_config (Optional[RunConfig], default None): Runtime configuration for timeouts and retries.
  • batch_size (Optional[int], default None): Batch size for parallel execution.
  • callbacks (Optional[Callbacks], default None): LangChain callbacks for tracing and logging.
  • with_debugging_logs (bool, default False): Enable debugging log output.
  • raise_exceptions (bool, default True): Whether to raise exceptions during evaluation.

Return Value

Returns Dict[str, str] mapping prompt names to optimized instruction strings. These can be applied to a metric via metric.set_prompts() or saved with metric.save_prompts().

Config Dictionary Keys

  • "population_size" (int, default 3): Number of candidate prompts in the population.
  • "num_demonstrations" (int, default 3): Number of annotated examples per batch for reverse engineering.
  • "sample_size" (int, default 12): Number of samples used for feedback mutation.
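A minimal sketch of how these defaults might be applied when keys are omitted. The key names and default values come from the table above; the .get-based merging logic is an assumption, not the library's code:

```python
def read_genetic_config(config: dict) -> tuple:
    """Apply the documented defaults for any missing config keys.

    Sketch only: key names and defaults match the documented config,
    but this merging helper is hypothetical.
    """
    population_size = config.get("population_size", 3)
    num_demonstrations = config.get("num_demonstrations", 3)
    sample_size = config.get("sample_size", 12)
    return population_size, num_demonstrations, sample_size

print(read_genetic_config({"population_size": 5}))  # → (5, 3, 12)
```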

Internal Stages

The optimize method executes four stages in sequence, tracked by a progress bar:

Stage 1: Initialize Population (Lines 253-305)

def initialize_population(
    self,
    *,
    dataset: SingleMetricAnnotation,
    population_size: int,
    num_demonstrations: int = 3,
    ...
) -> List[Dict[str, str]]
  • Filters the dataset to accepted annotations.
  • Creates stratified batches of num_demonstrations examples each.
  • For each batch, calls _reverse_engineer_instruction() which uses the ReverseEngineerPrompt to infer what instruction an annotator might have been following.
  • Appends the metric's default prompt as a seed candidate.
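The filtering and batching steps can be sketched as below. The real implementation stratifies batches by label; this simplified version approximates that with a round-robin over label groups, and the dictionary field names (is_accepted, label) are hypothetical:

```python
from collections import defaultdict

def stratified_batches(annotations, num_demonstrations):
    """Group accepted annotations by label, then interleave the groups so
    each batch of `num_demonstrations` examples mixes labels (sketch)."""
    accepted = [a for a in annotations if a["is_accepted"]]
    by_label = defaultdict(list)
    for a in accepted:
        by_label[a["label"]].append(a)
    # Round-robin across label groups to approximate stratification;
    # zip truncates to the smallest group, so leftovers are dropped here.
    interleaved = [a for group in zip(*by_label.values()) for a in group]
    return [interleaved[i:i + num_demonstrations]
            for i in range(0, len(interleaved) - num_demonstrations + 1,
                           num_demonstrations)]

annotations = [{"is_accepted": True, "label": lab} for lab in [0, 1, 0, 1, 0, 1]]
for batch in stratified_batches(annotations, 2):
    print([a["label"] for a in batch])  # each batch mixes both labels
```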

Stage 2: Feedback Mutation (Lines 362-417)

def feedback_mutation(
    self,
    candidates: List[Dict[str, str]],
    dataset: SingleMetricAnnotation,
    sample_size: int,
    ...
) -> List[Dict[str, str]]
  • For each candidate, evaluates it on a stratified sample.
  • Identifies samples where the metric prediction disagrees with the human label.
  • Uses FeedbackMutationPrompt to generate concrete improvement suggestions.
  • Uses FeedbackMutationPromptGeneration to incorporate feedback into a revised instruction.
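The disagreement-selection step above amounts to comparing the metric's predictions against the human labels; a minimal sketch:

```python
def disagreement_indices(predictions, human_labels):
    """Return the sample indices where the metric's prediction disagrees
    with the human annotation (sketch of the selection step only)."""
    return [i for i, (p, y) in enumerate(zip(predictions, human_labels)) if p != y]

# Candidates with no disagreements would need no feedback mutation.
print(disagreement_indices([1, 0, 1, 1], [1, 1, 1, 0]))  # → [1, 3]
```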

Stage 3: Crossover Mutation (Lines 662-736)

def cross_over_mutation(
    self,
    *,
    candidates: List[Dict[str, str]],
    dataset: SingleMetricAnnotation,
    ...
) -> List[Dict[str, str]]
  • Evaluates all candidates and builds prediction vectors.
  • Computes a Hamming distance matrix between prediction vectors.
  • Pairs each candidate with its most dissimilar partner.
  • Uses CrossOverPrompt to combine parents into offspring.
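The distance computation and pairing can be sketched with NumPy. The prediction vectors here are hypothetical binary verdicts, one entry per sample:

```python
import numpy as np

def most_dissimilar_partners(prediction_vectors):
    """For each candidate, find the index of the candidate whose prediction
    vector differs in the most positions (Hamming distance), excluding
    itself. Sketch of the pairing step, not the library's code."""
    P = np.asarray(prediction_vectors)
    # Pairwise Hamming distances: count positions where predictions differ.
    dist = (P[:, None, :] != P[None, :, :]).sum(axis=2)
    np.fill_diagonal(dist, -1)  # never pair a candidate with itself
    return dist.argmax(axis=1)

preds = [[1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 1, 1]]
print(most_dissimilar_partners(preds))  # → [2 2 0]
```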

Stage 4: Fitness Evaluation (Lines 597-642)

def evaluate_fitness(
    self,
    *,
    candidates: List[Dict[str, str]],
    dataset: SingleMetricAnnotation,
    loss_fn: Loss,
    ...
) -> List[float]
  • Evaluates each candidate on the full accepted dataset.
  • Computes the loss between predicted and ground-truth labels.
  • Returns one fitness score per candidate; optimize() then selects the best candidate via np.argmax.
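The final selection reduces to an argmax over the fitness scores; a sketch:

```python
import numpy as np

def select_best(candidates, fitness_scores):
    """Pick the candidate with the highest fitness score, mirroring the
    np.argmax selection described above (sketch)."""
    best_idx = int(np.argmax(fitness_scores))
    return candidates[best_idx]

candidates = ["prompt A", "prompt B", "prompt C"]
print(select_best(candidates, [0.71, 0.88, 0.64]))  # → prompt B
```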

Helper Prompts

The optimizer uses four internal LLM prompt classes (defined in the same file):

  • ReverseEngineerPrompt (lines 53-57): Infers an instruction from annotated input-output pairs.
  • CrossOverPrompt (lines 65-85): Combines two parent prompts into an offspring prompt that blends both.
  • FeedbackMutationPrompt (lines 103-112): Generates improvement feedback from failure cases.
  • FeedbackMutationPromptGeneration (lines 120-126): Applies the feedback to generate a revised instruction.

Usage Example

from ragas.optimizers import GeneticOptimizer
from ragas.losses import BinaryMetricLoss
from ragas.dataset_schema import SingleMetricAnnotation

# Load annotated data
annotations = SingleMetricAnnotation.from_json("annotations.json")

# Create optimizer
optimizer = GeneticOptimizer(metric=my_metric, llm=my_llm)

# Run optimization
config = {
    "population_size": 3,
    "num_demonstrations": 3,
    "sample_size": 12,
}

best_prompts = optimizer.optimize(
    dataset=annotations,
    loss=BinaryMetricLoss(metric="accuracy"),
    config=config,
)

# Apply optimized prompts to the metric
prompts = my_metric.get_prompts()
for name, instruction in best_prompts.items():
    prompts[name].instruction = instruction
my_metric.set_prompts(**prompts)

Minimum Data Requirement

The optimizer enforces a minimum of 10 annotations (constant MIN_ANNOTATIONS = 10 at line 28). If the dataset contains fewer samples, a ValueError is raised with a message indicating how many more samples are needed.
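A sketch of that guard follows. The threshold comes from the source constant; the exact error-message wording is an assumption:

```python
MIN_ANNOTATIONS = 10  # constant from src/ragas/optimizers/genetic.py

def check_min_annotations(dataset_size: int) -> None:
    """Raise if too few annotated samples are available (sketch;
    the message wording here is hypothetical)."""
    if dataset_size < MIN_ANNOTATIONS:
        raise ValueError(
            f"At least {MIN_ANNOTATIONS} annotations are required; "
            f"please annotate {MIN_ANNOTATIONS - dataset_size} more sample(s)."
        )

check_min_annotations(12)  # passes silently
```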
