
Implementation:Explodinggradients Ragas GeneticOptimizer Class

From Leeroopedia


GeneticOptimizer Class

GeneticOptimizer is the concrete implementation of Genetic Prompt Optimization in the Ragas evaluation toolkit. It evolves evaluation metric prompts through reverse-engineering, feedback mutation, crossover mutation, and fitness evaluation stages.

Source Location

  • File: src/ragas/optimizers/genetic.py
  • Class definition: Lines 129-737
  • optimize method: Lines 139-251

Import

from ragas.optimizers import GeneticOptimizer

Or directly:

from ragas.optimizers.genetic import GeneticOptimizer

Class Hierarchy

Optimizer (ABC, dataclass)
  └── GeneticOptimizer

The base class Optimizer (defined in src/ragas/optimizers/base.py) declares:

@dataclass
class Optimizer(ABC):
    metric: t.Optional[MetricWithLLM] = None
    llm: t.Optional[BaseRagasLLM] = None

    @abstractmethod
    def optimize(self, dataset, loss, config, ...) -> Dict[str, str]: ...

Constructor

GeneticOptimizer inherits the constructor from Optimizer:

optimizer = GeneticOptimizer(
    metric=my_metric,   # Optional[MetricWithLLM], default None
    llm=my_llm,         # Optional[BaseRagasLLM], default None
)

Both metric and llm must be set before calling optimize(); otherwise a ValueError is raised.

optimize() Method

Signature

def optimize(
    self,
    dataset: SingleMetricAnnotation,
    loss: Loss,
    config: Dict[Any, Any],
    run_config: Optional[RunConfig] = None,
    batch_size: Optional[int] = None,
    callbacks: Optional[Callbacks] = None,
    with_debugging_logs: bool = False,
    raise_exceptions: bool = True,
) -> Dict[str, str]

Parameters

  • dataset (SingleMetricAnnotation, required): Human-annotated dataset with ground-truth labels.
  • loss (Loss, required): Loss function used to evaluate fitness (e.g., BinaryMetricLoss, MSELoss).
  • config (Dict[Any, Any], required): Configuration dictionary with keys such as "population_size", "num_demonstrations", and "sample_size".
  • run_config (Optional[RunConfig], default None): Runtime configuration for timeouts and retries.
  • batch_size (Optional[int], default None): Batch size for parallel execution.
  • callbacks (Optional[Callbacks], default None): LangChain callbacks for tracing and logging.
  • with_debugging_logs (bool, default False): Enable debugging log output.
  • raise_exceptions (bool, default True): Whether to raise exceptions during evaluation.

Return Value

Returns Dict[str, str] mapping prompt names to optimized instruction strings. These can be applied to a metric via metric.set_prompts() or saved with metric.save_prompts().

Config Dictionary Keys

  • "population_size" (int, default 3): Number of candidate prompts in the population.
  • "num_demonstrations" (int, default 3): Number of annotated examples per batch for reverse engineering.
  • "sample_size" (int, default 12): Number of samples used for feedback mutation.
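A minimal sketch of how these defaults might be applied when keys are omitted. The key names and default values come from the table above; the .get-based merging logic is an assumption, not the library's code:

```python
def read_genetic_config(config: dict) -> tuple:
    """Apply the documented defaults for any missing config keys.

    Sketch only: key names and defaults match the documented config,
    but this merging helper is hypothetical.
    """
    population_size = config.get("population_size", 3)
    num_demonstrations = config.get("num_demonstrations", 3)
    sample_size = config.get("sample_size", 12)
    return population_size, num_demonstrations, sample_size

print(read_genetic_config({"population_size": 5}))  # → (5, 3, 12)
```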

Internal Stages

The optimize method executes four stages in sequence, tracked by a progress bar:

Stage 1: Initialize Population (Lines 253-305)

def initialize_population(
    self,
    *,
    dataset: SingleMetricAnnotation,
    population_size: int,
    num_demonstrations: int = 3,
    ...
) -> List[Dict[str, str]]
  • Filters the dataset to accepted annotations.
  • Creates stratified batches of num_demonstrations examples each.
  • For each batch, calls _reverse_engineer_instruction() which uses the ReverseEngineerPrompt to infer what instruction an annotator might have been following.
  • Appends the metric's default prompt as a seed candidate.
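The filtering and batching steps can be sketched as below. The real implementation stratifies batches by label; this simplified version approximates that with a round-robin over label groups, and the dictionary field names (is_accepted, label) are hypothetical:

```python
from collections import defaultdict

def stratified_batches(annotations, num_demonstrations):
    """Group accepted annotations by label, then interleave the groups so
    each batch of `num_demonstrations` examples mixes labels (sketch)."""
    accepted = [a for a in annotations if a["is_accepted"]]
    by_label = defaultdict(list)
    for a in accepted:
        by_label[a["label"]].append(a)
    # Round-robin across label groups to approximate stratification;
    # zip truncates to the smallest group, so leftovers are dropped here.
    interleaved = [a for group in zip(*by_label.values()) for a in group]
    return [interleaved[i:i + num_demonstrations]
            for i in range(0, len(interleaved) - num_demonstrations + 1,
                           num_demonstrations)]

annotations = [{"is_accepted": True, "label": lab} for lab in [0, 1, 0, 1, 0, 1]]
for batch in stratified_batches(annotations, 2):
    print([a["label"] for a in batch])  # each batch mixes both labels
```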

Stage 2: Feedback Mutation (Lines 362-417)

def feedback_mutation(
    self,
    candidates: List[Dict[str, str]],
    dataset: SingleMetricAnnotation,
    sample_size: int,
    ...
) -> List[Dict[str, str]]
  • For each candidate, evaluates it on a stratified sample.
  • Identifies samples where the metric prediction disagrees with the human label.
  • Uses FeedbackMutationPrompt to generate concrete improvement suggestions.
  • Uses FeedbackMutationPromptGeneration to incorporate feedback into a revised instruction.
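The disagreement-selection step above amounts to comparing the metric's predictions against the human labels; a minimal sketch:

```python
def disagreement_indices(predictions, human_labels):
    """Return the sample indices where the metric's prediction disagrees
    with the human annotation (sketch of the selection step only)."""
    return [i for i, (p, y) in enumerate(zip(predictions, human_labels)) if p != y]

# Candidates with no disagreements would need no feedback mutation.
print(disagreement_indices([1, 0, 1, 1], [1, 1, 1, 0]))  # → [1, 3]
```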

Stage 3: Crossover Mutation (Lines 662-736)

def cross_over_mutation(
    self,
    *,
    candidates: List[Dict[str, str]],
    dataset: SingleMetricAnnotation,
    ...
) -> List[Dict[str, str]]
  • Evaluates all candidates and builds prediction vectors.
  • Computes a Hamming distance matrix between prediction vectors.
  • Pairs each candidate with its most dissimilar partner.
  • Uses CrossOverPrompt to combine parents into offspring.
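The distance computation and pairing can be sketched with NumPy. The prediction vectors here are hypothetical binary verdicts, one entry per sample:

```python
import numpy as np

def most_dissimilar_partners(prediction_vectors):
    """For each candidate, find the index of the candidate whose prediction
    vector differs in the most positions (Hamming distance), excluding
    itself. Sketch of the pairing step, not the library's code."""
    P = np.asarray(prediction_vectors)
    # Pairwise Hamming distances: count positions where predictions differ.
    dist = (P[:, None, :] != P[None, :, :]).sum(axis=2)
    np.fill_diagonal(dist, -1)  # never pair a candidate with itself
    return dist.argmax(axis=1)

preds = [[1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 1, 1]]
print(most_dissimilar_partners(preds))  # → [2 2 0]
```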

Stage 4: Fitness Evaluation (Lines 597-642)

def evaluate_fitness(
    self,
    *,
    candidates: List[Dict[str, str]],
    dataset: SingleMetricAnnotation,
    loss_fn: Loss,
    ...
) -> List[float]
  • Evaluates each candidate on the full accepted dataset.
  • Computes the loss between predicted and ground-truth labels.
  • Returns one fitness score per candidate; optimize() then selects the best candidate via np.argmax.
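The final selection reduces to an argmax over the fitness scores; a sketch:

```python
import numpy as np

def select_best(candidates, fitness_scores):
    """Pick the candidate with the highest fitness score, mirroring the
    np.argmax selection described above (sketch)."""
    best_idx = int(np.argmax(fitness_scores))
    return candidates[best_idx]

candidates = ["prompt A", "prompt B", "prompt C"]
print(select_best(candidates, [0.71, 0.88, 0.64]))  # → prompt B
```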

Helper Prompts

The optimizer uses four internal LLM prompt classes (defined in the same file):

  • ReverseEngineerPrompt (lines 53-57): Infers an instruction from annotated input-output pairs.
  • CrossOverPrompt (lines 65-85): Combines two parent prompts into an offspring prompt that blends both.
  • FeedbackMutationPrompt (lines 103-112): Generates improvement feedback from failure cases.
  • FeedbackMutationPromptGeneration (lines 120-126): Applies the feedback to generate a revised instruction.

Usage Example

from ragas.optimizers import GeneticOptimizer
from ragas.losses import BinaryMetricLoss
from ragas.dataset_schema import SingleMetricAnnotation

# Load annotated data
annotations = SingleMetricAnnotation.from_json("annotations.json")

# Create optimizer
optimizer = GeneticOptimizer(metric=my_metric, llm=my_llm)

# Run optimization
config = {
    "population_size": 3,
    "num_demonstrations": 3,
    "sample_size": 12,
}

best_prompts = optimizer.optimize(
    dataset=annotations,
    loss=BinaryMetricLoss(metric="accuracy"),
    config=config,
)

# Apply optimized prompts to the metric
prompts = my_metric.get_prompts()
for name, instruction in best_prompts.items():
    prompts[name].instruction = instruction
my_metric.set_prompts(**prompts)

Minimum Data Requirement

The optimizer enforces a minimum of 10 annotations (constant MIN_ANNOTATIONS = 10 at line 28). If the dataset contains fewer samples, a ValueError is raised with a message indicating how many more samples are needed.
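A sketch of that guard follows. The threshold comes from the source constant; the exact error-message wording is an assumption:

```python
MIN_ANNOTATIONS = 10  # constant from src/ragas/optimizers/genetic.py

def check_min_annotations(dataset_size: int) -> None:
    """Raise if too few annotated samples are available (sketch;
    the message wording here is hypothetical)."""
    if dataset_size < MIN_ANNOTATIONS:
        raise ValueError(
            f"At least {MIN_ANNOTATIONS} annotations are required; "
            f"please annotate {MIN_ANNOTATIONS - dataset_size} more sample(s)."
        )

check_min_annotations(12)  # passes silently
```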
