

Principle:Explodinggradients Ragas Metric Baseline Correlation

From Leeroopedia


Metric Baseline Correlation

Metric Baseline Correlation is a principle in the Ragas evaluation toolkit for measuring the statistical agreement between metric predictions and human judgments. It establishes a quantitative baseline against which prompt optimization improvements can be compared.

Motivation

Before optimizing a metric's prompts, it is important to understand how well the metric already performs against human judgment. Correlation metrics provide this understanding:

  • Baseline establishment -- The pre-optimization correlation sets a benchmark for improvement.
  • Optimization validation -- Post-optimization correlation should exceed the baseline, confirming that prompt changes improved alignment.
  • Metric comparison -- Different metrics or prompt variants can be compared by their correlation with human labels.

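Without a baseline measurement, there is no way to tell whether optimization produced a meaningful improvement. To separate genuine improvement from overfitting to the training annotations, both the baseline and the post-optimization correlation should also be measured on annotations that were not used during optimization.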

Theoretical Foundation

Correlation for Discrete Metrics

For metrics that produce categorical outputs (e.g., "pass"/"fail", or custom discrete categories), Cohen's Kappa is used to measure agreement:

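$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

Where:

  • $p_o$ is the observed agreement (the proportion of samples where the metric prediction matches the human label).
  • $p_e$ is the agreement expected by chance, calculated from the marginal distributions of both raters: $p_e = \sum_c p_c^{\text{human}} \, p_c^{\text{metric}}$, where $p_c$ is the fraction of samples each rater assigns to category $c$.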
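The definitions above translate directly into code. The sketch below is a plain-Python illustration, not Ragas's internal implementation; the function name `cohen_kappa` and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Cohen's Kappa between human labels (gold) and metric predictions (pred)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    # p_o: fraction of samples where the metric agrees with the human label
    p_o = sum(g == p for g, p in zip(gold, pred)) / n
    # p_e: chance agreement from each rater's marginal label frequencies
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    p_e = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere: kappa is undefined
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations and metric outputs for eight samples
human = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
metric = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
print(round(cohen_kappa(human, metric), 3))  # 0.467
```

Here $p_o = 6/8 = 0.75$ and $p_e = (5 \cdot 5 + 3 \cdot 3)/64 \approx 0.531$, so $\kappa = 7/15 \approx 0.467$: moderate agreement even though raw accuracy is 75%. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score` computes the same (unweighted) statistic.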

Cohen's Kappa has several desirable properties for evaluation metrics:

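  • Chance correction -- Unlike raw accuracy, Kappa discounts agreement that would occur by random guessing. A metric whose outputs are independent of the human labels has $\kappa \approx 0$ in expectation, regardless of class distribution.
  • Interpretable scale -- A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean worse-than-chance agreement (the minimum is -1 at most and depends on the marginal distributions). As a common rule of thumb, $\kappa > 0.6$ indicates substantial agreement.
  • Class imbalance handling -- Because it adjusts for expected chance agreement, Kappa exposes metrics that score well on raw accuracy only by predicting the majority class. It is still affected by class prevalence, so Kappa values from datasets with very different class balances should be compared with care.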
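To see the chance correction at work, consider a degenerate metric that always answers "pass" on an imbalanced annotation set. The data and helper names below are hypothetical, and the Kappa helper is a compact stand-in for whatever implementation is actually used.

```python
from collections import Counter


def accuracy(gold, pred):
    # Raw agreement rate, with no correction for chance
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohen_kappa(gold, pred):
    n = len(gold)
    p_o = accuracy(gold, pred)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Chance agreement from the two raters' marginal label frequencies
    p_e = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical imbalanced annotations: nine "pass" labels and one "fail"
human = ["pass"] * 9 + ["fail"]
# A metric that ignores its input and always predicts the majority class
always_pass = ["pass"] * 10

print(accuracy(human, always_pass))     # 0.9
print(cohen_kappa(human, always_pass))  # 0.0
```

Accuracy rewards the constant predictor with 90%, while Kappa reports exactly chance-level agreement because $p_o = p_e = 0.9$.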

Correlation for Numeric Metrics

For metrics that produce continuous numeric scores, the Pearson correlation coefficient is used:

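$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$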

Where $x_i$ is the gold (human) label and $y_i$ the metric prediction for sample $i$, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of samples.
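The formula can be computed term by term in plain Python. This is an illustrative sketch rather than the toolkit's own code; `pearson_r` and the sample scores are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation between human scores (x) and metric scores (y)."""
    n = len(gold)
    if n != len(pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    x_bar = sum(gold) / n
    y_bar = sum(pred) / n
    # Numerator: sum of products of deviations from the means
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(gold, pred))
    # Denominator: product of the root sums of squared deviations
    ss_x = sum((x - x_bar) ** 2 for x in gold)
    ss_y = sum((y - y_bar) ** 2 for y in pred)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either input is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical human scores and metric scores for five samples
human_scores = [0.0, 0.25, 0.5, 0.75, 1.0]
metric_scores = [0.1, 0.2, 0.6, 0.7, 0.9]
print(round(pearson_r(human_scores, metric_scores), 3))  # 0.979
```

On Python 3.10 and later, `statistics.correlation` from the standard library returns the same value and can serve as a cross-check.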

Pearson correlation properties:

  • Linear relationship -- Measures the strength and direction of the linear relationship between metric outputs and human scores.
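  • Bounded range -- Values range from -1 (perfect negative correlation) through 0 (no linear relationship) to 1 (perfect positive correlation).
  • Scale invariant -- Unlike MSE, correlation does not depend on the absolute scale of the metric outputs: applying any positive linear transformation $y \mapsto a y + b$ (with $a > 0$) to the predictions leaves $r$ unchanged.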
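The contrast with MSE in the list above can be checked numerically. In this sketch the 1-5 human ratings, the [0, 1] metric scores, and the helper names are all hypothetical; the metric scores are mapped onto the 1-5 scale with a positive linear transformation.

```python
import math


def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def mse(x, y):
    # Mean squared error: depends directly on the absolute scale of y
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


human = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical 1-5 human ratings
metric = [0.1, 0.3, 0.4, 0.7, 0.9]       # hypothetical metric scores in [0, 1]
rescaled = [4 * m + 1 for m in metric]   # same scores mapped onto 1-5 (slope 4 > 0)

print(math.isclose(pearson_r(human, metric), pearson_r(human, rescaled)))  # True
print(mse(human, metric), mse(human, rescaled))
```

The correlation is identical for both versions of the predictions, while MSE drops from about 7.63 to about 0.11 purely because of the change of scale. A negative slope would flip the sign of $r$ but not its magnitude.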

Correlation vs. Loss

Correlation and loss functions serve complementary roles:

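Aspect             | Correlation                       | Loss
-------------------|-----------------------------------|----------------------------------------------------------------------
Purpose            | Diagnostic measure of agreement   | Optimization objective
Used during        | Before/after optimization         | During optimization
Direction          | Higher is better                  | Depends on loss type (higher accuracy is better; lower MSE is better)
Chance correction  | Yes (for Cohen's Kappa)           | No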

The correlation provides a human-interpretable measure of metric quality, while the loss function provides the objective signal that the optimizer works to improve.

Workflow

A typical baseline correlation workflow:

  1. Collect annotations -- Gather human labels for a set of evaluation samples, in the same form the metric outputs (discrete categories or numeric scores).
  2. Run baseline metric -- Evaluate the metric with its default prompts on the annotated data.
  3. Compute baseline correlation -- Calculate Cohen's Kappa (discrete) or Pearson r (numeric) between predictions and human labels.
  4. Optimize prompts -- Run prompt optimization.
  5. Compute post-optimization correlation -- Repeat the correlation measurement with optimized prompts.
  6. Compare -- The improvement in correlation quantifies the optimization's value.
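The six steps can be sketched end to end without calling any Ragas API. In this illustration the annotations and both sets of predictions are hypothetical stand-ins for real metric runs, and the `agreement` helper (an assumption, not a library function) picks Cohen's Kappa for string labels and Pearson r otherwise.

```python
import math
from collections import Counter
from typing import Sequence, Union

Label = Union[str, float]


def cohen_kappa(gold, pred):
    n = len(gold)
    p_o = sum(g == p for g, p in zip(gold, pred)) / n
    gc, pc = Counter(gold), Counter(pred)
    p_e = sum(gc[c] * pc[c] for c in gc) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def pearson_r(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    cov = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - xb) ** 2 for a in x) * sum((b - yb) ** 2 for b in y))


def agreement(gold: Sequence[Label], pred: Sequence[Label]) -> float:
    """Cohen's Kappa for discrete (string) labels, Pearson r for numeric scores."""
    if all(isinstance(v, str) for v in gold):
        return cohen_kappa(gold, pred)
    return pearson_r([float(v) for v in gold], [float(v) for v in pred])


# Step 1: hypothetical human annotations
human = ["pass", "fail", "pass", "pass", "fail", "pass"]
# Step 2: hypothetical metric outputs with the default prompts
baseline_preds = ["pass", "pass", "pass", "fail", "fail", "pass"]
# Step 4 (prompt optimization) happens elsewhere; hypothetical outputs afterwards
optimized_preds = ["pass", "fail", "pass", "pass", "fail", "fail"]

baseline = agreement(human, baseline_preds)    # Step 3
optimized = agreement(human, optimized_preds)  # Step 5
# Step 6: the change in agreement quantifies the optimization's value
print(f"baseline={baseline:.3f} optimized={optimized:.3f} delta={optimized - baseline:+.3f}")
# baseline=0.250 optimized=0.667 delta=+0.417
```

In practice, the comparison in step 6 is most informative when both correlations are computed on annotations held out from the optimization run.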

