Implementation:Explodinggradients Ragas Metric Get Correlation
Metric Get Correlation
Metric Get Correlation implements the Metric Baseline Correlation principle in the Ragas evaluation toolkit. The get_correlation() method is defined on both DiscreteMetric and NumericMetric, each using a statistical measure appropriate to its output type.
Source Locations
- DiscreteMetric.get_correlation: src/ragas/metrics/discrete.py, Lines 75-89
- NumericMetric.get_correlation: src/ragas/metrics/numeric.py, Lines 73-93
Import
from ragas.metrics import DiscreteMetric
from ragas.metrics.numeric import NumericMetric
DiscreteMetric.get_correlation
Signature (Lines 75-89)
def get_correlation(
    self,
    gold_labels: List[str],
    predictions: List[str],
) -> float
Parameters
| Parameter | Type | Description |
|---|---|---|
| gold_labels | List[str] | Human-annotated ground-truth labels (categorical strings). |
| predictions | List[str] | Metric-predicted labels (categorical strings). |
Return Value
Returns a float representing Cohen's Kappa score, ranging from -1 (complete disagreement) to 1 (perfect agreement), with 0 indicating chance-level agreement.
Implementation
def get_correlation(
    self, gold_labels: t.List[str], predictions: t.List[str]
) -> float:
    try:
        from sklearn.metrics import cohen_kappa_score
    except ImportError:
        raise ImportError(
            "scikit-learn is required for correlation calculation. "
            "Please install it with `pip install scikit-learn`."
        )
    return cohen_kappa_score(gold_labels, predictions)
Uses scikit-learn's cohen_kappa_score, which:
- Computes the observed agreement and expected chance agreement.
- Returns the chance-corrected agreement coefficient.
- Handles multi-class categorical labels (not just binary).
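The chance correction can be made concrete with a small pure-Python sketch of what cohen_kappa_score computes (illustrative only; the actual method delegates to scikit-learn):

```python
from collections import Counter

def cohen_kappa(gold, preds):
    # Observed agreement: fraction of exact label matches.
    n = len(gold)
    p_o = sum(g == p for g, p in zip(gold, preds)) / n
    # Expected chance agreement: product of the two raters'
    # marginal label frequencies, summed over all labels.
    gold_freq = Counter(gold)
    pred_freq = Counter(preds)
    p_e = sum(gold_freq[label] * pred_freq[label] for label in gold_freq) / (n * n)
    # Chance-corrected agreement coefficient.
    return (p_o - p_e) / (1 - p_e)

gold = ["pass", "fail", "pass", "pass", "fail"]
preds = ["pass", "fail", "fail", "pass", "fail"]
print(round(cohen_kappa(gold, preds), 3))  # 0.615
```

Here observed agreement is 4/5 = 0.8, chance agreement is 0.48, giving kappa = (0.8 - 0.48) / (1 - 0.48) ≈ 0.615, i.e. agreement well above chance.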
External Dependency
Requires scikit-learn. If not installed, raises ImportError with installation instructions.
Usage Example
from ragas.metrics import DiscreteMetric

metric = DiscreteMetric(
    name="quality_check",
    prompt="Check quality: {response}",
    allowed_values=["pass", "fail"],
)

gold = ["pass", "fail", "pass", "pass", "fail"]
preds = ["pass", "fail", "fail", "pass", "fail"]

kappa = metric.get_correlation(gold_labels=gold, predictions=preds)
print(f"Cohen's Kappa: {kappa:.3f}")
# Positive kappa indicates above-chance agreement
NumericMetric.get_correlation
Signature (Lines 73-93)
def get_correlation(
    self,
    gold_labels: List[str],
    predictions: List[str],
) -> float
Parameters
| Parameter | Type | Description |
|---|---|---|
| gold_labels | List[str] | Human-annotated ground-truth scores (as strings, converted to floats internally). |
| predictions | List[str] | Metric-predicted scores (as strings, converted to floats internally). |
Return Value
Returns a float representing the Pearson correlation coefficient, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
Implementation
def get_correlation(
    self, gold_labels: t.List[str], predictions: t.List[str]
) -> float:
    try:
        from scipy.stats import pearsonr
    except ImportError:
        raise ImportError(
            "scipy is required for correlation calculation. "
            "Please install it with `pip install scipy`."
        )
    # Convert strings to floats for correlation calculation
    gold_floats = [float(x) for x in gold_labels]
    pred_floats = [float(x) for x in predictions]
    result = pearsonr(gold_floats, pred_floats)
    # pearsonr returns (correlation, p-value) tuple
    correlation = t.cast(float, result[0])
    return correlation
Key details:
- Converts string inputs to floats before computing correlation.
- Uses scipy.stats.pearsonr, which returns a tuple of (correlation, p-value); only the correlation coefficient is returned.
- The p-value (indicating statistical significance) is discarded.
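As a sanity check, the string-to-float conversion and the coefficient itself can be reproduced in pure Python (a sketch of the Pearson formula, not the scipy call):

```python
import math

def pearson_r(gold_labels, pred_labels):
    # Mirror the method's behaviour: convert strings to floats first.
    xs = [float(x) for x in gold_labels]
    ys = [float(y) for y in pred_labels]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance of deviations, normalized by both standard deviations.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gold = ["0.9", "0.3", "0.7", "0.5", "0.8"]
preds = ["0.85", "0.4", "0.65", "0.55", "0.75"]
print(round(pearson_r(gold, preds), 3))  # roughly 0.99: strong linear agreement
```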
External Dependency
Requires scipy. If not installed, raises ImportError with installation instructions.
Usage Example
from ragas.metrics.numeric import NumericMetric

metric = NumericMetric(
    name="relevance_score",
    prompt="Rate relevance of {response} to {user_input}",
    allowed_values=(0.0, 1.0),
)

gold = ["0.9", "0.3", "0.7", "0.5", "0.8"]
preds = ["0.85", "0.4", "0.65", "0.55", "0.75"]

r = metric.get_correlation(gold_labels=gold, predictions=preds)
print(f"Pearson r: {r:.3f}")
# High positive r indicates strong linear agreement
Comparison of Methods
| Aspect | DiscreteMetric | NumericMetric |
|---|---|---|
| Statistical measure | Cohen's Kappa | Pearson correlation |
| Input type | Categorical strings | Numeric strings (converted to float) |
| Range | [-1, 1] | [-1, 1] |
| Chance correction | Yes | N/A (measures linear relationship) |
| External dependency | scikit-learn | scipy |
| Best for | Pass/fail, multi-class labels | Continuous scores (e.g., 0.0-1.0) |
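The "Input type" row suggests a simple heuristic for choosing between the two measures. A hypothetical dispatcher (not part of Ragas) could pick one by attempting a numeric conversion:

```python
def pick_measure(gold, preds):
    # Hypothetical helper mirroring the table above: numeric strings
    # suggest Pearson r, anything else Cohen's Kappa.
    try:
        [float(x) for x in gold + preds]
        return "pearson"
    except ValueError:
        return "cohen_kappa"

print(pick_measure(["0.9", "0.3"], ["0.8", "0.4"]))      # pearson
print(pick_measure(["pass", "fail"], ["pass", "pass"]))  # cohen_kappa
```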
Class Context
Both DiscreteMetric and NumericMetric inherit from SimpleLLMMetric:
SimpleLLMMetric
├── DiscreteMetric (+ DiscreteValidator)
│ └── get_correlation() -> Cohen's Kappa
└── NumericMetric (+ NumericValidator)
└── get_correlation() -> Pearson r
The get_correlation method is not abstract in the parent class; it is defined independently on each subclass with the appropriate statistical measure.
Implements
- Metric Baseline Correlation -- The principle both get_correlation implementations realize (see the introduction above).
See Also
- Loss Classes -- Complementary measure used during optimization itself.
- MetricAnnotation Class -- Provides the human labels for correlation measurement.
- GeneticOptimizer Class -- Optimizer whose output quality is measured by correlation.
- Environment:Explodinggradients_Ragas_Optional_Metrics_Environment