
Principle:Dotnet Machinelearning Binary Model Evaluation

From Leeroopedia


Knowledge Sources
Domains Machine Learning, Model Evaluation, Statistics
Last Updated 2026-02-09 00:00 GMT

Overview

Model evaluation quantifies the predictive performance of a trained classifier on held-out data using metrics that measure different aspects of prediction quality.

Description

After training a binary classification model, it is essential to measure how well it generalizes to unseen data. Evaluation uses the test set (data never seen during training) to compute metrics that reflect different facets of model quality.
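The held-out evaluation idea can be sketched as a simple shuffled split. This is an illustrative Python sketch (the function name, split fraction, and seed are assumptions for the example, not part of the article):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    """Shuffle the dataset and hold out a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The held-out test set is never shown to the training procedure.
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

In practice the split is done once, before any training, so that no test example can leak into model fitting.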

The key binary classification metrics are:

  • Accuracy: the fraction of all predictions that are correct. Simple and intuitive, but misleading when classes are imbalanced (e.g., 95% accuracy is trivial if 95% of examples are negative).
  • AUC (Area Under the ROC Curve): measures the model's ability to rank positive examples higher than negative examples across all possible decision thresholds. AUC = 1.0 indicates perfect ranking; AUC = 0.5 indicates random chance. AUC is threshold-independent and robust to class imbalance.
  • F1 Score: the harmonic mean of precision and recall, providing a single metric that balances false positives and false negatives. F1 is especially useful when both types of errors are costly.
  • Precision (positive predictive value): of all instances predicted positive, how many are truly positive.
  • Recall (sensitivity, true positive rate): of all truly positive instances, how many are correctly predicted.
  • Log-loss (binary cross-entropy): measures the quality of probabilistic predictions. Lower is better. Penalizes confident incorrect predictions more heavily than uncertain ones.
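The class-imbalance pitfall noted for accuracy above can be seen numerically. A hedged sketch (the 95/5 split mirrors the example in the accuracy bullet; the degenerate classifier is assumed for illustration):

```python
# 95% negatives: a degenerate classifier that always predicts "negative"
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

# Accuracy looks excellent despite the model being useless
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall exposes the failure: no positive is ever found
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy, recall)  # 0.95 accuracy, yet 0.0 recall
```

This is why imbalance-robust metrics such as AUC, or precision/recall, should accompany accuracy.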

Cross-validation provides a more robust performance estimate by partitioning the data into k folds. Each fold serves as the test set exactly once while the remaining k-1 folds serve as training data. The final metric is the average across all k runs, reducing the variance of the estimate.

Usage

Use single train-test evaluation for quick assessment during development. Use cross-validation for final model selection and when reporting results, especially on smaller datasets where a single split may yield unstable estimates. Report AUC as the primary metric for ranking quality, F1 when false positives and false negatives are equally important, and log-loss when calibrated probabilities matter.

Theoretical Basis

Confusion matrix for binary classification:

                    Predicted Positive    Predicted Negative
Actual Positive         TP                     FN
Actual Negative         FP                     TN

Derived metrics:

Accuracy  = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall    = TP / (TP + FN)
F1        = 2 * (Precision * Recall) / (Precision + Recall)
         = 2 * TP / (2 * TP + FP + FN)
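The derived metrics above can be checked numerically. A minimal Python sketch (the confusion-matrix counts are made up for the example):

```python
def derived_metrics(tp, fp, fn, tn):
    """Compute the four derived metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts (assumed, not from the article)
acc, prec, rec, f1 = derived_metrics(tp=40, fp=10, fn=20, tn=30)

# The two F1 forms agree: 2*TP / (2*TP + FP + FN)
assert abs(f1 - 2 * 40 / (2 * 40 + 10 + 20)) < 1e-12
```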

ROC curve and AUC: The ROC curve plots the True Positive Rate (Recall) against the False Positive Rate (FP / (FP + TN)) as the decision threshold sweeps over the model's score range (0 to 1 for probabilistic outputs). AUC is the area under this curve:

AUC = integral_0^1 TPR(FPR) dFPR

Interpretation:
  AUC = P(score(positive) > score(negative))
  AUC = 1.0  -> perfect separation
  AUC = 0.5  -> random classifier
  AUC < 0.5  -> worse than random (predictions inverted)
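The probabilistic interpretation AUC = P(score(positive) > score(negative)) can be computed directly by comparing all positive/negative score pairs. A hedged sketch (O(n²), fine for illustration; the function name and tie handling of 1/2 per tie are assumptions):

```python
from itertools import product

def auc_by_ranking(pos_scores, neg_scores):
    """AUC as P(score(positive) > score(negative)), ties counted as 1/2."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Perfect separation -> AUC = 1.0
assert auc_by_ranking([0.9, 0.8], [0.2, 0.1]) == 1.0
# Indistinguishable score distributions -> AUC = 0.5
assert auc_by_ranking([0.5, 0.5], [0.5, 0.5]) == 0.5
```

Production implementations use rank statistics (equivalent to the Mann-Whitney U) to avoid the quadratic pairwise loop.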

Log-loss (binary cross-entropy):

LogLoss = -(1/n) * sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]

where p_i = predicted probability of positive class
      y_i = true label (0 or 1)

Perfect: LogLoss -> 0 (all predictions are confident and correct)
Random:  LogLoss = log(2) ≈ 0.693 (all predictions are 0.5)
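The formula and the two reference points above can be verified with a direct translation into Python (a sketch; real implementations clip probabilities away from 0 and 1 to avoid infinite loss):

```python
import math

def log_loss(y_true, p_pred):
    """Binary cross-entropy, as defined above."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / n

# A classifier that always outputs 0.5 scores exactly log(2)
y = [0, 1, 1, 0]
assert abs(log_loss(y, [0.5] * 4) - math.log(2)) < 1e-12

# Confident, correct predictions drive the loss toward 0
assert log_loss([1, 0], [0.999, 0.001]) < 0.01
```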

Cross-validation with k folds:

For fold j = 1 to k:
  TestSet_j  = Partition_j
  TrainSet_j = D \ Partition_j
  Model_j    = Train(TrainSet_j)
  Metric_j   = Evaluate(Model_j, TestSet_j)

FinalMetric = (1/k) * sum_j Metric_j
Std         = sqrt((1/k) * sum_j (Metric_j - FinalMetric)^2)
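The fold loop above translates directly into code. A minimal sketch (the round-robin partitioning, the toy "model", and the function names are assumptions for illustration; real folds are usually stratified and shuffled):

```python
import statistics

def k_fold_cross_validate(data, k, train_fn, eval_fn):
    """Average an evaluation metric over k folds, per the pseudocode above."""
    folds = [data[i::k] for i in range(k)]  # simple round-robin partition
    metrics = []
    for j in range(k):
        test_set = folds[j]                                   # Partition_j
        train_set = [x for i, f in enumerate(folds) if i != j # D \ Partition_j
                     for x in f]
        model = train_fn(train_set)
        metrics.append(eval_fn(model, test_set))
    mean = statistics.fmean(metrics)     # FinalMetric
    std = statistics.pstdev(metrics)     # population std, matching the formula
    return mean, std

# Toy illustration: "training" computes the mean; "evaluating" returns it
data = list(range(10))
mean, std = k_fold_cross_validate(
    data, k=5,
    train_fn=lambda tr: sum(tr) / len(tr),
    eval_fn=lambda model, te: model,
)
```

Reporting the standard deviation alongside the mean conveys how stable the estimate is across folds.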

Related Pages

Implemented By
