Principle: Snorkel Multitask Evaluation and Prediction
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Multi_Task_Learning, Inference |
| Last Updated | 2026-02-14 20:00 GMT |
Overview
A methodology for evaluating multi-task models by computing per-task metrics across datasets and splits, with support for probabilistic label training.
Description
Multitask Evaluation and Prediction provides two key capabilities:
- Scoring: Computing per-task, per-dataset, per-split metrics (accuracy, F1, etc.) in a structured format
- Prediction: Generating probabilities and optional hard predictions for each task
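In Snorkel itself these capabilities are exposed on `MultitaskClassifier` (its `score` and `predict` methods). As a minimal illustration of the scoring side, the sketch below computes per-task, per-dataset, per-split metrics and returns them in a flat, structured dict; `score_tasks` and its key layout are hypothetical names for this example, not the library API.

```python
import numpy as np

def score_tasks(golds, probs, metrics=("accuracy",)):
    """Compute per-task/per-dataset/per-split metrics.

    golds: dict mapping (task, dataset, split) -> gold label array
    probs: dict mapping (task, dataset, split) -> predicted probability matrix
    Returns a flat dict keyed "task/dataset/split/metric" -> value.
    """
    results = {}
    for key, y in golds.items():
        # Hard predictions are the argmax over class probabilities.
        preds = probs[key].argmax(axis=1)
        for metric in metrics:
            if metric == "accuracy":
                value = float((preds == y).mean())
            else:
                raise ValueError(f"unsupported metric: {metric}")
            results["/".join(key) + f"/{metric}"] = value
    return results

# One task ("spam"), one dataset ("emails"), one split ("valid").
golds = {("spam", "emails", "valid"): np.array([0, 1, 1, 0])}
probs = {("spam", "emails", "valid"):
         np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]])}
print(score_tasks(golds, probs))  # {'spam/emails/valid/accuracy': 0.75}
```

The structured key format makes it easy to filter results by task, dataset, or split when a model has many heads.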
The evaluation supports label remapping (evaluating one task's predictions against another task's labels), which is essential for slice-aware models where slice prediction labels map to the base task head.
Additionally, Snorkel provides cross_entropy_with_probs for training with probabilistic (soft) labels from the label model, enabling end-to-end weak supervision without hard label discretization.
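The soft cross-entropy idea can be sketched in a few lines of NumPy: the loss is the expected negative log-likelihood under the probabilistic target, `-sum_k y~_k log p_k`, averaged over examples. Note this is an illustrative sketch operating on probabilities; Snorkel's actual `cross_entropy_with_probs` is a PyTorch function and takes logits, so the signature below is not the library's.

```python
import numpy as np

def soft_cross_entropy(probs_pred, probs_target, eps=1e-12):
    """Cross-entropy between predicted probabilities and probabilistic
    (soft) target labels, averaged over the batch."""
    log_p = np.log(np.clip(probs_pred, eps, 1.0))  # clip to avoid log(0)
    return float(-(probs_target * log_p).sum(axis=1).mean())

# A soft label [0.7, 0.3] from the label model vs. a confident prediction.
pred = np.array([[0.8, 0.2]])
soft = np.array([[0.7, 0.3]])
loss = soft_cross_entropy(pred, soft)
```

Because the target is a full distribution rather than a one-hot vector, the label model's uncertainty flows directly into training instead of being discarded by hard discretization.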
Usage
Use this principle after training a MultitaskClassifier to evaluate performance and generate predictions. Use label remapping when evaluating slice-aware models.
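Label remapping for slice-aware evaluation can be sketched as below: gold labels for a slice task are scored against the predictions of the base task head. The function name `remap_and_score`, the `remap` dict, and the `task->label` key convention are hypothetical, introduced only for this example.

```python
import numpy as np

def remap_and_score(golds_by_task, probs_by_task, remap):
    """Evaluate one task's predictions against another task's gold labels.

    remap: dict {label_task: pred_task} -- gold labels stored under
    `label_task` are scored against predictions from `pred_task`
    (e.g. slice labels against the base task head).
    """
    results = {}
    for label_task, pred_task in remap.items():
        y = golds_by_task[label_task]
        preds = probs_by_task[pred_task].argmax(axis=1)
        results[f"{pred_task}->{label_task}/accuracy"] = float((preds == y).mean())
    return results

# Slice gold labels evaluated against the base "spam" head's predictions.
golds = {"spam_slice:short": np.array([1, 0, 1])}
probs = {"spam": np.array([[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]])}
print(remap_and_score(golds, probs, {"spam_slice:short": "spam"}))
```

Without remapping, a slice task would have no predictions of its own to score; remapping lets the base head's predictions be measured on each slice's labels.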
Theoretical Basis
Per-task evaluation computes, for each task $t$, dataset $d$, and split $s$:

$$\text{score}_{t,d,s} = m\big(y_{t,d,s}, \hat{y}_{t,d,s}\big)$$

where $m$ is the chosen metric (accuracy, F1, etc.), $y_{t,d,s}$ the gold labels, and $\hat{y}_{t,d,s}$ the predictions for that slice of the data.
For probabilistic label training, the soft cross-entropy is:

$$H(\tilde{y}, \hat{p}) = -\sum_{k=1}^{K} \tilde{y}_k \log \hat{p}_k$$

where $\tilde{y}_k$ is the probabilistic label (from the label model) and $\hat{p}_k$ is the model's predicted probability for class $k$.