Principle: Snorkel Multitask Evaluation and Prediction
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Multi_Task_Learning, Inference |
| Last Updated | 2026-02-14 20:00 GMT |
Overview
A methodology for evaluating multi-task models by computing per-task metrics across datasets and splits, with support for probabilistic label training.
Description
Multitask Evaluation and Prediction provides two key capabilities:
- Scoring: Computing per-task, per-dataset, per-split metrics (accuracy, F1, etc.) in a structured format
- Prediction: Generating probabilities and optional hard predictions for each task
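In Snorkel itself these capabilities are exposed on `MultitaskClassifier` (its `score` and `predict` methods). As a minimal illustration of the scoring side, the sketch below computes per-task, per-dataset, per-split metrics and returns them in a flat, structured dict; `score_tasks` and its key layout are hypothetical names for this example, not the library API.

```python
import numpy as np

def score_tasks(golds, probs, metrics=("accuracy",)):
    """Compute per-task/per-dataset/per-split metrics.

    golds: dict mapping (task, dataset, split) -> gold label array
    probs: dict mapping (task, dataset, split) -> predicted probability matrix
    Returns a flat dict keyed "task/dataset/split/metric" -> value.
    """
    results = {}
    for key, y in golds.items():
        # Hard predictions are the argmax over class probabilities.
        preds = probs[key].argmax(axis=1)
        for metric in metrics:
            if metric == "accuracy":
                value = float((preds == y).mean())
            else:
                raise ValueError(f"unsupported metric: {metric}")
            results["/".join(key) + f"/{metric}"] = value
    return results

# One task ("spam"), one dataset ("emails"), one split ("valid").
golds = {("spam", "emails", "valid"): np.array([0, 1, 1, 0])}
probs = {("spam", "emails", "valid"):
         np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]])}
print(score_tasks(golds, probs))  # {'spam/emails/valid/accuracy': 0.75}
```

The structured key format makes it easy to filter results by task, dataset, or split when a model has many heads.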
The evaluation supports label remapping (evaluating one task's predictions against another task's labels), which is essential for slice-aware models where slice prediction labels map to the base task head.
Additionally, Snorkel provides cross_entropy_with_probs for training with probabilistic (soft) labels from the label model, enabling end-to-end weak supervision without hard label discretization.
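The soft cross-entropy idea can be sketched in a few lines of NumPy: the loss is the expected negative log-likelihood under the probabilistic target, `-sum_k y~_k log p_k`, averaged over examples. Note this is an illustrative sketch operating on probabilities; Snorkel's actual `cross_entropy_with_probs` is a PyTorch function and takes logits, so the signature below is not the library's.

```python
import numpy as np

def soft_cross_entropy(probs_pred, probs_target, eps=1e-12):
    """Cross-entropy between predicted probabilities and probabilistic
    (soft) target labels, averaged over the batch."""
    log_p = np.log(np.clip(probs_pred, eps, 1.0))  # clip to avoid log(0)
    return float(-(probs_target * log_p).sum(axis=1).mean())

# A soft label [0.7, 0.3] from the label model vs. a confident prediction.
pred = np.array([[0.8, 0.2]])
soft = np.array([[0.7, 0.3]])
loss = soft_cross_entropy(pred, soft)
```

Because the target is a full distribution rather than a one-hot vector, the label model's uncertainty flows directly into training instead of being discarded by hard discretization.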
Usage
Use this principle after training a MultitaskClassifier to evaluate performance and generate predictions. Use label remapping when evaluating slice-aware models.
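Label remapping for slice-aware evaluation can be sketched as below: gold labels for a slice task are scored against the predictions of the base task head. The function name `remap_and_score`, the `remap` dict, and the `task->label` key convention are hypothetical, introduced only for this example.

```python
import numpy as np

def remap_and_score(golds_by_task, probs_by_task, remap):
    """Evaluate one task's predictions against another task's gold labels.

    remap: dict {label_task: pred_task} -- gold labels stored under
    `label_task` are scored against predictions from `pred_task`
    (e.g. slice labels against the base task head).
    """
    results = {}
    for label_task, pred_task in remap.items():
        y = golds_by_task[label_task]
        preds = probs_by_task[pred_task].argmax(axis=1)
        results[f"{pred_task}->{label_task}/accuracy"] = float((preds == y).mean())
    return results

# Slice gold labels evaluated against the base "spam" head's predictions.
golds = {"spam_slice:short": np.array([1, 0, 1])}
probs = {"spam": np.array([[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]])}
print(remap_and_score(golds, probs, {"spam_slice:short": "spam"}))
```

Without remapping, a slice task would have no predictions of its own to score; remapping lets the base head's predictions be measured on each slice's labels.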
Theoretical Basis
Per-task evaluation computes, for each task $t$, dataset $d$, and split $s$:

$$\text{score}_{t,d,s} = m\big(y_{t,d,s}, \hat{y}_{t,d,s}\big)$$

where $m$ is the chosen metric (accuracy, F1, etc.), $y_{t,d,s}$ the gold labels, and $\hat{y}_{t,d,s}$ the predictions for that slice of the data.
For probabilistic label training, the soft cross-entropy is:

$$H(\tilde{y}, \hat{p}) = -\sum_{k=1}^{K} \tilde{y}_k \log \hat{p}_k$$

where $\tilde{y}_k$ is the probabilistic label (from the label model) and $\hat{p}_k$ is the model's predicted probability for class $k$.