
Principle: Snorkel Multitask Evaluation and Prediction

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Multi_Task_Learning, Inference
Last Updated: 2026-02-14 20:00 GMT

Overview

A methodology for evaluating multi-task models by computing per-task metrics across datasets and splits, with support for probabilistic label training.

Description

Multitask Evaluation and Prediction provides two key capabilities:

  • Scoring: Computing per-task, per-dataset, per-split metrics (accuracy, F1, etc.) in a structured format
  • Prediction: Generating probabilities and optional hard predictions for each task
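The prediction capability above can be sketched in plain Python: return per-task probabilities and, optionally, hard predictions taken as the argmax class. The `predict` helper and its dictionary layout are illustrative assumptions, not the Snorkel `MultitaskClassifier` API.

```python
def predict(task_probs, return_preds=False):
    """Return per-task probabilities and, optionally, hard predictions
    taken as the argmax class. Plain-Python sketch of the idea, not the
    Snorkel MultitaskClassifier API."""
    out = {"probs": task_probs}
    if return_preds:
        # Hard prediction = index of the highest-probability class per example.
        out["preds"] = {task: [row.index(max(row)) for row in rows]
                        for task, rows in task_probs.items()}
    return out

result = predict({"spam": [[0.9, 0.1], [0.2, 0.8]]}, return_preds=True)
# result["preds"]["spam"] == [0, 1]
```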

The evaluation supports label remapping (evaluating one task's predictions against another task's labels), which is essential for slice-aware models, whose slice-head predictions must be scored against the base task's labels.
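A minimal sketch of scoring with label remapping, assuming a hypothetical `score_tasks` helper (not the Snorkel API): metrics are keyed by `task/dataset/split/metric`, and a remap dictionary redirects a slice task's predictions onto the base task's gold labels.

```python
def score_tasks(probs, labels, remap=None):
    """Compute per-(task, dataset, split) accuracy from predicted
    probabilities. `remap` optionally maps a task name to the task whose
    gold labels should be used for scoring (the slice-aware case).
    Hypothetical helper, not the Snorkel API."""
    metrics = {}
    for (task, dataset, split), task_probs in probs.items():
        # Score against another task's labels when a remap entry exists.
        label_task = remap.get(task, task) if remap else task
        gold = labels[(label_task, dataset, split)]
        preds = [row.index(max(row)) for row in task_probs]
        correct = sum(int(y_hat == y) for y_hat, y in zip(preds, gold))
        metrics[f"{task}/{dataset}/{split}/accuracy"] = correct / len(gold)
    return metrics

# Slice-aware setup: "spam_slice" predictions are evaluated against the
# base "spam" task's labels via the remap.
probs = {("spam", "emails", "valid"): [[0.9, 0.1], [0.2, 0.8]],
         ("spam_slice", "emails", "valid"): [[0.6, 0.4], [0.3, 0.7]]}
labels = {("spam", "emails", "valid"): [0, 1]}
remap = {"spam_slice": "spam"}
metrics = score_tasks(probs, labels, remap)
```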

Additionally, Snorkel provides cross_entropy_with_probs for training with probabilistic (soft) labels from the label model, enabling end-to-end weak supervision without hard label discretization.
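To make the soft-label loss concrete, here is a plain-Python sketch that mirrors the behavior described for `cross_entropy_with_probs`: the target is a probability distribution rather than a hard class index, and the loss is the negative dot product between the target distribution and the log-softmax of the logits, averaged over the batch. The function body here is an illustrative reimplementation, not Snorkel's source.

```python
import math

def soft_cross_entropy(logits, target_probs):
    """Soft cross-entropy: -sum_c p_c * log softmax(logits)_c, averaged
    over the batch. Plain-Python sketch of the behavior of Snorkel's
    cross_entropy_with_probs."""
    total = 0.0
    for row, probs in zip(logits, target_probs):
        # Numerically stable log-partition: log sum exp shifted by the max.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        log_q = [x - log_z for x in row]
        total += -sum(p * lq for p, lq in zip(probs, log_q))
    return total / len(logits)

# Probabilistic labels, e.g. as produced by a label model.
logits = [[2.0, 0.0], [0.0, 2.0]]
soft_labels = [[0.7, 0.3], [0.2, 0.8]]
loss = soft_cross_entropy(logits, soft_labels)
```

With a one-hot target distribution, this reduces exactly to the standard hard-label cross-entropy, which is why no discretization step is needed.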

Usage

Use this principle after training a MultitaskClassifier to evaluate performance and generate predictions. Use label remapping when evaluating slice-aware models.

Theoretical Basis

Per-task evaluation computes:

$$\mathrm{metric}_{t,d,s} = f(\hat{Y}_t, Y_t) \quad \text{for task } t, \text{ dataset } d, \text{ split } s$$

For probabilistic label training, the soft cross-entropy is:

$$\mathcal{L}_{\mathrm{soft}} = -\sum_i \sum_c p_{i,c} \log q_{i,c}$$

where $p_{i,c}$ is the probabilistic label for example $i$ and class $c$ (from the label model) and $q_{i,c}$ is the model's predicted probability.

