Principle: Snorkel Multitask Model Training
| Field | Value |
|---|---|
| Domains | Multi_Task_Learning, Training, Deep_Learning |
| Last Updated | 2026-02-14 20:00 GMT |
Overview
A training procedure that jointly optimizes a neural network across multiple related tasks, sharing representations while allowing task-specific specialization.
Description
Multitask model training is the optimization step for models with multiple task heads. In Snorkel, it is used both for general multi-task classification and for slice-aware training. The trainer:
- Iterates over batches drawn from multiple dataloaders
- Computes per-task losses using task-specific loss functions
- Aggregates losses across tasks
- Applies gradient clipping and optimization
- Supports configurable batch scheduling (shuffled or sequential across tasks)
- Optionally logs metrics and checkpoints the best model
The training loop handles the complexity of multi-task optimization: different tasks may have different numbers of examples, different loss scales, and different convergence rates.
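The loop described above can be sketched in plain Python. This is a toy illustration, not the Snorkel implementation: it shows shuffled batch scheduling across tasks, task-specific loss functions, and summed loss aggregation, with the optimizer and gradient-clipping steps marked only as comments.

```python
import random

def shuffled_schedule(batches, seed=0):
    """Interleave (task, batch_index) pairs across all tasks in random order."""
    order = [(task, i)
             for task, task_batches in batches.items()
             for i in range(len(task_batches))]
    random.Random(seed).shuffle(order)
    return order

def train_epoch(batches, loss_fns):
    """One pass over every task's batches; returns the summed loss."""
    total_loss = 0.0
    for task, i in shuffled_schedule(batches):
        loss = loss_fns[task](batches[task][i])  # task-specific loss function
        total_loss += loss                       # aggregate losses across tasks
        # gradient clipping and optimizer.step() would happen here
    return total_loss
```

A sequential scheduler would simply skip the shuffle, exhausting one task's batches before moving to the next; shuffling interleaves tasks so that shared parameters see gradients from all tasks throughout the epoch.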
Usage
Use this principle when training any MultitaskClassifier or SliceAwareClassifier. Configure training hyperparameters (epochs, learning rate, optimizer) and optionally enable logging and checkpointing.
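As a configuration sketch: the keyword names below follow Snorkel's classification `Trainer` as commonly documented, but treat the exact arguments as assumptions and check them against your installed version. `tasks` and `dataloaders` are assumed to be built elsewhere with Snorkel's task and dataloader utilities.

```python
from snorkel.classification import MultitaskClassifier, Trainer

# `tasks` and `dataloaders` are assumed to exist; building them is out of
# scope for this sketch.
model = MultitaskClassifier(tasks)

# Hyperparameter names are assumptions based on Snorkel's Trainer config;
# verify against the version you have installed.
trainer = Trainer(
    n_epochs=10,
    lr=1e-3,
    optimizer="adam",
)
trainer.fit(model, dataloaders)
```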
Theoretical Basis
Multi-task training minimizes the sum of task losses:

$$\mathcal{L}(\theta_s, \theta_1, \dots, \theta_T) = \sum_{t=1}^{T} \mathcal{L}_t(\theta_s, \theta_t)$$

where $\theta_s$ are shared parameters and $\theta_t$ are the task-specific parameters of task $t$. The gradient with respect to the shared parameters receives contributions from all tasks:

$$\nabla_{\theta_s} \mathcal{L} = \sum_{t=1}^{T} \nabla_{\theta_s} \mathcal{L}_t$$
This encourages the shared representation to be useful across all tasks while allowing specialization in task-specific heads.
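A small numeric illustration of that additivity (a toy example, not Snorkel code): with two task losses sharing one scalar parameter, the finite-difference gradient of the summed loss equals the sum of the per-task gradients.

```python
def grad(f, w, eps=1e-6):
    """Central-difference derivative of f at w."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

# Two toy task losses sharing a single scalar parameter w.
loss_1 = lambda w: (w * 2.0 - 1.0) ** 2
loss_2 = lambda w: (w * 3.0 + 0.5) ** 2
total = lambda w: loss_1(w) + loss_2(w)

w = 0.7
g_total = grad(total, w)
g_sum = grad(loss_1, w) + grad(loss_2, w)
# g_total and g_sum agree: shared parameters accumulate gradient
# contributions from every task.
```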