Principle: Snorkel Probabilistic Label Generation
| Knowledge Sources | |
|---|---|
| Domains | Weak_Supervision, Probabilistic_Inference |
| Last Updated | 2026-02-14 20:00 GMT |
Overview
A method for generating probabilistic (soft) or discrete (hard) labels from a trained label model by marginalizing over the learned LF accuracy parameters.
Description
Probabilistic Label Generation is the inference step of the data programming pipeline. After training the label model to learn LF accuracies, this step uses those learned parameters to produce labels for each data point. The output can be:
- Probabilistic labels: A probability distribution over classes for each data point, capturing uncertainty in the labeling
- Discrete labels: Hard label assignments obtained by taking the argmax of the probabilities, with configurable tie-breaking policies
Probabilistic labels are particularly valuable because they preserve uncertainty information that can be propagated to downstream model training via noise-aware loss functions (e.g., cross-entropy with soft targets).
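As a sketch of how soft labels feed a noise-aware loss, the cross-entropy with soft targets mentioned above can be written in a few lines of NumPy (the function name and the example probabilities are illustrative, not from any particular library):

```python
import numpy as np

def soft_cross_entropy(probs, soft_targets, eps=1e-12):
    """Noise-aware cross-entropy: targets are probabilistic labels,
    not one-hot vectors, so label uncertainty weights the loss."""
    return -np.mean(np.sum(soft_targets * np.log(probs + eps), axis=1))

# Hypothetical probabilistic labels from a trained label model
soft_targets = np.array([[0.9, 0.1], [0.3, 0.7]])
# Hypothetical downstream model's predicted class probabilities
probs = np.array([[0.8, 0.2], [0.4, 0.6]])
loss = soft_cross_entropy(probs, soft_targets)
```

A one-hot target is the special case where each row of `soft_targets` puts probability 1 on a single class, recovering the standard cross-entropy.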
Usage
Use this principle after training a label model. Generate probabilistic labels when training a downstream model that supports soft labels. Generate discrete labels when you need hard assignments for standard supervised learning or evaluation.
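The workflow above can be sketched end to end in NumPy. This assumes a trained label model has already produced the posterior probabilities; the label matrix, probabilities, and the convention that fully-abstained rows are dropped before downstream training are illustrative:

```python
import numpy as np

# Label matrix: rows are data points, columns are LFs; -1 = abstain.
L = np.array([[0, -1, 0], [-1, -1, -1], [1, 0, 1]])
# Hypothetical probabilistic labels from a trained label model.
probs = np.array([[0.85, 0.15], [0.5, 0.5], [0.1, 0.9]])

# Keep only data points labeled by at least one LF; fully-abstained
# rows carry no signal and are typically filtered out before training.
covered = (L != -1).any(axis=1)
soft_labels = probs[covered]                 # for noise-aware training
hard_labels = probs[covered].argmax(axis=1)  # for standard pipelines
```

In Snorkel itself, the soft and hard outputs correspond to the label model's `predict_proba` and `predict` methods, respectively.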
Theoretical Basis
Given trained label-model parameters $\hat{\theta}$ and a label matrix $\Lambda$, the posterior probability of the true label $Y$ is obtained by normalizing the joint distribution over the label classes:

$$P_{\hat{\theta}}(Y = y \mid \Lambda) = \frac{P_{\hat{\theta}}(\Lambda, Y = y)}{\sum_{y'} P_{\hat{\theta}}(\Lambda, Y = y')}$$
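Under the simplifying assumption of conditionally independent LFs with known accuracies (a toy stand-in for the full learned label-model parameters), this posterior can be computed per data point as follows; the function and parameter names are illustrative:

```python
import numpy as np

def posterior(lf_votes, accuracies, prior, abstain=-1):
    """P(Y = y | votes) for one data point, assuming conditionally
    independent LFs, each correct with probability `accuracies[j]`
    and spreading its errors uniformly over the other classes."""
    k = len(prior)
    log_p = np.log(np.asarray(prior, dtype=float))
    for vote, acc in zip(lf_votes, accuracies):
        if vote == abstain:
            continue  # abstentions contribute no evidence in this model
        for y in range(k):
            log_p[y] += np.log(acc if vote == y else (1 - acc) / (k - 1))
    p = np.exp(log_p - log_p.max())  # subtract max for numerical stability
    return p / p.sum()

# Two LFs vote for class 0; the third abstains.
p = posterior([0, 0, -1], accuracies=[0.8, 0.7, 0.9], prior=[0.5, 0.5])
```

The real label model additionally learns the accuracy parameters from the observed agreement and disagreement patterns rather than taking them as given.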
For discrete predictions with tie-breaking:
- Abstain: Return -1 if max probabilities are tied
- Random: Break ties pseudo-randomly but deterministically (e.g., via a hash of the data point), so results are reproducible across runs
- True-random: Break ties with genuine randomness
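The three tie-breaking policies above can be sketched as a single prediction function; the policy names mirror the list, but the hashing scheme and function signature are illustrative:

```python
import numpy as np

ABSTAIN = -1

def predict_with_ties(probs, policy="abstain", seed=None):
    """Argmax prediction with configurable tie-breaking."""
    probs = np.asarray(probs, dtype=float)
    preds = probs.argmax(axis=1)
    # A row is tied if more than one class attains the max probability.
    ties = np.isclose(probs, probs.max(axis=1, keepdims=True)).sum(axis=1) > 1
    if policy == "abstain":
        preds[ties] = ABSTAIN
    elif policy == "random":
        # Deterministic: hash the row index to pick among tied classes,
        # so repeated runs give identical predictions.
        for i in np.flatnonzero(ties):
            tied = np.flatnonzero(np.isclose(probs[i], probs[i].max()))
            preds[i] = tied[hash(int(i)) % len(tied)]
    elif policy == "true-random":
        rng = np.random.default_rng(seed)
        for i in np.flatnonzero(ties):
            tied = np.flatnonzero(np.isclose(probs[i], probs[i].max()))
            preds[i] = rng.choice(tied)
    return preds
```

The abstain policy is the safest default for evaluation, since silently assigning a class to a tied data point can mask label-model uncertainty.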