Implementation: Snorkel generate_simple_label_matrix
| Knowledge Sources | Details |
|---|---|
| Domains | Weak_Supervision, Testing, Data_Generation |
| Last Updated | 2026-02-14 20:38 GMT |
Overview
Concrete tool for generating synthetic label matrices with known ground-truth parameters for testing and benchmarking weak supervision algorithms.
Description
The generate_simple_label_matrix function creates a complete synthetic weak supervision scenario: labeling function conditional probability tables, true labels, and the resulting label matrix. It models each labeling function as a noisy voter with a conditional probability table P(LF=l | Y=y), biases LFs towards being non-adversarial (correct more often than wrong), and supports configurable abstain rates. This is the primary testing utility for validating label model convergence and correctness without requiring real labeling functions or data.
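The noisy-voter scheme described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation for intuition, not the library source; the bias constant and normalization details are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, cardinality = 100, 3, 2

# One conditional probability table per LF: rows index the LF output
# (0 = abstain, 1..cardinality = class votes), columns index the true class y.
P = np.empty((m, cardinality + 1, cardinality))
for j in range(m):
    p = rng.random((cardinality + 1, cardinality))
    # Bias toward non-adversarial LFs: boost P(LF votes y | Y=y)
    p[1:, :] += np.eye(cardinality)
    P[j] = p / p.sum(axis=0)  # normalize each column into a distribution

Y = rng.integers(0, cardinality, size=n)  # true labels, sampled uniformly
L = np.empty((n, m), dtype=int)
for i in range(n):
    for j in range(m):
        # Sample the LF output from its CPT column for Y[i]; shift so -1 = abstain
        L[i, j] = rng.choice(cardinality + 1, p=P[j, :, Y[i]]) - 1
```

Because each LF's correct-vote probability is boosted before normalization, the resulting label matrix has LFs that are correct more often than wrong, which is what lets a label model recover their accuracies.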
Usage
Import this function when you need synthetic data for unit testing label model algorithms, benchmarking weak supervision approaches, or running reproducibility experiments. It is used extensively in Snorkel's own test suite to validate that the LabelModel correctly recovers latent accuracy parameters.
Code Reference
Source Location
- Repository: Snorkel
- File: snorkel/synthetic/synthetic_data.py
- Lines: 1-59
Signature
def generate_simple_label_matrix(
n: int, m: int, cardinality: int, abstain_multiplier: float = 1.0
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
"""Generate a synthetic label matrix with true parameters and labels.
This function generates a set of labeling function conditional probability tables,
P(LF=l | Y=y), stored as a matrix P, and true labels Y, and then generates the
resulting label matrix L.
Parameters
----------
n
Number of data points
m
Number of labeling functions
cardinality
Cardinality of true labels (i.e. not including abstains)
abstain_multiplier
Factor to multiply the probability of abstaining by
Returns
-------
Tuple[np.ndarray, np.ndarray, np.ndarray]
A tuple containing the LF conditional probabilities P,
the true labels Y, and the output label matrix L
"""
Import
from snorkel.synthetic.synthetic_data import generate_simple_label_matrix
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n | int | Yes | Number of data points to generate |
| m | int | Yes | Number of labeling functions to simulate |
| cardinality | int | Yes | Number of true label classes (excluding abstain) |
| abstain_multiplier | float | No | Multiplier for abstain probability (default 1.0); higher values produce sparser label matrices |
Outputs
| Name | Type | Description |
|---|---|---|
| P | np.ndarray (m, cardinality+1, cardinality) | Conditional probability tables P(LF=l \| Y=y) for each LF, where l=0 is abstain |
| Y | np.ndarray (n,) | True labels for each data point, sampled uniformly from [0, cardinality) |
| L | np.ndarray (n, m) | Label matrix where L[i, j] is the label assigned by LF j to data point i; -1 indicates abstain |
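Given a CPT slice from P, each LF's expected accuracy conditioned on voting follows directly; this quantity is what convergence tests compare against the LabelModel's learned parameters. A sketch with a hand-built CPT for one LF (the numeric values are illustrative assumptions, not library output):

```python
import numpy as np

cardinality = 2
# Hand-built CPT for a single LF, shaped (cardinality+1, cardinality):
# row 0 is abstain, rows 1..cardinality are class votes; columns are true y.
P_j = np.array([
    [0.2, 0.2],   # P(abstain      | Y=y)
    [0.7, 0.1],   # P(vote class 0 | Y=y)
    [0.1, 0.7],   # P(vote class 1 | Y=y)
])

# With Y uniform, P(correct | Y=y) is the diagonal of the vote rows,
# and P(vote | Y=y) is one minus the abstain row.
p_correct = np.diag(P_j[1:, :])   # [0.7, 0.7]
p_vote = 1.0 - P_j[0, :]          # [0.8, 0.8]
acc_given_vote = (p_correct / p_vote).mean()
print(f"Accuracy conditioned on voting: {acc_given_vote:.3f}")  # 0.875
```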
Usage Examples
Basic Synthetic Label Matrix
from snorkel.synthetic.synthetic_data import generate_simple_label_matrix
# Generate a binary classification scenario
# 1000 data points, 10 labeling functions, 2 classes
P, Y, L = generate_simple_label_matrix(n=1000, m=10, cardinality=2)
print(f"Conditional probabilities shape: {P.shape}") # (10, 3, 2)
print(f"True labels shape: {Y.shape}") # (1000,)
print(f"Label matrix shape: {L.shape}") # (1000, 10)
print(f"Abstain rate: {(L == -1).mean():.2f}")
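With Y and L in hand, per-LF empirical accuracy on non-abstain votes is a one-liner per column. A self-contained sketch using small toy arrays in place of the function's output:

```python
import numpy as np

# Toy stand-ins for the function's outputs: Y has shape (n,), L has shape (n, m)
Y = np.array([0, 1, 0, 1, 1])
L = np.array([
    [0, -1],
    [1,  1],
    [0,  0],
    [1, -1],
    [0,  1],
])

n, m = L.shape
for j in range(m):
    voted = L[:, j] != -1                      # mask out abstains
    acc = (L[voted, j] == Y[voted]).mean()
    print(f"LF {j}: abstain rate {(~voted).mean():.2f}, accuracy {acc:.2f}")
```

Comparing these empirical accuracies to the diagonals of P is a quick sanity check that the sampling behaved as expected.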
Testing LabelModel Convergence
import numpy as np
from snorkel.synthetic.synthetic_data import generate_simple_label_matrix
from snorkel.labeling.model import LabelModel
# Generate sparse label matrix (high abstain rate)
P, Y, L = generate_simple_label_matrix(
n=5000, m=10, cardinality=2, abstain_multiplier=3.0
)
# Train label model on synthetic data
label_model = LabelModel(cardinality=2)
label_model.fit(L, n_epochs=500)
# Evaluate against known ground truth
predictions = label_model.predict(L)
accuracy = (predictions == Y).mean()
print(f"Label model accuracy: {accuracy:.3f}")
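When benchmarking against known ground truth, a useful baseline is per-point majority vote over non-abstain labels; the LabelModel should match or beat it. A pure-NumPy sketch with toy data (ties are broken toward the lowest class, an arbitrary choice):

```python
import numpy as np

def majority_vote(L, cardinality):
    """Predict the most common non-abstain vote per row (ties -> lowest class)."""
    preds = np.empty(L.shape[0], dtype=int)
    for i, row in enumerate(L):
        votes = row[row != -1]
        if votes.size == 0:
            preds[i] = 0  # arbitrary fallback when every LF abstains
        else:
            preds[i] = np.bincount(votes, minlength=cardinality).argmax()
    return preds

# Toy stand-ins for the function's outputs
Y = np.array([0, 1, 1, 0])
L = np.array([
    [ 0,  0, -1],
    [ 1,  1,  0],
    [-1,  1,  1],
    [ 0, -1,  1],
])
preds = majority_vote(L, cardinality=2)
print(f"Majority-vote accuracy: {(preds == Y).mean():.2f}")
```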