Implementation:Snorkel team Snorkel LFAnalysis Summary

Knowledge Sources	Snorkel Snorkel API Docs
Domains	Weak_Supervision, Data_Quality, Statistics
Last Updated	2026-02-14 20:00 GMT

Overview

Concrete tool for computing per-LF statistics (coverage, overlap, conflict, accuracy) from a label matrix, provided by the Snorkel library.

Description

The LFAnalysis class accepts a label matrix and computes diagnostic statistics for each labeling function. It uses sparse matrix operations (scipy.sparse) for efficient computation and produces summary DataFrames with per-LF metrics.

Key methods:

label_coverage(): Fraction of data points with at least one non-abstain label
label_overlap(): Fraction of data points labeled by more than one LF
label_conflict(): Fraction of data points where at least two LFs assign different non-abstain labels
lf_summary(): DataFrame with one row per LF and all per-LF statistics
lf_empirical_accuracies(Y): Per-LF accuracy against gold labels Y (requires gold labels)
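
The three global metrics above can be illustrated without Snorkel. The following is a minimal pure-Python sketch of the definitions, assuming -1 marks an abstain; the function names mirror the LFAnalysis methods for readability, but this is not the library's sparse-matrix implementation.

```python
ABSTAIN = -1  # assumed convention: -1 means the LF did not vote


def label_coverage(L):
    """Fraction of rows with at least one non-abstain vote."""
    return sum(any(v != ABSTAIN for v in row) for row in L) / len(L)


def label_overlap(L):
    """Fraction of rows with two or more non-abstain votes."""
    return sum(sum(v != ABSTAIN for v in row) >= 2 for row in L) / len(L)


def label_conflict(L):
    """Fraction of rows whose non-abstain votes disagree."""
    return sum(len({v for v in row if v != ABSTAIN}) >= 2 for row in L) / len(L)


# Rows are data points, columns are labeling functions.
L = [
    [-1, 0, 0],    # covered, overlapping, no conflict
    [-1, -1, -1],  # not covered
    [1, 0, -1],    # covered, overlapping, conflicting
    [-1, 0, -1],   # covered by a single LF
    [0, 0, 0],     # covered, overlapping, no conflict
]

print(label_coverage(L), label_overlap(L), label_conflict(L))  # 0.8 0.6 0.2
```

A row can only conflict if it overlaps, and can only overlap if it is covered, so conflict <= overlap <= coverage always holds.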

Usage

Use this class after applying labeling functions and obtaining a label matrix, typically to inspect LF quality before training a label model.

Code Reference

Source Location

Repository: snorkel
File: snorkel/labeling/analysis.py
Lines: L15-377

Signature

class LFAnalysis:
    def __init__(
        self,
        L: np.ndarray,
        lfs: Optional[List[LabelingFunction]] = None,
    ) -> None:
        """
        Args:
            L: Label matrix [n_examples, n_lfs] with values in {-1, 0, ..., k-1}.
            lfs: Labeling functions used to generate L (for naming in summary).
        """

    def label_coverage(self) -> float: ...
    def label_overlap(self) -> float: ...
    def label_conflict(self) -> float: ...
    def lf_polarities(self) -> List[List[int]]: ...
    def lf_coverages(self) -> np.ndarray: ...
    def lf_overlaps(self, normalize_by_coverage: bool = False) -> np.ndarray: ...
    def lf_conflicts(self, normalize_by_overlaps: bool = False) -> np.ndarray: ...
    def lf_empirical_accuracies(self, Y: np.ndarray) -> np.ndarray: ...
    def lf_empirical_probs(self, Y: np.ndarray, k: int) -> np.ndarray: ...
    def lf_summary(
        self,
        Y: Optional[np.ndarray] = None,
        est_weights: Optional[np.ndarray] = None,
    ) -> DataFrame: ...
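
To make the normalize_by_coverage and normalize_by_overlaps flags concrete, here is a hedged pure-Python sketch of the per-LF overlap and conflict fractions. It assumes -1 marks an abstain and returns 0.0 when a normalizing denominator is zero; both the helper _per_lf_counts and that zero-handling choice belong to this sketch rather than to the library.

```python
ABSTAIN = -1  # assumed convention: -1 means the LF did not vote


def _per_lf_counts(L):
    """Hypothetical helper: per-LF counts of votes, overlapping votes, conflicting votes."""
    m = len(L[0])
    votes, overlaps, conflicts = [0] * m, [0] * m, [0] * m
    for row in L:
        cast = [v for v in row if v != ABSTAIN]
        overlapped = len(cast) >= 2       # some other LF also voted on this row
        conflicted = len(set(cast)) >= 2  # the votes on this row disagree
        for j, v in enumerate(row):
            if v == ABSTAIN:
                continue
            votes[j] += 1
            overlaps[j] += overlapped
            conflicts[j] += conflicted
    return votes, overlaps, conflicts


def _ratio(nums, dens):
    # Sketch-level choice: report 0.0 when the denominator is zero.
    return [n / d if d else 0.0 for n, d in zip(nums, dens)]


def lf_overlaps(L, normalize_by_coverage=False):
    """Per-LF fraction of points the LF labels that at least one other LF also labels."""
    votes, overlaps, _ = _per_lf_counts(L)
    dens = votes if normalize_by_coverage else [len(L)] * len(votes)
    return _ratio(overlaps, dens)


def lf_conflicts(L, normalize_by_overlaps=False):
    """Per-LF fraction of points where the LF disagrees with another non-abstaining LF."""
    _, overlaps, conflicts = _per_lf_counts(L)
    dens = overlaps if normalize_by_overlaps else [len(L)] * len(overlaps)
    return _ratio(conflicts, dens)


L = [[-1, 0, 0], [-1, -1, -1], [1, 0, -1], [-1, 0, -1], [0, 0, 0]]
print(lf_overlaps(L))                              # fractions of all 5 points
print(lf_overlaps(L, normalize_by_coverage=True))  # fractions of each LF's own votes
print(lf_conflicts(L, normalize_by_overlaps=True))
```

Without normalization each value is a fraction of the whole dataset; with normalization it is a fraction of the points that LF actually labeled (or overlapped on), which makes low-coverage LFs easier to compare with high-coverage ones.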

Import

from snorkel.labeling import LFAnalysis

I/O Contract

Inputs

Name	Type	Required	Description
L	np.ndarray	Yes	Label matrix [n_examples, n_lfs] with values in {-1, 0, ..., k-1}
lfs	Optional[List[LabelingFunction]]	No	LFs used to generate L; their names label the rows of the summary
Y	Optional[np.ndarray]	No	Gold labels for empirical accuracy (passed to lf_summary/lf_empirical_accuracies)
est_weights	Optional[np.ndarray]	No	Learned LF weights from label model (passed to lf_summary)

Outputs

Name	Type	Description
label_coverage()	float	Fraction of data points with at least one non-abstain label
label_overlap()	float	Fraction of data points with two or more non-abstain labels
label_conflict()	float	Fraction of data points with conflicting non-abstain labels
lf_summary()	pd.DataFrame	One row per LF with columns: j, Polarity, Coverage, Overlaps, Conflicts; plus Correct, Incorrect, Emp. Acc. when Y is given, and Learned Weight when est_weights is given
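
As a rough illustration of what the Polarity, Coverage, and Emp. Acc. columns represent, the sketch below computes them in pure Python. It assumes -1 marks an abstain and that empirical accuracy is measured only over the points where an LF did not abstain; returning NaN for an LF that never votes is a choice made for this sketch.

```python
ABSTAIN = -1  # assumed convention: -1 means the LF did not vote


def lf_polarities(L):
    """Sorted set of non-abstain labels each LF emits."""
    return [sorted({row[j] for row in L if row[j] != ABSTAIN}) for j in range(len(L[0]))]


def lf_coverages(L):
    """Fraction of data points each LF labels."""
    return [sum(row[j] != ABSTAIN for row in L) / len(L) for j in range(len(L[0]))]


def lf_empirical_accuracies(L, Y):
    """Per-LF accuracy against gold labels Y, ignoring abstains."""
    accs = []
    for j in range(len(L[0])):
        voted = [(row[j], y) for row, y in zip(L, Y) if row[j] != ABSTAIN]
        correct = sum(v == y for v, y in voted)
        # Sketch-level choice: NaN when the LF abstains everywhere.
        accs.append(correct / len(voted) if voted else float("nan"))
    return accs


L = [[-1, 0, 0], [-1, -1, -1], [1, 0, -1], [-1, 0, -1], [0, 0, 0]]
Y = [1, 0, 1, 0, 0]  # gold labels, one per row of L

print(lf_polarities(L))               # [[0, 1], [0], [0]]
print(lf_coverages(L))                # [0.4, 0.8, 0.4]
print(lf_empirical_accuracies(L, Y))  # [1.0, 0.5, 0.5]
```

An LF with high accuracy but low coverage and an LF with moderate accuracy but high coverage can both be useful, which is why the summary reports these columns side by side.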

Usage Examples

Basic LF Analysis

import numpy as np
from snorkel.labeling import LFAnalysis

# Label matrix of the kind returned by PandasLFApplier.apply() (-1 = abstain)
L_train = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])

analysis = LFAnalysis(L=L_train)

# Global metrics
print(f"Coverage: {analysis.label_coverage():.2f}")    # 0.80
print(f"Overlap: {analysis.label_overlap():.2f}")      # 0.60
print(f"Conflict: {analysis.label_conflict():.2f}")    # 0.20

# Per-LF metrics
print(analysis.lf_coverages())    # [0.4 0.8 0.4]
print(analysis.lf_polarities())   # [[0, 1], [0], [0]]

Summary with Gold Labels

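# With named LFs and gold labels for dev-set accuracy.
# lf_keyword_check, lf_short_text, and lf_sender_check are assumed to be
# LabelingFunction objects defined elsewhere, in the same order as the columns of L.
lfs = [lf_keyword_check, lf_short_text, lf_sender_check]

# Reuse the 5x3 matrix above as a dev-set label matrix; Y_dev needs one gold label per row.
L_dev = L_train
Y_dev = np.array([1, 0, 1, 0, 0])

analysis = LFAnalysis(L=L_dev, lfs=lfs)
summary = analysis.lf_summary(Y=Y_dev)
print(summary)
# Rows are indexed by LF name.
# Columns: j, Polarity, Coverage, Overlaps, Conflicts, Correct, Incorrect, Emp. Acc.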

Related Pages

Implements Principle

Principle:Snorkel_team_Snorkel_Labeling_Function_Analysis
