Implementation:Snorkel team Snorkel LFAnalysis Summary

From Leeroopedia
Knowledge Sources
Domains: Weak_Supervision, Data_Quality, Statistics
Last Updated: 2026-02-14 20:00 GMT

Overview

Concrete tool, provided by the Snorkel library, for computing coverage, overlap, conflict, and accuracy statistics for each labeling function (LF) from a label matrix.

Description

The LFAnalysis class accepts a label matrix and computes diagnostic statistics for each labeling function. It uses sparse matrix operations (scipy.sparse) for efficient computation and produces summary DataFrames with per-LF metrics.

Key methods:

  • label_coverage(): Fraction of data points that receive at least one non-abstain label
  • label_overlap(): Fraction of data points labeled by more than one LF
  • label_conflict(): Fraction of data points on which at least two LFs emit different non-abstain labels
  • lf_summary(): Complete DataFrame with all per-LF statistics
  • lf_empirical_accuracies(): Accuracy against gold labels (if available)
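
The three dataset-level metrics in the list above can be sketched in plain NumPy. This is an illustrative reimplementation, not Snorkel's own code (which works on sparse matrices); the helper name global_label_stats and the ABSTAIN constant are assumptions made for this example, with -1 taken as the abstain value per the label-matrix convention on this page.

```python
import numpy as np

ABSTAIN = -1  # assumed abstain value, following the {-1, 0, ..., k-1} convention


def global_label_stats(L: np.ndarray) -> dict:
    """Hypothetical helper: dataset-level coverage, overlap, and conflict."""
    voted = L != ABSTAIN               # True where an LF emitted a label
    votes_per_row = voted.sum(axis=1)  # number of LFs that labeled each row

    # A row is in conflict when its non-abstain votes contain 2+ distinct labels.
    distinct_per_row = np.array(
        [np.unique(row[row != ABSTAIN]).size for row in L]
    )

    return {
        "coverage": float((votes_per_row >= 1).mean()),
        "overlap": float((votes_per_row >= 2).mean()),
        "conflict": float((distinct_per_row >= 2).mean()),
    }


L_example = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
stats = global_label_stats(L_example)
print(stats)
```

On this 5 x 3 matrix, four rows have at least one vote, three rows have two or more votes, and only the third row mixes labels 1 and 0, giving 0.8, 0.6, and 0.2 respectively.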

Usage

Use this class after applying labeling functions and obtaining a label matrix, to inspect LF quality before training a label model.

Code Reference

Source Location

  • Repository: snorkel
  • File: snorkel/labeling/analysis.py
  • Lines: L15-377

Signature

class LFAnalysis:
    def __init__(
        self,
        L: np.ndarray,
        lfs: Optional[List[LabelingFunction]] = None,
    ) -> None:
        """
        Args:
            L: Label matrix [n_examples, n_lfs] with values in {-1, 0, ..., k-1}.
            lfs: Labeling functions used to generate L (for naming in summary).
        """

    def label_coverage(self) -> float: ...
    def label_overlap(self) -> float: ...
    def label_conflict(self) -> float: ...
    def lf_polarities(self) -> List[List[int]]: ...
    def lf_coverages(self) -> np.ndarray: ...
    def lf_overlaps(self, normalize_by_coverage: bool = False) -> np.ndarray: ...
    def lf_conflicts(self, normalize_by_overlaps: bool = False) -> np.ndarray: ...
    def lf_empirical_accuracies(self, Y: np.ndarray) -> np.ndarray: ...
    def lf_empirical_probs(self, Y: np.ndarray, k: int) -> np.ndarray: ...
    def lf_summary(
        self,
        Y: Optional[np.ndarray] = None,
        est_weights: Optional[np.ndarray] = None,
    ) -> DataFrame: ...

Import

from snorkel.labeling import LFAnalysis

I/O Contract

Inputs

Name         Type                              Required  Description
L            np.ndarray                        Yes       Label matrix [n_examples, n_lfs] with values in {-1, 0, ..., k-1}; -1 means the LF abstained
lfs          Optional[List[LabelingFunction]]  No        LFs whose names label the rows of the summary
Y            Optional[np.ndarray]              No        Gold labels for empirical accuracy (passed to lf_summary/lf_empirical_accuracies)
est_weights  Optional[np.ndarray]              No        Learned LF weights from label model (passed to lf_summary)

Outputs

Name              Type          Description
label_coverage()  float         Fraction of data points with at least one label
label_overlap()   float         Fraction of data points with 2+ labels
label_conflict()  float         Fraction of data points with conflicting labels
lf_summary()      pd.DataFrame  One row per LF. Columns: j, Polarity, Coverage, Overlaps, Conflicts; plus Correct, Incorrect, Emp. Acc. when Y is given, and Learned Weight when est_weights is given

Usage Examples

Basic LF Analysis

import numpy as np
from snorkel.labeling import LFAnalysis

# Label matrix of the kind PandasLFApplier returns (hard-coded here); -1 means abstain
L_train = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])

analysis = LFAnalysis(L=L_train)

# Global metrics
print(f"Coverage: {analysis.label_coverage():.2f}")    # 0.80
print(f"Overlap: {analysis.label_overlap():.2f}")      # 0.60
print(f"Conflict: {analysis.label_conflict():.2f}")    # 0.20

# Per-LF metrics
print(analysis.lf_coverages())    # [0.4, 0.8, 0.4]
print(analysis.lf_polarities())   # [[0, 1], [0], [0]]
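
The per-LF overlap and conflict rates (lf_overlaps and lf_conflicts in the signature) follow the same idea, restricted to rows where a given LF votes. The sketch below is an illustrative NumPy version under stated assumptions, not Snorkel's implementation: the function name per_lf_overlaps_conflicts is hypothetical, -1 is taken as abstain, and rates are normalized by the total number of rows (the behaviour assumed for the default normalize_by_coverage=False and normalize_by_overlaps=False).

```python
import numpy as np

ABSTAIN = -1  # assumed abstain value


def per_lf_overlaps_conflicts(L: np.ndarray):
    """Hypothetical helper: per-LF overlap and conflict rates over all rows."""
    voted = L != ABSTAIN
    n_votes = voted.sum(axis=1, keepdims=True)  # shape (n, 1)

    # LF j overlaps on a row when it voted and at least one other LF voted too.
    overlaps = (voted & (n_votes >= 2)).mean(axis=0)

    m = L.shape[1]
    conflicts = np.zeros(m)
    for j in range(m):
        others = np.delete(L, j, axis=1)  # votes of every LF except j
        # Some other LF voted and its label differs from LF j's label.
        disagrees = ((others != ABSTAIN) & (others != L[:, [j]])).any(axis=1)
        conflicts[j] = (voted[:, j] & disagrees).mean()
    return overlaps, conflicts


L_example = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
overlaps, conflicts = per_lf_overlaps_conflicts(L_example)
print(overlaps)
print(conflicts)
```

For this matrix the overlap rates come out as 0.4, 0.6, 0.4 and the conflict rates as 0.2, 0.2, 0.0: only the third row has disagreeing votes, and the third LF abstains there.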

Summary with Gold Labels

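# Reuses L_train from the example above. The three names below are assumed to be
# LabelingFunction objects defined elsewhere, in the same order as the columns of L_train.
lfs = [lf_keyword_check, lf_short_text, lf_sender_check]

# Gold labels, one per row of L_train
Y_dev = np.array([1, 0, 1, 0, 0])

analysis = LFAnalysis(L=L_train, lfs=lfs)
summary = analysis.lf_summary(Y=Y_dev)
print(summary)
# Rows are indexed by LF name.
# Columns: j, Polarity, Coverage, Overlaps, Conflicts, Correct, Incorrect, Emp. Acc.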
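
To make the Correct, Incorrect, and Emp. Acc. columns concrete, here is a minimal NumPy sketch that scores each LF only on the rows where it does not abstain. It is an illustration under assumptions, not Snorkel's code: the function name empirical_accuracies is hypothetical, and returning NaN for an LF that never votes is a choice made for this sketch.

```python
import numpy as np

ABSTAIN = -1  # assumed abstain value


def empirical_accuracies(L: np.ndarray, Y: np.ndarray):
    """Hypothetical helper: per-LF accuracy, correct count, incorrect count."""
    voted = L != ABSTAIN
    correct = voted & (L == Y[:, None])  # vote matches the gold label
    n_votes = voted.sum(axis=0)
    n_correct = correct.sum(axis=0)
    n_incorrect = n_votes - n_correct

    # Accuracy over non-abstain votes; NaN when an LF never votes (sketch choice).
    acc = np.full(L.shape[1], np.nan)
    np.divide(n_correct, n_votes, out=acc, where=n_votes > 0)
    return acc, n_correct, n_incorrect


L_example = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
Y_example = np.array([1, 0, 1, 0, 0])

acc, n_correct, n_incorrect = empirical_accuracies(L_example, Y_example)
print(acc, n_correct, n_incorrect)
```

With these gold labels the first LF is right on both of its votes, while the second and third are each right on half of theirs, so the accuracies are 1.0, 0.5, and 0.5.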

Related Pages

Implements Principle
