Implementation:Snorkel team Snorkel LFAnalysis Summary
| Knowledge Sources | |
|---|---|
| Domains | Weak_Supervision, Data_Quality, Statistics |
| Last Updated | 2026-02-14 20:00 GMT |
Overview
Concrete tool for computing per-LF statistics (coverage, overlap, conflict, accuracy) from a label matrix, provided by the Snorkel library.
Description
The LFAnalysis class accepts a label matrix and computes diagnostic statistics for each labeling function. It uses sparse matrix operations (scipy.sparse) for efficient computation and produces summary DataFrames with per-LF metrics.
Key methods:
- label_coverage(): Fraction of data points with at least one label
- label_overlap(): Fraction of data points labeled by more than one LF
- label_conflict(): Fraction of data points with conflicting labels
- lf_summary(): Complete DataFrame with all per-LF statistics
- lf_empirical_accuracies(Y): Per-LF accuracy against gold labels Y (requires gold labels)
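The three matrix-level metrics can be reproduced by hand in a few lines of NumPy. The following is a minimal sketch of the definitions listed above, not Snorkel's implementation (which works on sparse matrices). The helper name `global_metrics` is hypothetical, and -1 is assumed to be the abstain value.

```python
import numpy as np

ABSTAIN = -1  # assumed abstain value, per the {-1, 0, ..., k-1} convention


def global_metrics(L: np.ndarray) -> dict:
    """Hypothetical helper: coverage, overlap and conflict of a label matrix."""
    voted = L != ABSTAIN              # True where an LF emitted a label
    n_votes = voted.sum(axis=1)       # non-abstain votes per data point
    # Number of distinct non-abstain labels on each row
    n_distinct = np.array([np.unique(row[row != ABSTAIN]).size for row in L])
    return {
        "coverage": float((n_votes >= 1).mean()),    # at least one label
        "overlap": float((n_votes >= 2).mean()),     # two or more labels
        "conflict": float((n_distinct >= 2).mean()), # labels that disagree
    }


L_example = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
metrics = global_metrics(L_example)
print(metrics)
```

On this matrix the sketch gives coverage 0.8, overlap 0.6 and conflict 0.2: only the third row carries two different labels.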
Usage
Use this class after applying labeling functions and obtaining a label matrix, to inspect LF quality before training a label model.
Code Reference
Source Location
- Repository: snorkel
- File: snorkel/labeling/analysis.py
- Lines: L15-377
Signature
class LFAnalysis:
    def __init__(
        self,
        L: np.ndarray,
        lfs: Optional[List[LabelingFunction]] = None,
    ) -> None:
        """
        Args:
            L: Label matrix [n_examples, n_lfs] with values in {-1, 0, ..., k-1}.
            lfs: Labeling functions used to generate L (for naming in summary).
        """
    def label_coverage(self) -> float: ...
    def label_overlap(self) -> float: ...
    def label_conflict(self) -> float: ...
    def lf_polarities(self) -> List[List[int]]: ...
    def lf_coverages(self) -> np.ndarray: ...
    def lf_overlaps(self, normalize_by_coverage: bool = False) -> np.ndarray: ...
    def lf_conflicts(self, normalize_by_overlaps: bool = False) -> np.ndarray: ...
    def lf_empirical_accuracies(self, Y: np.ndarray) -> np.ndarray: ...
    def lf_empirical_probs(self, Y: np.ndarray, k: int) -> np.ndarray: ...
    def lf_summary(
        self,
        Y: Optional[np.ndarray] = None,
        est_weights: Optional[np.ndarray] = None,
    ) -> DataFrame: ...
Import
from snorkel.labeling import LFAnalysis
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| L | np.ndarray | Yes | Label matrix [n_examples, n_lfs] with values in {-1, 0, ..., k-1} |
| lfs | Optional[List[LabelingFunction]] | No | LFs for naming columns in summary |
| Y | Optional[np.ndarray] | No | Gold labels for empirical accuracy (passed to lf_summary/lf_empirical_accuracies) |
| est_weights | Optional[np.ndarray] | No | Learned LF weights from label model (passed to lf_summary) |
Outputs
| Name | Type | Description |
|---|---|---|
| label_coverage() | float | Fraction of data points with at least one label |
| label_overlap() | float | Fraction of data points with 2+ labels |
| label_conflict() | float | Fraction of data points with conflicting labels |
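| lf_summary() | pd.DataFrame | One row per LF with columns Polarity, Coverage, Overlaps, Conflicts; adds Correct, Incorrect, Emp. Acc. when Y is given, Learned Weight when est_weights is given, and an index column j when lfs is given |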
Usage Examples
Basic LF Analysis
import numpy as np
from snorkel.labeling import LFAnalysis
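# Label matrix of the kind returned by PandasLFApplier.apply() (written by hand here)
# Rows are data points, columns are LFs; -1 means the LF abstained
L_train = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])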
analysis = LFAnalysis(L=L_train)
# Global metrics
print(f"Coverage: {analysis.label_coverage():.2f}") # 0.80
print(f"Overlap: {analysis.label_overlap():.2f}") # 0.60
print(f"Conflict: {analysis.label_conflict():.2f}") # 0.20
# Per-LF metrics
print(analysis.lf_coverages()) # [0.4, 0.8, 0.4]
print(analysis.lf_polarities()) # [[0, 1], [0], [0]]
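The per-LF overlap and conflict rates can be derived from the same matrix. The sketch below is a plain NumPy illustration under the assumption that an LF "overlaps" on a data point when it votes and at least one other LF also votes, and "conflicts" when another LF votes a different label. The helper name `per_lf_overlaps_conflicts` is hypothetical and is not part of Snorkel.

```python
import numpy as np

ABSTAIN = -1  # assumed abstain value


def per_lf_overlaps_conflicts(L: np.ndarray):
    """Hypothetical helper: per-LF overlap and conflict fractions."""
    voted = L != ABSTAIN
    n_votes = voted.sum(axis=1, keepdims=True)   # shape (n_examples, 1)

    # LF j overlaps on row i if it voted and the row has 2+ votes in total
    overlaps = (voted & (n_votes >= 2)).mean(axis=0)

    n_lfs = L.shape[1]
    conflicts = np.zeros(n_lfs)
    for j in range(n_lfs):
        others = np.delete(L, j, axis=1)             # labels from the other LFs
        others_voted = np.delete(voted, j, axis=1)
        # Some other LF voted, and its label differs from LF j's label
        disagrees = (others_voted & (others != L[:, [j]])).any(axis=1)
        conflicts[j] = (voted[:, j] & disagrees).mean()
    return overlaps, conflicts


L_example = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
overlaps, conflicts = per_lf_overlaps_conflicts(L_example)
print(overlaps)   # per-LF overlap fractions
print(conflicts)  # per-LF conflict fractions
```

For this matrix the sketch yields overlaps of 0.4, 0.6, 0.4 and conflicts of 0.2, 0.2, 0.0: the first two LFs disagree on the third data point, while the third LF never contradicts another vote.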
Summary with Gold Labels
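# With named LFs and gold labels for dev set accuracy
# lf_keyword_check, lf_short_text and lf_sender_check are assumed to be
# LabelingFunction objects defined elsewhere, in the same order as the columns of L_train
lfs = [lf_keyword_check, lf_short_text, lf_sender_check]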
Y_dev = np.array([1, 0, 1, 0, 0])
analysis = LFAnalysis(L=L_train, lfs=lfs)
summary = analysis.lf_summary(Y=Y_dev)
print(summary)
# Columns: j, Polarity, Coverage, Overlaps, Conflicts, Correct, Incorrect, Emp. Acc.
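To see where the accuracy columns come from, the counts can be recomputed by hand. This is a minimal NumPy sketch, assuming that Correct and Incorrect count only non-abstain votes and that empirical accuracy is Correct divided by the number of non-abstain votes. The helper name `empirical_accuracy_sketch` is hypothetical, and Snorkel's own code may compute these values differently.

```python
import numpy as np

ABSTAIN = -1  # assumed abstain value


def empirical_accuracy_sketch(L: np.ndarray, Y: np.ndarray):
    """Hypothetical helper: per-LF correct / incorrect counts and accuracy."""
    voted = L != ABSTAIN
    matches = L == Y[:, None]                    # compare each column with gold labels
    correct = (voted & matches).sum(axis=0)
    incorrect = (voted & ~matches).sum(axis=0)
    n_voted = voted.sum(axis=0)
    # Leave accuracy as NaN for an LF that never votes
    acc = np.full(L.shape[1], np.nan)
    np.divide(correct, n_voted, out=acc, where=n_voted > 0)
    return correct, incorrect, acc


L_example = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
Y_example = np.array([1, 0, 1, 0, 0])
correct, incorrect, acc = empirical_accuracy_sketch(L_example, Y_example)
print(correct, incorrect, acc)
```

With the label matrix and gold labels from the example above, the first LF is right on both of its votes (accuracy 1.0), while the other two are right on half of theirs (accuracy 0.5).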
Related Pages
Implements Principle