Implementation:Snorkel team Snorkel PandasLFApplier Apply
| Knowledge Sources | |
|---|---|
| Domains | Weak_Supervision, Data_Programming |
| Last Updated | 2026-02-14 20:00 GMT |
Overview
Concrete tool for applying labeling functions to a Pandas DataFrame to produce a label matrix, provided by the Snorkel library.
Description
The PandasLFApplier class applies a list of labeling functions (LFs) to every row of a Pandas DataFrame and returns a dense label matrix as a NumPy array. It extends BaseLFApplier, which handles sparse-to-dense matrix conversion.
Internally, the applier calls pandas.DataFrame.apply in a single process, with an optional tqdm progress bar. For large datasets, consider DaskLFApplier (multi-process) or SparkLFApplier (distributed).
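The row-wise mechanism described above can be sketched without Snorkel. This is a minimal illustration under stated assumptions, not Snorkel's actual implementation: the helpers `apply_lfs_to_row` and `label_matrix` are hypothetical names, plain Python functions stand in for LabelingFunction objects, and the sparse intermediate form is assumed to be (LF index, label) pairs per row.

```python
import numpy as np
import pandas as pd

ABSTAIN = -1
SPAM = 1


def apply_lfs_to_row(row, lfs):
    """Return (lf_index, label) pairs for every LF that does not abstain."""
    pairs = []
    for j, lf in enumerate(lfs):
        label = lf(row)
        if label != ABSTAIN:
            pairs.append((j, label))
    return pairs


def label_matrix(df, lfs):
    """Build a dense [n_rows, n_lfs] matrix from the sparse per-row results."""
    # axis=1 calls the function once per row, all in the current process.
    per_row = df.apply(apply_lfs_to_row, axis=1, args=(lfs,))
    L = np.full((len(df), len(lfs)), ABSTAIN, dtype=int)
    for i, pairs in enumerate(per_row):
        for j, label in pairs:
            L[i, j] = label
    return L


def contains_buy(row):
    return SPAM if "buy" in row["text"].lower() else ABSTAIN


def short_text(row):
    return SPAM if len(row["text"].split()) < 5 else ABSTAIN


demo_df = pd.DataFrame({"text": ["Buy now!", "Hello world today friends everyone"]})
demo_L = label_matrix(demo_df, [contains_buy, short_text])
print(demo_L.tolist())  # [[1, 1], [-1, -1]]
```

Because every row is processed sequentially in one process, runtime grows linearly with both the number of rows and the number of LFs, which is why the multi-process and distributed appliers exist.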
Usage
Import this class when you have defined labeling functions and need to apply them to a Pandas DataFrame to generate the label matrix for subsequent analysis or label model training.
Code Reference
Source Location
- Repository: snorkel
- File: snorkel/labeling/apply/pandas.py
- Lines: L51-113
Signature
class PandasLFApplier(BaseLFApplier):
    """LF applier for a Pandas DataFrame."""

    def apply(
        self,
        df: pd.DataFrame,
        progress_bar: bool = True,
        fault_tolerant: bool = False,
        return_meta: bool = False,
    ) -> Union[np.ndarray, Tuple[np.ndarray, ApplierMetadata]]:
        """
        Label Pandas DataFrame of data points with LFs.

        Args:
            df: Pandas DataFrame containing data points to be labeled by LFs.
            progress_bar: Display a progress bar.
            fault_tolerant: Output -1 if LF execution fails.
            return_meta: Return metadata from apply call.

        Returns:
            Matrix of labels emitted by LFs (shape [n_examples, n_lfs]).
            Optionally, ApplierMetadata with fault counts.
        """
Import
from snorkel.labeling import PandasLFApplier
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| lfs | List[LabelingFunction] | Yes | List of labeling functions (passed to constructor) |
| df | pd.DataFrame | Yes | DataFrame where each row is a data point |
| progress_bar | bool | No | Display tqdm progress bar (default True) |
| fault_tolerant | bool | No | Return -1 on LF errors instead of raising (default False) |
| return_meta | bool | No | Return ApplierMetadata with fault counts (default False) |
Outputs
| Name | Type | Description |
|---|---|---|
| L | np.ndarray | Label matrix of shape [n_examples, n_lfs] with values in {-1, 0, ..., k-1}, where -1 means abstain and k is the number of classes |
| metadata | ApplierMetadata | Optional; returned only when return_meta=True. Its faults field is a dict mapping LF names to error counts |
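The fault_tolerant and return_meta semantics in the tables above can be illustrated with a small standalone sketch. It is an assumption-laden illustration rather than Snorkel's code: `call_lf` and `lf_needs_text` are hypothetical names, and plain dicts stand in for DataFrame rows.

```python
from collections import defaultdict

ABSTAIN = -1


def call_lf(lf, x, fault_tolerant, fault_counts):
    """Call one LF; optionally turn an exception into an abstain vote."""
    try:
        return lf(x)
    except Exception:
        if not fault_tolerant:
            raise  # default behavior: the error propagates to the caller
        fault_counts[lf.__name__] += 1
        return ABSTAIN


def lf_needs_text(x):
    # Raises KeyError when the data point has no "text" field.
    return 1 if "buy" in x["text"].lower() else ABSTAIN


fault_counts = defaultdict(int)
points = [{"text": "Buy now!"}, {"title": "no text field"}]
labels = [call_lf(lf_needs_text, p, True, fault_counts) for p in points]
print(labels)              # [1, -1]
print(dict(fault_counts))  # {'lf_needs_text': 1}
```

Note that a fault and a genuine abstain both appear as -1 in the matrix, so the per-LF counts are the only way to tell them apart after the fact.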
Usage Examples
Basic Application
import pandas as pd
from snorkel.labeling import PandasLFApplier, labeling_function

ABSTAIN = -1
SPAM = 1

@labeling_function()
def lf_contains_buy(x):
    # Vote SPAM when the text mentions "buy" (case-insensitive)
    return SPAM if "buy" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_text(x):
    # Vote SPAM when the text has fewer than 5 whitespace-separated words
    return SPAM if len(x.text.split()) < 5 else ABSTAIN

# Create applier and apply to data
applier = PandasLFApplier(lfs=[lf_contains_buy, lf_short_text])
df = pd.DataFrame({"text": ["Buy now!", "Hello world", "Limited offer buy today"]})
L_train = applier.apply(df=df)
print(L_train.shape) # (3, 2)
print(L_train)
# All three texts have fewer than 5 words, so lf_short_text votes SPAM on every row:
# [[ 1  1]
#  [-1  1]
#  [ 1  1]]
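A common first check on a label matrix is per-LF coverage, the fraction of data points on which each LF does not abstain. The sketch below uses a small hand-written matrix (hypothetical values, not output from the example above) and plain NumPy.

```python
import numpy as np

ABSTAIN = -1

# Hypothetical label matrix: 4 data points (rows) labeled by 3 LFs (columns).
L = np.array([
    [1, -1, 0],
    [-1, -1, 0],
    [1, -1, -1],
    [1, -1, 1],
])

# Fraction of rows on which each LF emitted a label other than ABSTAIN.
coverage = (L != ABSTAIN).mean(axis=0)
print(coverage.tolist())  # [0.75, 0.0, 0.75]
```

An LF with zero coverage, like the middle column here, contributes nothing to downstream training and is usually worth revisiting. Snorkel's LFAnalysis class provides fuller summaries of this kind.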
Fault-Tolerant Application
# Apply with fault tolerance (LF errors return -1 instead of raising)
L_train, meta = applier.apply(
    df=df,
    fault_tolerant=True,
    return_meta=True,
)
# Check whether any LFs had errors.
# meta.faults maps LF names to fault counts; no LF raises on this data, so no faults are recorded.
print(meta.faults)