Implementation:Snorkel team Snorkel PandasLFApplier Apply

From Leeroopedia
Knowledge Sources
Domains Weak_Supervision, Data_Programming
Last Updated 2026-02-14 20:00 GMT

Overview

Concrete tool for applying labeling functions to a Pandas DataFrame to produce a label matrix, provided by the Snorkel library.

Description

The PandasLFApplier class applies a list of labeling functions (LFs) to every row of a Pandas DataFrame and returns a dense label matrix as a NumPy array. It extends BaseLFApplier, which handles sparse-to-dense matrix conversion.

The applier runs in a single process: it calls pandas.DataFrame.apply internally and can optionally display a tqdm progress bar. For large datasets, consider DaskLFApplier (multi-process) or SparkLFApplier (distributed).
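
The row-wise mechanism can be approximated in a few lines of plain pandas. This is a conceptual sketch, not Snorkel's implementation: apply_row_wise, contains_buy, and short_text are hypothetical names, and the real applier additionally handles progress bars, fault tolerance, and metadata.

```python
import numpy as np
import pandas as pd

ABSTAIN = -1
SPAM = 1


def contains_buy(row):
    # Hypothetical heuristic: vote SPAM when the text mentions "buy".
    return SPAM if "buy" in row.text.lower() else ABSTAIN


def short_text(row):
    # Hypothetical heuristic: vote SPAM for messages under five words.
    return SPAM if len(row.text.split()) < 5 else ABSTAIN


def apply_row_wise(df, fns):
    """Conceptual stand-in for an applier: one list of labels per row."""
    rows = df.apply(lambda row: [fn(row) for fn in fns], axis=1)
    # Stack the per-row lists into an (n_examples, n_fns) integer matrix.
    return np.array(rows.tolist(), dtype=int).reshape(len(df), len(fns))


df = pd.DataFrame(
    {"text": ["Buy now!", "A much longer message with nothing to sell"]}
)
L = apply_row_wise(df, [contains_buy, short_text])
print(L.shape)  # (2, 2)
```

Each column of the result corresponds to one function and each row to one data point, which matches the [n_examples, n_lfs] layout of the label matrix.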

Usage

Import this class when you have defined labeling functions and need to apply them to a Pandas DataFrame to generate the label matrix for subsequent analysis or label model training.

Code Reference

Source Location

  • Repository: snorkel
  • File: snorkel/labeling/apply/pandas.py
  • Lines: L51-113

Signature

class PandasLFApplier(BaseLFApplier):
    """LF applier for a Pandas DataFrame."""

    def apply(
        self,
        df: pd.DataFrame,
        progress_bar: bool = True,
        fault_tolerant: bool = False,
        return_meta: bool = False,
    ) -> Union[np.ndarray, Tuple[np.ndarray, ApplierMetadata]]:
        """
        Label Pandas DataFrame of data points with LFs.

        Args:
            df: Pandas DataFrame containing data points to be labeled by LFs.
            progress_bar: Display a progress bar.
            fault_tolerant: Output -1 if LF execution fails.
            return_meta: Return metadata from apply call.
        Returns:
            Matrix of labels emitted by LFs (shape [n_examples, n_lfs]).
            Optionally, ApplierMetadata with fault counts.
        """

Import

from snorkel.labeling import PandasLFApplier

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| lfs | List[LabelingFunction] | Yes | List of labeling functions (passed to constructor) |
| df | pd.DataFrame | Yes | DataFrame where each row is a data point |
| progress_bar | bool | No | Display tqdm progress bar (default True) |
| fault_tolerant | bool | No | Return -1 on LF errors instead of raising (default False) |
| return_meta | bool | No | Return ApplierMetadata with fault counts (default False) |

Outputs

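| Name | Type | Description |
|---|---|---|
| L | np.ndarray | Label matrix of shape [n_examples, n_lfs] with values in {-1, 0, ..., k-1}, where -1 means the LF abstained and k is the number of classes |
| metadata | ApplierMetadata | Returned only when return_meta=True; its faults field maps LF names to error counts |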
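
Because the label matrix is a plain NumPy array, quick sanity checks need no extra tooling. The sketch below uses a hand-written matrix (illustrative values, not real applier output) to compute the fraction of rows each LF labels and to find rows where every LF abstained. The variable names are assumptions made for this example.

```python
import numpy as np

ABSTAIN = -1

# Illustrative label matrix: 4 data points (rows) x 2 LFs (columns).
L = np.array(
    [
        [1, 1],
        [-1, 1],
        [1, -1],
        [-1, -1],
    ]
)

# Coverage: fraction of data points on which each LF did not abstain.
coverage = (L != ABSTAIN).mean(axis=0)

# Rows that received no label from any LF.
unlabeled_rows = (L == ABSTAIN).all(axis=1)

print(coverage)
print(unlabeled_rows)
```

Rows flagged in unlabeled_rows carry no supervision signal, so a high count suggests the LF set needs broader coverage.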

Usage Examples

Basic Application

import pandas as pd
from snorkel.labeling import PandasLFApplier, labeling_function

ABSTAIN = -1
SPAM = 1

@labeling_function()
def lf_contains_buy(x):
    return SPAM if "buy" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_text(x):
    return SPAM if len(x.text.split()) < 5 else ABSTAIN

# Create applier and apply to data
applier = PandasLFApplier(lfs=[lf_contains_buy, lf_short_text])

df = pd.DataFrame({"text": ["Buy now!", "Hello world", "Limited offer buy today"]})
L_train = applier.apply(df=df)

print(L_train.shape)  # (3, 2)
print(L_train)
# Every text has fewer than 5 words, so lf_short_text votes SPAM on all rows.
# [[ 1  1]
#  [-1  1]
#  [ 1  1]]

Fault-Tolerant Application

# Apply with fault tolerance (LF errors return -1 instead of raising)
L_train, meta = applier.apply(
    df=df,
    fault_tolerant=True,
    return_meta=True,
)

# Check whether any LFs raised errors
print(meta.faults)  # maps LF names to error counts; empty here because neither LF raised
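
Neither LF in the example above fails, so it does not show what fault tolerance does when an error actually occurs. The following pure-Python sketch mimics the documented behavior: an exception is swallowed, counted per function, and replaced by -1. FaultTolerantCaller and lf_divides are hypothetical names for illustration only; this is not Snorkel's internal code.

```python
from collections import defaultdict

ABSTAIN = -1


class FaultTolerantCaller:
    """Hypothetical helper: call a function, count failures, abstain on error."""

    def __init__(self, fault_tolerant):
        self.fault_tolerant = fault_tolerant
        self.fault_counts = defaultdict(int)

    def __call__(self, fn, x):
        if not self.fault_tolerant:
            # Strict mode: let any exception propagate to the caller.
            return fn(x)
        try:
            return fn(x)
        except Exception:
            # Tolerant mode: record the failure and abstain.
            self.fault_counts[fn.__name__] += 1
            return ABSTAIN


def lf_divides(x):
    # Raises ZeroDivisionError when x == 0.
    return 1 if 10 / x > 1 else ABSTAIN


caller = FaultTolerantCaller(fault_tolerant=True)
labels = [caller(lf_divides, x) for x in [2, 0, 20, 0]]
print(labels)
print(dict(caller.fault_counts))
```

A nonzero count is worth investigating before training: a fault is recorded as an abstain in the label matrix, so a consistently failing LF silently contributes nothing.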
