
Heuristic: Snorkel Precision Init Prior

From Leeroopedia
Knowledge Sources
Domains: Weak_Supervision, Optimization
Last Updated 2026-02-14 21:00 GMT

Overview

The LabelModel's `prec_init` parameter (default 0.7) sets the prior assumption for labeling function precision, strongly influencing convergence behavior and final learned accuracy estimates.

Description

When training the LabelModel, the conditional probability parameters (μ) are initialized based on `prec_init`, which represents the prior belief about how accurate each labeling function is: P(LF outputs correct label | LF does not abstain). The default of 0.7 means "assume each LF is correct 70% of the time" before seeing any data. This initialization is then scaled by a random scalar from [0, 1) to break symmetry.
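The initialization idea described above can be sketched in a few lines of numpy. This is a hedged illustration of the mechanism, not Snorkel's exact internals: the `init_mu` helper, the `coverage` values, and the use of a seeded `Generator` are all illustrative assumptions.

```python
import numpy as np

def init_mu(prec_init, coverage, cardinality=2, seed=0):
    """Sketch of prec_init-based initialization (illustrative, not
    Snorkel's exact code): every LF's conditional-accuracy prior is
    prec_init, weighted by how often the LF votes, then the whole
    matrix is scaled by a single random scalar in [0, 1) to break
    symmetry between parameters."""
    rng = np.random.default_rng(seed)
    m = len(coverage)  # number of labeling functions
    # Prior P(LF correct | LF does not abstain) = prec_init for every LF,
    # weighted by each LF's non-abstain rate (coverage)
    mu_init = np.full((m, cardinality), prec_init) * np.asarray(coverage)[:, None]
    # One shared random scalar, mirroring the single np.random.random() call
    return mu_init * rng.random()

mu = init_mu(prec_init=0.7, coverage=[0.8, 0.5, 0.9])
print(mu.shape)  # (3, 2): one row per LF, one column per class
```

Because the scalar lies in [0, 1), every initial entry is strictly below `prec_init` times the LF's coverage.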

Usage

Adjust `prec_init` when the default assumption of 70% LF accuracy does not match your domain. If your LFs are known to be highly accurate (e.g., hand-crafted rules in a well-understood domain), increase it toward 0.9. If your LFs are noisy or exploratory, decrease it toward 0.5-0.6. For binary classification, 0.5 corresponds to random chance, so values at or below 0.5 assert that an LF is no better than random.
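Since the parameter's type hint also admits a list or array, the prior can be set per LF rather than globally. A hedged sketch of building such a vector from rough accuracy guesses; the LF names, the estimates, and the clipping bounds are all invented for illustration:

```python
import numpy as np

# Rough per-LF accuracy guesses (illustrative values only):
# a hand-crafted regex rule, a keyword heuristic, an exploratory pattern
estimated_acc = {"regex_rule": 0.9, "keyword_lf": 0.7, "exploratory_lf": 0.55}

# Clip into a sensible binary-classification range: going to 0.5 or
# below would assert the LF is no better than random chance.
prec_init = np.clip(list(estimated_acc.values()), 0.55, 0.95)
```

The resulting array would then be passed through to training, e.g. as `label_model.fit(L_train, prec_init=prec_init)` in Snorkel's API.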

The Insight (Rule of Thumb)

  • Action: Set `prec_init` in `LabelModel.fit()` kwargs based on expected LF quality.
  • Value: Default is 0.7. Range: (0.5, 1.0) for binary classification.
  • Trade-off: Too high: model may take longer to correct inaccurate LFs. Too low: model starts with pessimistic priors that may slow convergence. The value interacts with `n_epochs` and `lr` -- higher prec_init may need fewer epochs to converge.
  • Interaction: The initialization is scaled by `np.random.random()` (a single scalar), so the actual initial values vary between runs unless a seed is set.
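The seed interaction in the last bullet can be demonstrated outside Snorkel: because the whole initialization is multiplied by one draw of `np.random.random()`, two runs start from the same point only when the seed is fixed. The `mu_init` template below is an illustrative stand-in for the real prec_init-based matrix.

```python
import numpy as np

mu_init = np.full((3, 2), 0.7)  # illustrative prec_init-based template

np.random.seed(123)
run_a = mu_init * np.random.random()  # first draw after seeding

np.random.seed(123)
run_b = mu_init * np.random.random()  # same seed -> identical start
run_c = mu_init * np.random.random()  # next draw -> different start

print(np.allclose(run_a, run_b))  # True
print(np.allclose(run_a, run_c))  # False
```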

Reasoning

The LabelModel uses gradient descent to learn the μ parameters starting from the prec_init-based initialization. The optimization landscape has multiple local optima, so the starting point matters significantly. The 0.7 default represents a moderate "better than random" assumption that works well across many weak supervision settings.

Code evidence from `label_model.py:61`:

    prec_init: Union[float, List[float], np.ndarray, torch.Tensor] = 0.7

Random scaling from `label_model.py:308-310`:

        # Initialize randomly based on self.mu_init
        self.mu = nn.Parameter(
            self.mu_init.clone() * np.random.random()
        ).float()
