Environment:Snorkel team Snorkel PyTorch
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-14 21:00 GMT |
Overview
Python 3.11+ environment with PyTorch >= 1.2.0, NumPy, SciPy, and core scientific computing stack required for all Snorkel label modeling and classification workflows.
Description
This environment provides the core runtime for Snorkel's PyTorch-dependent modules: the LabelModel (a generative model trained via gradient descent), the MultitaskClassifier (a multi-task neural network), and all classification/training infrastructure. PyTorch is used for tensor operations, autograd, nn.Module subclassing, DataParallel, and optimizer/scheduler registries.
The environment includes the full scientific Python stack: NumPy for matrix operations, SciPy for sparse matrix handling, pandas for data manipulation, scikit-learn for metrics, and TensorBoard for visualization.
Usage
Use this environment for any Snorkel workflow that involves the LabelModel (weak supervision pipeline) or the MultitaskClassifier/SliceAwareClassifier (slice-aware training, multitask classification). The core labeling and augmentation modules (LabelingFunction, TransformationFunction, appliers) can be used without this environment when only CPU-based operations are needed, but the label model and classification pipelines require PyTorch.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | Linux recommended for CUDA support |
| Python | >= 3.11 | Enforced in setup.py: `python_requires=">=3.11"` |
| Hardware | CPU (default) or NVIDIA GPU | LabelModel defaults to CPU; MultitaskClassifier defaults to GPU 0 |
| Disk | 2GB+ | For package installation and model checkpoints |
Dependencies
System Packages
- `python` >= 3.11
Python Packages (Essential)
- `torch` >= 1.2.0
- `numpy` >= 1.24.0
- `scipy` >= 1.2.0
- `pandas` >= 1.0.0
- `scikit-learn` >= 0.20.2
- `tqdm` >= 4.33.0
- `munkres` >= 1.0.6
- `networkx` >= 2.2
- `tensorboard` >= 2.13.0
- `protobuf` >= 3.19.6
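The minimum versions listed above can be checked against an existing environment before running a pipeline. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of Snorkel, and the version parsing is deliberately simplified (it compares only leading numeric components and treats pre-releases like their final release).

```python
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions copied from the dependency list above.
MINIMUMS: Dict[str, str] = {
    "torch": "1.2.0",
    "numpy": "1.24.0",
    "scipy": "1.2.0",
    "pandas": "1.0.0",
    "scikit-learn": "0.20.2",
    "tqdm": "4.33.0",
    "munkres": "1.0.6",
    "networkx": "2.2",
    "tensorboard": "2.13.0",
    "protobuf": "3.19.6",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading integer components, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Return {package: problem} for each package that is missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems
```

Calling `check_environment()` returns an empty dictionary when every listed package is present at or above its minimum. For strict PEP 440 comparison, a dedicated version-parsing library would be a better fit than this simplified parser.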
Credentials
No environment variables or credentials are required. All configuration is done through Python config objects (NamedTuples/Config classes). The codebase contains zero `os.environ` calls.
Quick Install
# Install Snorkel with all essential dependencies
pip install snorkel
# Or install dependencies manually (quote each specifier so the shell does not treat ">" as a redirect)
pip install "torch>=1.2.0" "numpy>=1.24.0" "scipy>=1.2.0" "pandas>=1.0.0" "scikit-learn>=0.20.2" "tqdm>=4.33.0" "munkres>=1.0.6" "networkx>=2.2" "tensorboard>=2.13.0" "protobuf>=3.19.6"
Code Evidence
CUDA validation from `label_model.py:141-143`:
# Confirm that cuda is available if config is using CUDA
if self.config.device != "cpu" and not torch.cuda.is_available():
    raise ValueError("device=cuda but CUDA not available.")
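The strict behaviour of this check can be illustrated in isolation. The sketch below is not Snorkel code: `validate_device` is a hypothetical helper, and the `cuda_available` argument stands in for the result of `torch.cuda.is_available()` so the logic can be exercised on a machine without PyTorch or a GPU.

```python
def validate_device(device: str, cuda_available: bool) -> str:
    """Hypothetical helper mirroring the LabelModel check shown above.

    Any device string other than "cpu" is treated as a CUDA request,
    so it fails when CUDA is unavailable.
    """
    if device != "cpu" and not cuda_available:
        raise ValueError("device=cuda but CUDA not available.")
    return device


# "cpu" always passes, regardless of CUDA availability.
validate_device("cpu", cuda_available=False)

# A CUDA device string on a machine without CUDA raises ValueError.
try:
    validate_device("cuda:0", cuda_available=False)
except ValueError as exc:
    print(f"Rejected: {exc}")
```

Because the comparison is against the literal `"cpu"`, the same error message appears for any other string, including `"cuda:1"`, even though the message itself only mentions `device=cuda`.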
MultitaskClassifier soft fallback from `multitask_classifier.py:481-489`:
def _move_to_device(self) -> None:
    device = self.config.device
    if device >= 0:
        if torch.cuda.is_available():
            logging.info(f"Moving model to GPU (cuda:{device}).")
            self.to(torch.device(f"cuda:{device}"))
        else:
            logging.info("No cuda device available. Switch to cpu instead.")
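The soft fallback can be sketched as a pure function that returns the device the model would end up on. This is an illustration under assumptions, not the library's implementation: `resolve_device` is a hypothetical name, `cuda_available` stands in for `torch.cuda.is_available()`, and the function returns a device string instead of moving a model.

```python
import logging


def resolve_device(device: int, cuda_available: bool) -> str:
    """Hypothetical helper: map an integer device to the device actually used.

    Follows the branching of the excerpt above: a non-negative index selects
    a GPU when CUDA is available, and otherwise the model stays on the CPU.
    """
    if device >= 0:
        if cuda_available:
            logging.info(f"Moving model to GPU (cuda:{device}).")
            return f"cuda:{device}"
        # No exception is raised here; only an INFO-level message is logged.
        logging.info("No cuda device available. Switch to cpu instead.")
    return "cpu"
```

With `logging`'s default configuration, INFO messages are not displayed, so the fallback can go unnoticed unless the log level is lowered, for example with `logging.basicConfig(level=logging.INFO)`.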
Python version requirement from `setup.py:49`:
python_requires=">=3.11",
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ValueError: device=cuda but CUDA not available.` | LabelModel configured with a device other than `"cpu"` while `torch.cuda.is_available()` returns `False` | Use the default `device="cpu"`, or run on a machine with an NVIDIA GPU, a working driver, and a CUDA-enabled PyTorch build |
| `No cuda device available. Switch to cpu instead.` | MultitaskClassifier configured with a GPU index but CUDA unavailable | Not an error: this is a soft fallback logged at INFO level, and no exception is raised; set `device=-1` to select the CPU explicitly |
| `ModuleNotFoundError: No module named 'torch'` | PyTorch not installed | `pip install "torch>=1.2.0"` (quoted so the shell does not interpret `>` as a redirect) |
Compatibility Notes
- Device specification is inconsistent: LabelModel uses a string-based device (`"cpu"`, `"cuda:0"`), while MultitaskClassifier uses an integer-based device (`-1`=CPU, `0`=GPU 0). LabelModel hard-fails with a `ValueError` when CUDA is missing; MultitaskClassifier falls back to CPU and only logs an INFO-level message.
- DataParallel ON by default: The MultitaskClassifier wraps all module pool entries in `nn.DataParallel` by default (`dataparallel: bool = True`). Set `dataparallel=False` on single-GPU systems to avoid overhead.
- No upper version bounds: All dependencies use `>=` constraints with no upper bounds, which could lead to breakage with future major versions of PyTorch or NumPy.
- Build system: Requires `setuptools >= 40.6.2` and `wheel >= 0.30.0` (from pyproject.toml).
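One way to work around the inconsistent device conventions noted above is to derive both forms from a single setting. The following is a minimal sketch under that assumption; `DeviceSettings` and `device_settings` are hypothetical names introduced here for illustration and are not part of Snorkel.

```python
from typing import NamedTuple, Optional


class DeviceSettings(NamedTuple):
    label_model_device: str  # string form, e.g. "cpu" or "cuda:0"
    classifier_device: int   # integer form, -1 for CPU or a GPU index


def device_settings(gpu_index: Optional[int]) -> DeviceSettings:
    """Build both device forms from one choice.

    Pass None for CPU, or a non-negative GPU index for CUDA.
    """
    if gpu_index is None:
        return DeviceSettings("cpu", -1)
    if gpu_index < 0:
        raise ValueError("gpu_index must be None (CPU) or a non-negative index")
    return DeviceSettings(f"cuda:{gpu_index}", gpu_index)


settings = device_settings(None)
print(settings.label_model_device, settings.classifier_device)
```

The two fields can then be passed to the respective `device` settings of LabelModel and MultitaskClassifier, so a switch between CPU and GPU is made in one place.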
Related Pages
- Implementation:Snorkel_team_Snorkel_LabelModel_Fit
- Implementation:Snorkel_team_Snorkel_LabelModel_Predict
- Implementation:Snorkel_team_Snorkel_SliceAwareClassifier_Init
- Implementation:Snorkel_team_Snorkel_Trainer_Fit
- Implementation:Snorkel_team_Snorkel_DictDataset_Init
- Implementation:Snorkel_team_Snorkel_MultitaskClassifier_Init
- Implementation:Snorkel_team_Snorkel_Trainer_Fit_Multitask
- Implementation:Snorkel_team_Snorkel_MultitaskClassifier_Score