# Environment: BigScience Workshop Petals Python Transformers
## Knowledge Sources

| Field | Value |
|---|---|
| Domains | NLP, Infrastructure |
| Last Updated | 2026-02-09 13:00 GMT |
## Overview
HuggingFace transformers and datasets environment for tokenization, data loading, and optimizer configuration in Petals workflows.
## Description
This environment provides the HuggingFace ecosystem components used alongside Petals for tasks that do not directly require hivemind P2P networking. It includes the `transformers` library for tokenizer loading and model configuration, the `datasets` library for loading and preprocessing datasets, and PyTorch's optimizer and scheduler utilities. These components are used in the outer training/evaluation loop that wraps the distributed Petals core.
## Usage
Required for tokenization workflows (AutoTokenizer), dataset loading pipelines (HuggingFace datasets), and optimizer/scheduler configuration (AdamW + linear warmup). Used alongside the Python_Hivemind environment for complete prompt tuning and inference workflows.
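To illustrate the optimizer/scheduler configuration mentioned above, here is a minimal sketch of AdamW with linear warmup using `transformers.get_linear_schedule_with_warmup`. The model, learning rate, and step counts are placeholders, not values taken from Petals; in a real prompt-tuning loop the optimizer would wrap the distributed model's trainable parameters.

```python
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Placeholder module standing in for a Petals model's trainable parameters
# (e.g. prompt-tuning embeddings).
model = torch.nn.Linear(16, 16)

optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.0)

# Linear warmup over the first 10 steps, then linear decay to zero.
num_training_steps = 100
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=num_training_steps,
)

for step in range(num_training_steps):
    # ... forward/backward pass on a batch would go here ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```

After the final step the scheduler has decayed the learning rate back to zero, which is the standard shape for linear-warmup fine-tuning schedules.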
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows (WSL2) | Cross-platform support |
| Python | >= 3.8 | Matches Petals requirement |
| Network | Internet access | For downloading tokenizers, datasets, and model configs from HuggingFace Hub |
## Dependencies

### Python Packages
- `transformers` >= 4.43.1, < 4.44.0
- `tokenizers` >= 0.13.3
- `datasets` (user-installed for training workflows)
- `torch` >= 1.12
- `huggingface-hub` >= 0.11.1, < 1.0.0
- `sentencepiece` >= 0.1.99
- `safetensors` >= 0.3.1
### Credentials
- `HF_TOKEN` (optional): Required for accessing gated models or private datasets on HuggingFace Hub.
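A sketch of how the token is typically supplied. Recent `huggingface_hub` versions pick up `HF_TOKEN` from the environment automatically, but it can also be passed explicitly via the `token` argument of `from_pretrained` (the `from_pretrained` call is shown commented out because it requires network access and a concrete model name):

```python
import os

from transformers import AutoTokenizer

# None means anonymous access, which is fine for public models.
hf_token = os.environ.get("HF_TOKEN")

# `model_name` is a placeholder for any Hub repo, e.g. a gated model:
# tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)
```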
## Quick Install

```shell
pip install petals datasets
```
## Code Evidence

Transformers version pinning from `src/petals/__init__.py:23-26`:

```python
if not os.getenv("PETALS_IGNORE_DEPENDENCY_VERSION"):
    assert (
        version.parse("4.43.1") <= version.parse(transformers.__version__) < version.parse("4.44.0")
    ), "Please install a proper transformers version: pip install transformers>=4.43.1,<4.44.0"
```
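The same comparison can be reproduced standalone with `packaging.version` to check whether an installed version string falls in the accepted window (the helper function below is illustrative, not part of Petals):

```python
from packaging import version

def transformers_version_ok(installed: str) -> bool:
    # Mirrors the Petals assertion: accept 4.43.1 <= version < 4.44.0.
    v = version.parse(installed)
    return version.parse("4.43.1") <= v < version.parse("4.44.0")

print(transformers_version_ok("4.43.1"))  # in range
print(transformers_version_ok("4.44.0"))  # above the ceiling
```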
Tokenizer usage pattern from examples (`AutoTokenizer`):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "right"  # For training; use "left" for generation
```
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `AssertionError: Please install a proper transformers version` | Wrong transformers version | `pip install transformers==4.43.1` |
| `ImportError: No module named 'datasets'` | datasets library not installed | `pip install datasets` |
| `OSError: Can't load tokenizer for 'model_name'` | Model requires authentication | Set `HF_TOKEN` environment variable |
## Compatibility Notes
- transformers version: Petals pins transformers to the 4.43.x series (>= 4.43.1, < 4.44.0). Other versions trigger an assertion error at import unless the `PETALS_IGNORE_DEPENDENCY_VERSION` environment variable is set.
- datasets library: Not a direct Petals dependency but required by the prompt tuning examples. Install separately.
- Padding side: Set `tokenizer.padding_side = "right"` for training and `"left"` for generation.
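The padding-side behavior can be demonstrated offline with a tiny in-memory vocabulary, avoiding any Hub download. The vocabulary and token IDs below are made up for the demo; only the `padding_side` mechanics match real usage:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Tiny hand-built vocabulary so the demo needs no network access.
vocab = {"[PAD]": 0, "hello": 1, "petals": 2, "world": 3}
tok = Tokenizer(WordLevel(vocab, unk_token="[PAD]"))
tok.pre_tokenizer = Whitespace()

fast = PreTrainedTokenizerFast(tokenizer_object=tok, pad_token="[PAD]")

fast.padding_side = "right"  # training: pad tokens trail the sequence
right = fast(["hello petals world", "hello"], padding=True)["input_ids"]

fast.padding_side = "left"   # generation: pad tokens lead the sequence
left = fast(["hello petals world", "hello"], padding=True)["input_ids"]
```

With right padding the short sequence becomes `[1, 0, 0]`; with left padding it becomes `[0, 0, 1]`. Left padding matters for generation because autoregressive models read the final position of each row as the most recent token.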