# Environment: BigScience Workshop Petals Python Transformers
## Knowledge Sources

| Field | Value |
|---|---|
| Domains | NLP, Infrastructure |
| Last Updated | 2026-02-09 13:00 GMT |
## Overview
HuggingFace transformers and datasets environment for tokenization, data loading, and optimizer configuration in Petals workflows.
## Description
This environment provides the HuggingFace ecosystem components used alongside Petals for tasks that do not directly require hivemind P2P networking. It includes the `transformers` library for tokenizer loading and model configuration, the `datasets` library for loading and preprocessing datasets, and PyTorch's optimizer and scheduler utilities. These components are used in the outer training/evaluation loop that wraps the distributed Petals core.
## Usage
Required for tokenization workflows (AutoTokenizer), dataset loading pipelines (HuggingFace datasets), and optimizer/scheduler configuration (AdamW + linear warmup). Used alongside the Python_Hivemind environment for complete prompt tuning and inference workflows.
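To illustrate the optimizer/scheduler configuration mentioned above, here is a minimal sketch of AdamW with linear warmup using `transformers.get_linear_schedule_with_warmup`. The model, learning rate, and step counts are placeholders, not values taken from Petals; in a real prompt-tuning loop the optimizer would wrap the distributed model's trainable parameters.

```python
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Placeholder module standing in for a Petals model's trainable parameters
# (e.g. prompt-tuning embeddings).
model = torch.nn.Linear(16, 16)

optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.0)

# Linear warmup over the first 10 steps, then linear decay to zero.
num_training_steps = 100
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=num_training_steps,
)

for step in range(num_training_steps):
    # ... forward/backward pass on a batch would go here ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```

After the final step the scheduler has decayed the learning rate back to zero, which is the standard shape for linear-warmup fine-tuning schedules.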
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows (WSL2) | Cross-platform support |
| Python | >= 3.8 | Matches Petals requirement |
| Network | Internet access | For downloading tokenizers, datasets, and model configs from HuggingFace Hub |
## Dependencies

### Python Packages
- `transformers` >= 4.43.1, < 4.44.0
- `tokenizers` >= 0.13.3
- `datasets` (user-installed for training workflows)
- `torch` >= 1.12
- `huggingface-hub` >= 0.11.1, < 1.0.0
- `sentencepiece` >= 0.1.99
- `safetensors` >= 0.3.1
### Credentials
- `HF_TOKEN` (optional): Required for accessing gated models or private datasets on HuggingFace Hub.
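A sketch of how the token is typically supplied. Recent `huggingface_hub` versions pick up `HF_TOKEN` from the environment automatically, but it can also be passed explicitly via the `token` argument of `from_pretrained` (the `from_pretrained` call is shown commented out because it requires network access and a concrete model name):

```python
import os

from transformers import AutoTokenizer

# None means anonymous access, which is fine for public models.
hf_token = os.environ.get("HF_TOKEN")

# `model_name` is a placeholder for any Hub repo, e.g. a gated model:
# tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)
```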
## Quick Install

```shell
pip install petals datasets
```
## Code Evidence

Transformers version pinning from `src/petals/__init__.py:23-26`:

```python
if not os.getenv("PETALS_IGNORE_DEPENDENCY_VERSION"):
    assert (
        version.parse("4.43.1") <= version.parse(transformers.__version__) < version.parse("4.44.0")
    ), "Please install a proper transformers version: pip install transformers>=4.43.1,<4.44.0"
```
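The same comparison can be reproduced standalone with `packaging.version` to check whether an installed version string falls in the accepted window (the helper function below is illustrative, not part of Petals):

```python
from packaging import version

def transformers_version_ok(installed: str) -> bool:
    # Mirrors the Petals assertion: accept 4.43.1 <= version < 4.44.0.
    v = version.parse(installed)
    return version.parse("4.43.1") <= v < version.parse("4.44.0")

print(transformers_version_ok("4.43.1"))  # in range
print(transformers_version_ok("4.44.0"))  # above the ceiling
```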
Tokenizer usage pattern from examples (`AutoTokenizer`):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "right"  # For training; use "left" for generation
```
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `AssertionError: Please install a proper transformers version` | Wrong transformers version | `pip install transformers==4.43.1` |
| `ImportError: No module named 'datasets'` | datasets library not installed | `pip install datasets` |
| `OSError: Can't load tokenizer for 'model_name'` | Model requires authentication | Set `HF_TOKEN` environment variable |
## Compatibility Notes
- transformers version: Petals pins transformers to the 4.43.x series (>= 4.43.1, < 4.44.0). Other versions trigger an assertion error at import unless the `PETALS_IGNORE_DEPENDENCY_VERSION` environment variable is set.
- datasets library: Not a direct Petals dependency but required by the prompt tuning examples. Install separately.
- Padding side: Set `tokenizer.padding_side = "right"` for training and `"left"` for generation.
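The padding-side behavior can be demonstrated offline with a tiny in-memory vocabulary, avoiding any Hub download. The vocabulary and token IDs below are made up for the demo; only the `padding_side` mechanics match real usage:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Tiny hand-built vocabulary so the demo needs no network access.
vocab = {"[PAD]": 0, "hello": 1, "petals": 2, "world": 3}
tok = Tokenizer(WordLevel(vocab, unk_token="[PAD]"))
tok.pre_tokenizer = Whitespace()

fast = PreTrainedTokenizerFast(tokenizer_object=tok, pad_token="[PAD]")

fast.padding_side = "right"  # training: pad tokens trail the sequence
right = fast(["hello petals world", "hello"], padding=True)["input_ids"]

fast.padding_side = "left"   # generation: pad tokens lead the sequence
left = fast(["hello petals world", "hello"], padding=True)["input_ids"]
```

With right padding the short sequence becomes `[1, 0, 0]`; with left padding it becomes `[0, 0, 1]`. Left padding matters for generation because autoregressive models read the final position of each row as the most recent token.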