Environment:Bigscience workshop Petals Python Hivemind
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Distributed_Computing |
| Last Updated | 2026-02-09 13:00 GMT |
Overview
Python >= 3.8 client environment with hivemind, PyTorch, transformers, and petals for distributed inference and training.
Description
This environment provides the core Python runtime required for all client-side interactions with the Petals distributed network. It combines several key components: hivemind supplies the peer-to-peer networking layer that connects clients to remote servers hosting model shards; PyTorch handles local tensor computation, autograd, and GPU acceleration when available; transformers manages model configuration loading, tokenizer instantiation, and generation utilities; and petals itself orchestrates distributed inference by routing forward and backward passes through sequences of remote transformer blocks. Together these packages form the minimum viable environment needed to connect to a Petals swarm, load a distributed large language model, run inference sessions, perform autoregressive text generation, and execute prompt tuning or other fine-tuning workflows across the decentralized network.
Usage
This environment is required for all client-side operations within the Petals ecosystem. This includes loading distributed model classes such as AutoDistributedModelForCausalLM and architecture-specific variants like DistributedLlamaForCausalLM or DistributedBloomForCausalLM; tokenizing inputs using HuggingFace tokenizers; opening inference sessions via InferenceSession for stepped generation; running full autoregressive generation through RemoteGenerationMixin.generate(); performing prompt tuning with PTuneMixin; executing distributed autograd through RemoteSequentialAutogradFunction for training and fine-tuning; and running distributed evaluation loops. Any script or application that communicates with the Petals swarm must have this environment installed.
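The client workflow described above can be sketched end to end. This is a minimal, illustrative sketch: it assumes petals and transformers are installed and that a public swarm hosts the named model (the model name here is only an example). Imports are deferred so the helper can be defined without the full environment.

```python
def generate_with_petals(model_name: str = "bigscience/bloom-560m",
                         prompt: str = "A cat sat on") -> str:
    # Deferred imports: defining this function does not require petals itself.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Connects to the swarm and routes forward passes through remote blocks.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    # RemoteGenerationMixin.generate() drives autoregressive decoding remotely.
    output_ids = model.generate(input_ids, max_new_tokens=5)
    return tokenizer.decode(output_ids[0])
```

Calling this function opens a connection to the swarm, so it only succeeds with network access and a live set of servers hosting the model.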
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows (WSL2) | macOS requires special fork safety env vars |
| Hardware | Any CPU (GPU optional for client) | Client runs on CPU; servers need GPU |
| Network | Internet access | Required for P2P swarm connectivity |
| Python | >= 3.8 | Supports 3.8, 3.9, 3.10, 3.11 |
Dependencies
System Packages
- No special system packages required for client-only usage
Python Packages
- petals (installs all below)
- torch >= 1.12
- hivemind @ git+https://github.com/learning-at-home/hivemind.git@213bff98a62accb91f254e2afdccbf1d69ebdea9
- transformers == 4.43.1
- huggingface-hub >= 0.11.1, < 1.0.0
- tokenizers >= 0.13.3
- accelerate >= 0.27.2
- bitsandbytes == 0.41.1
- tensor_parallel == 1.0.23
- peft == 0.8.2
- safetensors >= 0.3.1
- sentencepiece >= 0.1.99
- packaging >= 20.9
- async-timeout >= 4.0.2
- humanfriendly
- Dijkstar >= 2.6.0
- numpy < 2
Credentials
HF_TOKEN (optional): HuggingFace API token for accessing gated models (e.g., Llama-2). Required when loading meta-llama/Llama-2-* models.
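A sketch of loading a gated checkpoint with the token read from HF_TOKEN. This assumes petals is installed; load_gated_model is an illustrative helper name, and the import is deferred so the definition works without the full environment.

```python
import os

def load_gated_model(model_name: str = "meta-llama/Llama-2-70b-hf"):
    # Deferred import: petals is only needed when the helper is actually called.
    from petals import AutoDistributedModelForCausalLM

    token = os.environ.get("HF_TOKEN")  # or pass token=True to reuse a cached login
    return AutoDistributedModelForCausalLM.from_pretrained(model_name, token=token)
```

Without a valid token, calling this for a meta-llama/Llama-2-* repository fails the authentication check shown in the Code Evidence section below.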
Quick Install
pip install petals
Code Evidence
Transformers version pinning from src/petals/__init__.py:23-26:
if not os.getenv("PETALS_IGNORE_DEPENDENCY_VERSION"):
    assert (
        version.parse("4.43.1") <= version.parse(transformers.__version__) < version.parse("4.44.0")
    ), "Please install a proper transformers version: pip install transformers>=4.43.1,<4.44.0"
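The effect of this gate can be imitated with a small stdlib-only sketch. The parse_version helper below is an illustrative stand-in for packaging.version.parse and only handles plain numeric releases:

```python
import os

def parse_version(v: str) -> tuple:
    # Illustrative stand-in for packaging.version.parse: keeps only the
    # leading numeric segments, so "4.43.1" becomes (4, 43, 1).
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def transformers_version_ok(installed: str) -> bool:
    # Mirrors the assertion above: accept only 4.43.1 <= version < 4.44.0,
    # unless the escape-hatch env var is set.
    if os.getenv("PETALS_IGNORE_DEPENDENCY_VERSION"):
        return True
    return (4, 43, 1) <= parse_version(installed) < (4, 44, 0)
```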
macOS fork safety from src/petals/__init__.py:6-9:
if platform.system() == "Darwin":
    os.environ.setdefault("no_proxy", "*")
    os.environ.setdefault("OBJC_DISABLE_INITIALIZE_FORK_SAFETY", "YES")
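Because these defaults are applied with os.environ.setdefault, any value you export yourself is never overwritten. A stdlib-only demonstration of that semantics (the DEMO_FORK_SAFETY name is made up for illustration):

```python
import os

# setdefault only writes a variable that is not already set, so user-supplied
# values survive import-time defaults like the ones Petals applies on macOS.
os.environ.pop("DEMO_FORK_SAFETY", None)
os.environ.setdefault("DEMO_FORK_SAFETY", "YES")  # unset -> default applied
os.environ["DEMO_FORK_SAFETY"] = "NO"             # explicit user override
os.environ.setdefault("DEMO_FORK_SAFETY", "YES")  # already set -> untouched
```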
HuggingFace auth check from src/petals/utils/hf_auth.py:5-7:
def always_needs_auth(model_name: Union[str, os.PathLike, None]) -> bool:
    loading_from_repo = model_name is not None and not os.path.isdir(model_name)
    return loading_from_repo and model_name.startswith("meta-llama/Llama-2-")
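This check is small enough to reproduce as a self-contained, runnable copy for experimentation (behavior mirrors the excerpt above: only remote meta-llama/Llama-2-* repos always require a token; local directories and other repos do not):

```python
import os
from typing import Union

def always_needs_auth(model_name: Union[str, os.PathLike, None]) -> bool:
    # A path that exists on disk is a local checkout, not a Hub repo,
    # so it never triggers the mandatory-auth rule.
    loading_from_repo = model_name is not None and not os.path.isdir(model_name)
    return loading_from_repo and str(model_name).startswith("meta-llama/Llama-2-")
```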
Python version and dependencies from setup.cfg:33-52:
python_requires = >=3.8
install_requires =
    torch>=1.12
    bitsandbytes==0.41.1
    accelerate>=0.27.2
    huggingface-hub>=0.11.1,<1.0.0
    tokenizers>=0.13.3
    transformers==4.43.1
    hivemind @ git+https://github.com/learning-at-home/hivemind.git@...
    tensor_parallel==1.0.23
    peft==0.8.2
    safetensors>=0.3.1
    sentencepiece>=0.1.99
    packaging>=20.9
    async-timeout>=4.0.2
    numpy<2
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| AssertionError: Please install a proper transformers version | transformers version outside the 4.43.x range | pip install transformers==4.43.1 |
| OSError: meta-llama/Llama-2-* requires authentication | Gated model needs an HF token | Set the HF_TOKEN env var or pass token=True |
| ImportError: hivemind not found | hivemind not installed from the pinned commit | pip install petals (installs the pinned hivemind) |
Compatibility Notes
- macOS: Automatically sets OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES and no_proxy=* to prevent fork-related crashes.
- PETALS_IGNORE_DEPENDENCY_VERSION: Set this env var to bypass the strict transformers version check.
- USE_LEGACY_BFLOAT16: Controls bfloat16 serialization mode in hivemind; Petals defaults to modern mode.
- bitsandbytes: The BITSANDBYTES_NOWELCOME env var is set automatically to suppress the welcome message.
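As an example, the strict transformers pin can be bypassed by exporting the escape hatch before petals is imported (use sparingly; untested transformers versions may break the client):

```python
import os

# Must be set before `import petals`, since the version check runs at import time.
os.environ["PETALS_IGNORE_DEPENDENCY_VERSION"] = "1"
# import petals  # would now skip the transformers version assertion
```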
Related Pages
- Implementation:Bigscience_workshop_Petals_AutoDistributedModelForCausalLM_From_Pretrained
- Implementation:Bigscience_workshop_Petals_InferenceSession
- Implementation:Bigscience_workshop_Petals_RemoteGenerationMixin_Generate
- Implementation:Bigscience_workshop_Petals_PTuneMixin
- Implementation:Bigscience_workshop_Petals_RemoteSequentialAutogradFunction
- Implementation:Bigscience_workshop_Petals_DistributedLlamaForSequenceClassification_From_Pretrained
- Implementation:Bigscience_workshop_Petals_DistributedBloomForCausalLM_From_Pretrained
- Implementation:Bigscience_workshop_Petals_Distributed_Evaluation_Loop
- Implementation:Bigscience_workshop_Petals_RemoteGenerationMixin_Generate_With_Session