Principle: BigScience Workshop Petals Server CLI Launch
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Infrastructure, CLI |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
The command-line interface entry point for launching a Petals server that contributes GPU resources to the distributed model-serving swarm.
Description
Server CLI Launch provides the primary mechanism for volunteers to contribute GPU resources to the Petals network. The CLI accepts extensive configuration options covering model selection, GPU resource allocation, network connectivity, quantization, and performance tuning.
The launch process:
- Parses command-line arguments via configargparse (supports config files too)
- Validates GPU availability and CUDA setup
- Creates a Server instance with all configuration
- Calls server.run() to start the main loop
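The argument-parsing step can be sketched with the standard library's argparse (Petals itself uses configargparse, which layers config-file support on top of this same interface). The flag names below come from this document; the model name and defaults are illustrative assumptions, not the exact Petals definitions:

```python
import argparse

# Simplified sketch of the CLI surface; Petals uses configargparse, which
# accepts the same add_argument() calls but can also read a config file.
parser = argparse.ArgumentParser(description="Launch a Petals server (sketch)")
parser.add_argument("model", help="Hugging Face model name to serve")
parser.add_argument("--num_blocks", type=int, default=None,
                    help="Transformer blocks to host (default: auto-detect)")
parser.add_argument("--throughput", default="auto",
                    help="'auto' benchmarks throughput on startup")
parser.add_argument("--new_swarm", action="store_true",
                    help="Start an isolated swarm instead of joining the public one")

# Parse a sample command line (model name is a placeholder).
args = parser.parse_args(["bigscience/bloom-560m", "--num_blocks", "4"])
print(args.model, args.num_blocks, args.new_swarm)
```

With configargparse, the same arguments could instead be supplied via a config file, which is how long-running server deployments typically manage their settings.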
Key design decisions:
- Sensible defaults: Most parameters have reasonable defaults (auto throughput estimation, auto block count, auto dtype)
- Docker support: Can be run directly or via the official Docker image
- Private swarms: The --new_swarm flag creates an isolated network instead of joining the public swarm
- Multi-GPU: --tensor_parallel_devices enables serving across multiple GPUs
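The design decisions above map onto invocations like the following. The flags are the ones named in this section; the model name is a placeholder, and the exact argument syntax for multi-GPU device lists may differ from this sketch:

```shell
# Join the public swarm with auto-tuned defaults (block count, throughput, dtype)
python -m petals.cli.run_server bigscience/bloom-560m

# Start an isolated private swarm, serving across two GPUs
python -m petals.cli.run_server bigscience/bloom-560m \
    --new_swarm \
    --tensor_parallel_devices cuda:0 cuda:1
```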
Usage
Use this principle when contributing GPU resources to the Petals network. This is the primary entry point for server operators. The CLI can be invoked directly with python -m petals.cli.run_server or via Docker.
Theoretical Basis
Volunteer computing model:
Petals uses a BitTorrent-inspired architecture in which each server hosts a contiguous slice of the model's transformer blocks, and clients chain requests across servers to run full inference. In the abstract, the launch flow looks like this:
```python
# Abstract server launch flow (pseudocode)
args = parse_cli_arguments()
server = Server(
    model=args.model,
    num_blocks=args.num_blocks or auto_detect(),
    throughput=args.throughput or benchmark(),
    initial_peers=args.initial_peers or PUBLIC_PEERS,
)
server.run()  # blocks until shutdown
```
The server automatically determines which transformer blocks to host based on the current network state, benchmarks its throughput, and announces availability via the DHT.
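The block-assignment idea can be illustrated with a toy heuristic: host the contiguous span of blocks that is currently least replicated in the swarm. This is a hypothetical sketch, not Petals' actual load-balancing algorithm; `coverage` stands in for the per-block replica counts a server would observe via the DHT:

```python
def choose_blocks(coverage, span_len):
    """Return the indices of the least-replicated contiguous span of blocks.

    coverage: replica count per transformer block (toy stand-in for DHT state)
    span_len: how many blocks this server can host
    """
    best_start, best_load = 0, float("inf")
    for start in range(len(coverage) - span_len + 1):
        load = sum(coverage[start:start + span_len])
        if load < best_load:
            best_start, best_load = start, load
    return list(range(best_start, best_start + span_len))

# Blocks 4..7 form the least-served span of length 4 in this toy network state.
print(choose_blocks([3, 3, 2, 2, 1, 0, 0, 1, 2, 3], span_len=4))  # → [4, 5, 6, 7]
```

After choosing a span, a real server would announce the hosted blocks back to the DHT so clients and other servers can route around the updated coverage.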