Implementation:Bigscience workshop Petals Server Run

Knowledge Sources	Petals
Domains	Distributed_Computing, Infrastructure, Monitoring
Last Updated	2026-02-09 14:00 GMT

Overview

Concrete tool for running the Petals server main loop with health monitoring and automatic rebalancing, provided by the Petals server module.

Description

Server.run() is the main event loop that:

Creates and starts the initial ModuleContainer via ModuleContainer.create()
Enters an infinite loop checking container health and swarm balance
On an unhealthy container or a rebalancing trigger: shuts down the current container, selects new blocks, and creates a new container
On KeyboardInterrupt: calls Server.shutdown() for graceful cleanup

Inside the loop, the balance check goes through Server._should_choose_other_blocks(), which wraps should_choose_other_blocks() with the randomized check interval and delay.
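
The control flow described above can be sketched roughly as follows. This is a simplified illustration, not the Petals source: StubContainer and supervise are hypothetical stand-ins, and the stop event, the wait period, and the restart cap are assumptions made so the example is self-contained and terminates.

```python
import random
import threading


class StubContainer:
    """Hypothetical stand-in for ModuleContainer; turns unhealthy after a few checks."""

    def __init__(self, block_indices, healthy_checks):
        self.block_indices = block_indices
        self._healthy_checks = healthy_checks
        self.is_shut_down = False

    def is_healthy(self):
        self._healthy_checks -= 1
        return self._healthy_checks >= 0

    def shutdown(self):
        self.is_shut_down = True


def supervise(create_container, should_rebalance, stop, mean_period=0.01, max_restarts=3):
    """Outer loop (re)creates a container; inner loop polls health and balance."""
    history = []
    for _ in range(max_restarts):  # bounded here only so the sketch ends
        container = create_container()
        history.append(container)
        try:
            while True:
                # Randomized wait that averages mean_period seconds
                if stop.wait(random.random() * 2 * mean_period):
                    return history  # stop was requested
                if not container.is_healthy():
                    break  # restart with a fresh container
                if should_rebalance():
                    break  # serve a different set of blocks
        finally:
            container.shutdown()  # always release the old container
    return history


stop = threading.Event()
containers = supervise(
    create_container=lambda: StubContainer(block_indices=[0, 1], healthy_checks=2),
    should_rebalance=lambda: False,
    stop=stop,
)
print(len(containers), all(c.is_shut_down for c in containers))
```

Placing the container shutdown in a finally clause means the old container is released whether the inner loop ends through a health failure, a rebalancing decision, or an exception.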

Usage

Called by main() in the CLI after Server.__init__ completes. This method blocks until the server is shut down.

Code Reference

Source Location

Repository: petals
File: src/petals/server/server.py (L328-384, Server.run)
File: src/petals/server/server.py (L413-418, Server._should_choose_other_blocks)
File: src/petals/server/server.py (L420-428, Server.shutdown)

Signature

from typing import Optional


class Server:
    def run(self) -> None:
        """
        Main server loop: start serving, monitor health, rebalance as needed.

        Creates ModuleContainer, then loops:
        1. Check container health via is_healthy()
        2. Periodically check swarm balance via _should_choose_other_blocks()
        3. If unhealthy or imbalanced: shutdown, re-select blocks, restart

        Blocks until KeyboardInterrupt triggers graceful shutdown.
        """

    def _should_choose_other_blocks(self) -> bool:
        """
        Check if rebalancing is needed (randomized interval).
        Wraps should_choose_other_blocks with timing logic.
        """

    def shutdown(self, timeout: Optional[float] = 5) -> None:
        """
        Graceful server shutdown.

        Stops ModuleContainer, de-registers blocks from DHT,
        waits for in-flight requests to complete.
        """

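The "randomized interval" mentioned in the _should_choose_other_blocks docstring can be illustrated with a small sketch. It assumes the delay is drawn uniformly between zero and twice a mean period, which is one common way to keep many servers from checking at the same moment. The helper name next_check_delay is hypothetical, and the 120-second mean simply echoes the "~120s" figure in the usage example on this page.

```python
import random


def next_check_delay(mean_period: float, rng: random.Random) -> float:
    """Return a delay drawn uniformly from [0, 2 * mean_period)."""
    return rng.random() * 2 * mean_period


rng = random.Random(0)  # seeded so the sketch is reproducible
delays = [next_check_delay(120.0, rng) for _ in range(10_000)]
average = sum(delays) / len(delays)
print(f"min={min(delays):.1f}s max={max(delays):.1f}s avg={average:.1f}s")
```

Individual delays vary widely, but their average stays close to the chosen mean, so the overall checking rate is preserved while the checks of different servers drift apart.
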
Import

from petals.server.server import Server

server = Server(...)
server.run()  # Blocks until shutdown

I/O Contract

Inputs

Name	Type	Required	Description
self	Server	Yes	Fully configured Server instance from __init__

Outputs

Name	Type	Description
(blocking)	None	Method runs until KeyboardInterrupt or fatal error
DHT state	dict	Blocks de-registered (OFFLINE) on shutdown

Usage Examples

Full Server Lifecycle

from petals.server.server import Server
from petals.constants import PUBLIC_INITIAL_PEERS

server = Server(
    initial_peers=PUBLIC_INITIAL_PEERS,
    dht_prefix=None,
    converted_model_name_or_path="petals-team/StableBeluga2",
    throughput="auto",
)

# While server.run() blocks, the server is:
# 1. Serving transformer blocks via RPC
# 2. Announcing ONLINE status in DHT every update_period
# 3. Checking swarm balance every ~120s
# 4. Rebalancing if needed
try:
    server.run()  # Blocks until shutdown
except KeyboardInterrupt:
    pass  # server.run() handles graceful shutdown internally
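
For callers that embed the server in their own script, a defensive variant is to pair the blocking run() call with a finally clause, so that cleanup happens whichever way run() exits. The sketch below uses a hypothetical StubServer in place of the real class so it can run without a swarm. Whether an extra shutdown() call is needed with the real Server depends on how run() handles interrupts, so treat that part as an assumption to verify.

```python
class StubServer:
    """Hypothetical stand-in with a blocking run() and an idempotent shutdown()."""

    def __init__(self):
        self.events = []
        self._closed = False

    def run(self):
        self.events.append("run")
        raise KeyboardInterrupt  # simulate Ctrl+C arriving while run() blocks

    def shutdown(self):
        if not self._closed:  # safe to call more than once
            self._closed = True
            self.events.append("shutdown")


server = StubServer()
try:
    server.run()
except KeyboardInterrupt:
    server.events.append("interrupted")
finally:
    server.shutdown()  # runs on normal return, interrupt, or error

print(server.events)
```

Making shutdown() idempotent in the stub is what keeps the finally clause harmless if cleanup has already happened elsewhere.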
