Implementation:Bigscience workshop Petals Server Run

From Leeroopedia


Knowledge Sources
Domains Distributed_Computing, Infrastructure, Monitoring
Last Updated 2026-02-09 14:00 GMT

Overview

Server.run() is the concrete entry point for running the Petals server main loop, with health monitoring and automatic rebalancing. It is provided by the Petals server module.

Description

Server.run() is the main event loop that:

  1. Creates and starts the initial ModuleContainer via ModuleContainer.create()
  2. Enters an infinite loop checking container health and swarm balance
  3. On an unhealthy container or a rebalancing trigger: shuts down the current container, selects new blocks, creates a new container
  4. On KeyboardInterrupt: calls Server.shutdown() for graceful cleanup
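
The steps above can be sketched as a small, self-contained simulation. This is a minimal illustration of the control flow only, not the Petals source: FakeContainer, run_loop, and the four callables passed to it are hypothetical stand-ins, and the real loop may differ in detail (for example, in how it waits between checks).

```python
class FakeContainer:
    """Hypothetical stand-in for ModuleContainer; not the Petals class."""

    def __init__(self, blocks, healthy_checks):
        self.blocks = blocks
        # Number of health checks that pass before a simulated crash.
        self._healthy_checks = healthy_checks
        self.shut_down = False

    def is_healthy(self):
        self._healthy_checks -= 1
        return self._healthy_checks >= 0

    def shutdown(self):
        self.shut_down = True


def run_loop(choose_blocks, create_container, should_rebalance, stop_requested):
    """Serve until stop_requested() is true; return every container created."""
    history = []
    while True:
        container = create_container(choose_blocks())
        history.append(container)
        try:
            while True:
                if stop_requested():
                    return history
                if not container.is_healthy():
                    break  # simulated crash: restart with a fresh container
                if should_rebalance():
                    break  # swarm imbalanced: serve a different set of blocks
        finally:
            container.shutdown()  # the current container is always released


# Scripted answers make the simulation deterministic.
blocks_iter = iter([[0, 1], [2, 3], [4, 5]])
rebalance_iter = iter([False, True, False, False])
stop_iter = iter([False] * 5 + [True])

history = run_loop(
    choose_blocks=lambda: next(blocks_iter),
    create_container=lambda blocks: FakeContainer(blocks, healthy_checks=2),
    should_rebalance=lambda: next(rebalance_iter),
    stop_requested=lambda: next(stop_iter),
)
print([c.blocks for c in history])  # [[0, 1], [2, 3], [4, 5]]
```

In this run the first container is replaced because of a rebalancing trigger, the second because a health check fails, and the third is shut down when the stop condition fires. The try/finally ensures no container is left running on any exit path.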

The loop also calls Server._should_choose_other_blocks(), which wraps should_choose_other_blocks() with the randomized check interval and delay.
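
One way to picture a randomized check interval is sketched below. It assumes the delay is drawn uniformly from zero up to twice a mean period, so that checks happen every mean period on average. The function name next_check_delay and the 120-second mean are illustrative assumptions, not confirmed details of the Petals implementation.

```python
import random


def next_check_delay(mean_period: float, rng: random.Random) -> float:
    """Draw a delay uniformly from [0, 2 * mean_period).

    The expected value of the draw is mean_period (hypothetical helper).
    """
    return rng.random() * 2 * mean_period


rng = random.Random(0)  # seeded only to make the demonstration repeatable
delays = [next_check_delay(120.0, rng) for _ in range(10_000)]
average = sum(delays) / len(delays)
print(f"min={min(delays):.1f}s max={max(delays):.1f}s mean={average:.1f}s")
```

Randomizing the delay spreads balance checks out over time, which plausibly keeps many servers from re-evaluating the swarm and switching blocks at the same moment.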

Usage

Called by main() in the CLI after Server.__init__ completes. This method blocks until the server is shut down.
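
Because the call blocks, a caller that needs to do other work at the same time would have to run it elsewhere, for instance in a background thread. The sketch below shows that pattern with a hypothetical BlockingStub that only mimics the run()/shutdown() surface described here; it is not the Petals Server, and it makes no claim about how Server implements blocking internally.

```python
import threading


class BlockingStub:
    """Hypothetical stand-in: run() blocks until shutdown() is called."""

    def __init__(self):
        self._stop = threading.Event()

    def run(self):
        self._stop.wait()  # block the calling thread until asked to stop

    def shutdown(self):
        self._stop.set()


stub = BlockingStub()
worker = threading.Thread(target=stub.run, daemon=True)
worker.start()

still_blocking = worker.is_alive()  # run() has not returned yet
stub.shutdown()                     # request the stop from the main thread
worker.join(timeout=5)
finished = not worker.is_alive()
print(still_blocking, finished)
```

Note that Python delivers KeyboardInterrupt only to the main thread, so a loop running in a worker thread has to be stopped explicitly, as shutdown() does in this sketch.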

Code Reference

Source Location

  • Repository: petals
  • File: src/petals/server/server.py (L328-384, Server.run)
  • File: src/petals/server/server.py (L413-418, Server._should_choose_other_blocks)
  • File: src/petals/server/server.py (L420-428, Server.shutdown)

Signature

from typing import Optional


class Server:
    def run(self) -> None:
        """
        Main server loop: start serving, monitor health, rebalance as needed.

        Creates ModuleContainer, then loops:
        1. Check container health via is_healthy()
        2. Periodically check swarm balance via _should_choose_other_blocks()
        3. If unhealthy or imbalanced: shutdown, re-select blocks, restart

        Blocks until KeyboardInterrupt triggers graceful shutdown.
        """

    def _should_choose_other_blocks(self) -> bool:
        """
        Check if rebalancing is needed (randomized interval).
        Wraps should_choose_other_blocks with timing logic.
        """

    def shutdown(self, timeout: Optional[float] = 5) -> None:
        """
        Graceful server shutdown.

        Stops ModuleContainer, de-registers blocks from DHT,
        waits for in-flight requests to complete.
        """

Import

from petals.server.server import Server

server = Server(...)
server.run()  # Blocks until shutdown

I/O Contract

Inputs

Name   Type     Required   Description
self   Server   Yes        Fully configured Server instance from __init__

Outputs

Name         Type   Description
(blocking)   None   Method runs until KeyboardInterrupt or a fatal error
DHT state    dict   Blocks de-registered (OFFLINE) on shutdown

Usage Examples

Full Server Lifecycle

from petals.server.server import Server
from petals.constants import PUBLIC_INITIAL_PEERS

server = Server(
    initial_peers=PUBLIC_INITIAL_PEERS,
    dht_prefix=None,
    converted_model_name_or_path="petals-team/StableBeluga2",
    throughput="auto",
)

# server.run() blocks. While it is running, the server is:
# 1. Serving transformer blocks via RPC
# 2. Announcing ONLINE status in DHT every update_period
# 3. Checking swarm balance every ~120s
# 4. Rebalancing if needed
try:
    server.run()
except KeyboardInterrupt:
    pass  # server.run() handles graceful shutdown internally
