Implementation:Bigscience workshop Petals DistributedFalconConfig
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, NLP, Model_Configuration |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
Concrete tool for configuring Falcon models for distributed inference and fine-tuning in the Petals network.
Description
DistributedFalconConfig is a configuration class that bridges HuggingFace's FalconConfig with Petals' distributed infrastructure. It inherits from five parent classes: DefaultRevisionMixin, FalconConfig, ClientConfig, PTuneConfig, and LMHeadConfig, combining the standard Falcon model configuration with distributed client settings, prompt tuning parameters, and language model head configuration.
The class sets Falcon-specific attributes (WrappedFalconBlock as the block class, FalconAttention as the attention class, and "transformer.h" as the block prefix) and computes num_key_value_groups on access from the decoder architecture variant (new decoder architecture, multi-query, or standard attention). When no dht_prefix is passed, its from_pretrained override derives one from the model repository name (for example, "tiiuae/falcon-180B" yields "falcon-180B"), and it sets pad_token_id to 0 when the loaded config leaves it unset.
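The three-way selection behind num_key_value_groups can be illustrated with a small standalone sketch. The FalconAttentionShape dataclass is a hypothetical stand-in for the config fields the property reads, the head counts in the usage lines are illustrative rather than read from any checkpoint, and the order of the checks (new decoder architecture first, then multi-query) is assumed from the order in which the variants are listed.

```python
from dataclasses import dataclass


@dataclass
class FalconAttentionShape:
    """Hypothetical stand-in for the config fields the property depends on."""

    num_attention_heads: int
    num_kv_heads: int
    new_decoder_architecture: bool
    multi_query: bool


def num_key_value_groups(cfg: FalconAttentionShape) -> int:
    # Assumed precedence: the new decoder architecture is checked first.
    if cfg.new_decoder_architecture:
        # Grouped-query attention: several query heads share one KV head.
        return cfg.num_attention_heads // cfg.num_kv_heads
    if cfg.multi_query:
        # Multi-query attention: every query head shares a single KV head.
        return cfg.num_attention_heads
    # Standard multi-head attention: one KV head per query head.
    return 1


# Illustrative shapes (not loaded from real checkpoints).
grouped = FalconAttentionShape(128, 8, new_decoder_architecture=True, multi_query=False)
shared = FalconAttentionShape(71, 1, new_decoder_architecture=False, multi_query=True)
plain = FalconAttentionShape(32, 32, new_decoder_architecture=False, multi_query=False)

print(num_key_value_groups(grouped))  # 128 // 8 = 16
print(num_key_value_groups(shared))   # 71
print(num_key_value_groups(plain))    # 1
```

The value is the number of query heads that share each key-value head, which is why it collapses to 1 when every query head has its own key-value head.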
Usage
Petals uses this class when loading a Falcon model (such as Falcon-7B, Falcon-40B, or Falcon-180B) for distributed inference. AutoDistributedModelForCausalLM relies on it internally, so import it directly only when you are building a custom model-loading pipeline.
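For the common case, a minimal sketch of going through AutoDistributedModelForCausalLM rather than the config class might look like the following. The helper name generate_with_falcon and its defaults are hypothetical, and the sketch assumes that Petals is installed and that a swarm is currently serving the chosen model; neither is guaranteed.

```python
def generate_with_falcon(
    model_name: str = "tiiuae/falcon-7b",
    prompt: str = "Hello",
    max_new_tokens: int = 8,
) -> str:
    """Hypothetical helper: generate a short continuation via a Petals swarm."""
    # Imports are local so that defining this helper does not require Petals.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # The auto class picks the distributed config for the model type;
    # for Falcon checkpoints that is expected to be DistributedFalconConfig.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0])
```

Calling generate_with_falcon() contacts the network, so it will fail if no peers host the requested blocks.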
Code Reference
Source Location
- Repository: Bigscience_workshop_Petals
- File: src/petals/models/falcon/config.py
- Lines: 1-48
Signature
class DistributedFalconConfig(DefaultRevisionMixin, FalconConfig, ClientConfig, PTuneConfig, LMHeadConfig):
    block_class = WrappedFalconBlock
    attn_class = FalconAttention
    block_prefix = "transformer.h"

    @property
    def num_key_value_groups(self) -> int:
        """
        Returns the number of key-value groups based on architecture variant:
        - new_decoder_architecture: num_attention_heads // num_kv_heads
        - multi_query: num_attention_heads
        - standard: 1
        """

    @classmethod
    def from_pretrained(
        cls,
        model_name_or_path: Union[str, os.PathLike, None],
        *args,
        dht_prefix: Optional[str] = None,
        **kwargs,
    ):
        """Load config and derive DHT prefix from model name."""
Import
from petals.models.falcon.config import DistributedFalconConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | Union[str, os.PathLike, None] | Yes | HuggingFace model ID or local path (e.g., "tiiuae/falcon-7b") |
| dht_prefix | Optional[str] | No | Custom DHT prefix for peer discovery; auto-derived from model name if not provided |
| *args, **kwargs | Any | No | Passed through to FalconConfig.from_pretrained |
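The auto-derivation of dht_prefix can be approximated with a standalone sketch. The function name derive_dht_prefix is hypothetical. Taking the part after the last "/" matches the "tiiuae/falcon-180B" to "falcon-180B" example on this page; skipping derivation for local directories and replacing "." with "-" are assumptions about the implementation, not confirmed by this page.

```python
import os
from typing import Optional, Union


def derive_dht_prefix(
    model_name_or_path: Union[str, os.PathLike, None],
    dht_prefix: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical sketch of how a DHT prefix could be chosen."""
    # An explicit prefix always wins.
    if dht_prefix is not None:
        return dht_prefix
    # Assumption: nothing is derived for a missing name or a local directory.
    if model_name_or_path is None or os.path.isdir(model_name_or_path):
        return None
    # Keep only the repository name, dropping the account or organization.
    repo_name = str(model_name_or_path).split("/")[-1]
    # Assumption: dots are replaced so the prefix contains no "." characters.
    return repo_name.replace(".", "-")


print(derive_dht_prefix("tiiuae/falcon-180B"))
print(derive_dht_prefix("tiiuae/falcon-7b", dht_prefix="my-private-swarm"))
```

Dropping the account name means copies of the same model published under different accounts would resolve to the same prefix, while an explicit dht_prefix keeps a swarm separate.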
Outputs
| Name | Type | Description |
|---|---|---|
| config | DistributedFalconConfig | Falcon config with distributed settings, block/attention class references, and computed num_key_value_groups |
Usage Examples
Loading Falcon Config for Distributed Inference
from petals.models.falcon.config import DistributedFalconConfig
# Load Falcon-180B config for distributed use
config = DistributedFalconConfig.from_pretrained("tiiuae/falcon-180B")
# Config now has distributed attributes set
print(config.block_class)           # WrappedFalconBlock
print(config.block_prefix)          # "transformer.h"
print(config.num_key_value_groups)  # Depends on architecture variant
print(config.dht_prefix)            # "falcon-180B"
Checking Attention Architecture
from petals.models.falcon.config import DistributedFalconConfig
config = DistributedFalconConfig.from_pretrained("tiiuae/falcon-7b")
# num_key_value_groups adapts to architecture
# Falcon-7B uses multi-query attention: returns num_attention_heads
# Falcon-40B uses grouped-query attention: returns num_attention_heads // num_kv_heads
print(f"KV groups: {config.num_key_value_groups}")