Implementation:Bigscience workshop Petals DistributedMixtralConfig
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, NLP, Model_Configuration, Mixture_of_Experts |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
A configuration class for running Mixtral Mixture-of-Experts models in the Petals network, covering both distributed inference and fine-tuning.
Description
DistributedMixtralConfig is a configuration class that bridges HuggingFace's MixtralConfig with Petals' distributed infrastructure. It inherits from MixtralConfig, ClientConfig, PTuneConfig, and LMHeadConfig, combining standard Mixtral model configuration with distributed client settings, prompt tuning parameters, and language model head configuration.
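The way multiple inheritance merges several groups of settings into one config object can be illustrated with a minimal sketch. Every class and attribute name below is a hypothetical stand-in chosen for illustration; none of it is the actual Petals or HuggingFace code, and the real parent classes may be implemented differently (for example as dataclasses).

```python
class BaseModelConfigSketch:
    """Hypothetical stand-in for a model config such as MixtralConfig."""

    def __init__(self, hidden_size=4096, num_local_experts=8, **kwargs):
        self.hidden_size = hidden_size
        self.num_local_experts = num_local_experts
        # Assumption for this sketch: unknown keyword arguments are stored
        # as plain instance attributes.
        for key, value in kwargs.items():
            setattr(self, key, value)


class ClientSettingsSketch:
    """Hypothetical mixin: client-side defaults kept as class attributes."""

    dht_prefix = None
    demo_retries = 3  # illustrative name, not a real Petals setting


class PromptTuningSettingsSketch:
    """Hypothetical mixin: prompt tuning defaults."""

    tuning_mode = None
    pre_seq_len = 0


class DistributedConfigSketch(
    BaseModelConfigSketch, ClientSettingsSketch, PromptTuningSettingsSketch
):
    """One class that exposes model, client, and prompt tuning settings."""

    block_prefix = "model.layers"


config = DistributedConfigSketch(dht_prefix="demo-prefix")

# Instance attributes come from __init__; everything else is resolved
# through the method resolution order (MRO) of the combined class.
print(config.hidden_size, config.dht_prefix, config.demo_retries, config.pre_seq_len)
print([cls.__name__ for cls in DistributedConfigSketch.__mro__])
```

The point of this layout is that code receiving the config can read model hyperparameters and distributed settings from a single object, while each group of settings stays defined in its own class.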
The class sets four Mixtral-specific attributes: WrappedMixtralBlock as the block class, MixtralAttention as the attention class, "model.layers" as the block prefix, and num_key_value_groups fixed to 1. Its from_pretrained override derives the DHT prefix from the model name (replacing dots with hyphens) when no dht_prefix is passed, and sets pad_token_id to 0 if the loaded configuration does not define one.
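The two adjustments made by from_pretrained can be sketched in isolation. This is a simplified illustration rather than the Petals source: the helper names derive_dht_prefix and ensure_pad_token_id are hypothetical, and keeping only the last path component of the repository ID is an assumption that the description above does not state.

```python
from types import SimpleNamespace
from typing import Optional


def derive_dht_prefix(model_name: Optional[str], dht_prefix: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: pick a DHT prefix for a model name."""
    if dht_prefix is not None:
        return dht_prefix  # an explicitly supplied prefix always wins
    if model_name is None:
        return None  # nothing to derive a prefix from
    # Assumption: drop the account part ("mistralai/") and keep the repo name.
    repo_name = str(model_name).split("/")[-1]
    # Stated in the description: dots are replaced with hyphens.
    return repo_name.replace(".", "-")


def ensure_pad_token_id(config):
    """Hypothetical helper: fall back to pad_token_id = 0 when it is unset."""
    if getattr(config, "pad_token_id", None) is None:
        config.pad_token_id = 0
    return config


print(derive_dht_prefix("mistralai/Mixtral-8x7B-v0.1"))  # Mixtral-8x7B-v0-1
print(derive_dht_prefix("mistralai/Mixtral-8x7B-v0.1", dht_prefix="my-swarm"))  # my-swarm

# SimpleNamespace stands in for a loaded config object in this sketch.
loaded = ensure_pad_token_id(SimpleNamespace(pad_token_id=None))
print(loaded.pad_token_id)  # 0
```

Passing dht_prefix explicitly is the way to point a client at a differently named swarm, since the derived value depends only on the model name.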
Usage
Import this class when you need to load the configuration of a Mixtral model (such as Mixtral-8x7B or Mixtral-8x22B) for distributed inference through Petals. AutoDistributedModelForCausalLM uses it internally, so you usually do not need to instantiate it directly unless you are building a custom model loading pipeline.
Code Reference
Source Location
- Repository: Bigscience_workshop_Petals
- File: src/petals/models/mixtral/config.py
- Lines: 1-36
Signature
class DistributedMixtralConfig(MixtralConfig, ClientConfig, PTuneConfig, LMHeadConfig):
    block_class = WrappedMixtralBlock
    attn_class = MixtralAttention
    block_prefix = "model.layers"
    num_key_value_groups = 1

    @classmethod
    def from_pretrained(
        cls,
        model_name_or_path: Union[str, os.PathLike, None],
        *args,
        dht_prefix: Optional[str] = None,
        **kwargs,
    ):
        """Load config and derive DHT prefix from model name."""
Import
from petals.models.mixtral.config import DistributedMixtralConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | Union[str, os.PathLike, None] | Yes | HuggingFace model ID or local path (e.g., "mistralai/Mixtral-8x7B-v0.1") |
| dht_prefix | Optional[str] | No | Custom DHT prefix for peer discovery; auto-derived from model name if not provided |
| *args, **kwargs | Any | No | Passed through to MixtralConfig.from_pretrained |
Outputs
| Name | Type | Description |
|---|---|---|
| config | DistributedMixtralConfig | Mixtral config with distributed settings, block/attention class references, and MoE routing parameters |
Usage Examples
Loading Mixtral Config for Distributed Inference
from petals.models.mixtral.config import DistributedMixtralConfig

# Load the Mixtral-8x7B config for distributed use
config = DistributedMixtralConfig.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# The config now carries both distributed and MoE attributes
print(config.block_class)           # the WrappedMixtralBlock class
print(config.block_prefix)          # "model.layers"
print(config.num_key_value_groups)  # 1
print(config.num_local_experts)     # 8 (from base MixtralConfig)
print(config.num_experts_per_tok)   # 2 (from base MixtralConfig)