Implementation:OpenGVLab InternVL MPTConfig
| Knowledge Sources | |
|---|---|
| Domains | Language Models, Configuration, LLaVA |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
HuggingFace-style configuration class for the MPT (MosaicML Pretrained Transformer) language model, defining model architecture and initialization parameters.
Description
MPTConfig extends HuggingFace's PretrainedConfig with MPT-specific parameters. The core architectural settings are d_model (embedding dimension, default 2048), n_heads (number of attention heads, default 16), n_layers (number of transformer layers, default 24), expansion_ratio (MLP expansion factor, default 4), and max_seq_len (maximum sequence length, default 2048).
Attention is configured through a nested attn_config dictionary. It selects the attention type (multihead_attention or multiquery_attention) and the backend implementation (torch, flash, or triton), and can optionally enable ALiBi positional bias and prefix-LM mode. Weight initialization is configured through a separate init_config dictionary that supports several schemes (kaiming, xavier, neox, and others).
The constructor finishes by calling _validate_config(), which first fills any keys missing from attn_config and init_config with the module-level defaults and then enforces constraints including:
- d_model must be divisible by n_heads
- the dropout probabilities (resid_pdrop, emb_pdrop, and attn_config['attn_pdrop']) must lie between 0 and 1
- attn_impl must be one of the supported backends
- positional information must be supplied through learned_pos_emb, ALiBi, or both
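The validation constraints described above can be illustrated with a small stand-alone sketch. check_mpt_config is a hypothetical function written for this page, not part of the InternVL or transformers API; it mirrors only the four checks named here, takes plain values instead of an MPTConfig instance, and raises ValueError on the first violation it finds.

```python
from typing import Any, Dict

# Backends named in this page's description of attn_config.
VALID_ATTN_IMPLS = ("torch", "flash", "triton")


def check_mpt_config(
    d_model: int,
    n_heads: int,
    resid_pdrop: float,
    emb_pdrop: float,
    learned_pos_emb: bool,
    attn_config: Dict[str, Any],
) -> None:
    """Hypothetical mirror of the described checks; raises ValueError on failure."""
    # Each attention head gets an equal slice of the embedding dimension.
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    # Dropout values are probabilities, so they must lie in [0, 1].
    probs = [attn_config.get("attn_pdrop", 0.0), resid_pdrop, emb_pdrop]
    if any(p < 0 or p > 1 for p in probs):
        raise ValueError("dropout probabilities must be between 0 and 1")
    # The caller must name a supported backend (no default is filled in here).
    impl = attn_config.get("attn_impl")
    if impl not in VALID_ATTN_IMPLS:
        raise ValueError(f"Unknown attn_impl={impl!r}")
    # Position information must come from learned embeddings, ALiBi, or both.
    if not learned_pos_emb and not attn_config.get("alibi", False):
        raise ValueError("enable learned_pos_emb or attn_config['alibi']")


check_mpt_config(2048, 16, 0.0, 0.0, True, {"attn_impl": "torch"})  # passes
try:
    check_mpt_config(2048, 12, 0.0, 0.0, True, {"attn_impl": "torch"})
except ValueError as err:
    print(err)  # d_model must be divisible by n_heads
```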
Usage
Use this configuration class when instantiating an MPT-based model for the LLaVA multimodal pipeline, or when customizing MPT architecture parameters.
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: internvl_chat_llava/llava/model/language_model/mpt/configuration_mpt.py
- Lines: 1-118
Signature
class MPTConfig(PretrainedConfig):
    model_type = 'mpt'
    def __init__(self, d_model=2048, n_heads=16, n_layers=24, expansion_ratio=4,
                 max_seq_len=2048, vocab_size=50368, resid_pdrop=0.0, emb_pdrop=0.0,
                 learned_pos_emb=True, attn_config=attn_config_defaults,
                 init_device='cpu', logit_scale=None, no_bias=False, verbose=0,
                 embedding_fraction=1.0, norm_type='low_precision_layernorm',
                 use_cache=False, init_config=init_config_defaults, **kwargs): ...
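Because attn_config and init_config default to module-level dictionaries, a caller who passes only some keys relies on the missing ones being filled in from those defaults. The sketch below shows that merge pattern with a hypothetical fill_defaults helper. The values in ASSUMED_ATTN_DEFAULTS are illustrative assumptions, not a copy of the real table; check attn_config_defaults in configuration_mpt.py for the actual keys and values.

```python
import copy
from typing import Any, Dict

# Illustrative subset of attention defaults (assumed values; consult
# attn_config_defaults in configuration_mpt.py for the real ones).
ASSUMED_ATTN_DEFAULTS: Dict[str, Any] = {
    "attn_type": "multihead_attention",
    "attn_impl": "triton",
    "attn_pdrop": 0.0,
    "alibi": False,
    "prefix_lm": False,
}


def fill_defaults(config: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: keys from `defaults`, overridden by keys in `config`.

    Neither input dictionary is modified.
    """
    merged = copy.deepcopy(defaults)
    merged.update(config)
    return merged


user_attn = {"attn_type": "multihead_attention", "attn_impl": "flash"}
merged = fill_defaults(user_attn, ASSUMED_ATTN_DEFAULTS)
print(merged["attn_impl"], merged["alibi"])  # flash False
```

Returning a fresh dictionary, rather than editing the one passed in, avoids accidentally changing a dictionary that the signature shares as a default argument across calls.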
Import
from llava.model.language_model.mpt.configuration_mpt import MPTConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| d_model | int | No | Embedding dimension size (default: 2048) |
| n_heads | int | No | Number of attention heads (default: 16) |
| n_layers | int | No | Number of transformer layers (default: 24) |
| expansion_ratio | int | No | MLP expansion ratio (default: 4) |
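| max_seq_len | int | No | Maximum sequence length (default: 2048) |
| vocab_size | int | No | Vocabulary size (default: 50368) |
| resid_pdrop | float | No | Residual dropout probability (default: 0.0) |
| emb_pdrop | float | No | Embedding dropout probability (default: 0.0) |
| learned_pos_emb | bool | No | Whether to use learned positional embeddings (default: True) |
| attn_config | Dict | No | Attention configuration dict (attn_type, attn_impl, alibi, prefix_lm, etc.); missing keys are filled from defaults |
| init_config | Dict | No | Weight initialization configuration dict; missing keys are filled from defaults |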
Outputs
| Name | Type | Description |
|---|---|---|
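| config | MPTConfig | Validated MPT configuration instance; invalid settings raise an exception during construction (for example, ValueError when d_model is not divisible by n_heads) |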
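The inputs above also fix a few derived sizes: the per-head dimension, the MLP inner width, and a rough weight count. The sketch below computes them with a hypothetical MPTShape dataclass that is not part of the codebase. The parameter estimate is an approximation under stated assumptions: standard multihead attention (multiquery attention has smaller K/V projections), biases and normalization weights ignored, and input and output token embeddings tied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MPTShape:
    """Hypothetical helper: sizes implied by a few MPTConfig fields."""
    d_model: int = 2048
    n_heads: int = 16
    n_layers: int = 24
    expansion_ratio: int = 4
    max_seq_len: int = 2048
    vocab_size: int = 50368
    learned_pos_emb: bool = True

    @property
    def head_dim(self) -> int:
        # Each head gets an equal slice of d_model (hence the divisibility rule).
        return self.d_model // self.n_heads

    @property
    def ffn_hidden(self) -> int:
        # Width of the MLP's inner layer.
        return self.expansion_ratio * self.d_model

    def approx_params(self) -> int:
        """Rough weight count; ignores biases and norm weights (assumption).

        Per layer: Q, K, V and output projections (4 * d^2) plus a
        two-matrix MLP (2 * expansion_ratio * d^2). One token-embedding
        matrix is counted, assuming it is shared with the output head.
        """
        d = self.d_model
        per_layer = 4 * d * d + 2 * self.expansion_ratio * d * d
        total = self.n_layers * per_layer + self.vocab_size * d
        if self.learned_pos_emb:
            total += self.max_seq_len * d  # learned positional embedding table
        return total


default = MPTShape()
print(default.head_dim, default.ffn_hidden)  # 128 8192
print(default.approx_params())  # 1315307520
custom = MPTShape(d_model=4096, n_heads=32, n_layers=32)
print(custom.head_dim, custom.ffn_hidden)  # 128 16384
```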
Usage Examples
Basic Usage
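from llava.model.language_model.mpt.configuration_mpt import MPTConfig
# Create a config with custom architecture settings. attn_config keys that
# are not given here are filled in from the module defaults during validation.
config = MPTConfig(
    d_model=4096,   # 4096 / 32 heads = 128 dimensions per head
    n_heads=32,
    n_layers=32,
    attn_config={'attn_type': 'multihead_attention', 'attn_impl': 'flash'},
)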
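The example above selects the flash backend, which relies on the separately installed flash-attn package (imported as flash_attn). One hedged way to avoid a hard failure on machines without it is to choose the backend at runtime. pick_attn_impl is a hypothetical helper, and falling back to 'torch' is an assumption about a sensible default rather than behavior of MPTConfig itself.

```python
import importlib.util


def pick_attn_impl(prefer_flash: bool = True) -> str:
    """Hypothetical helper: return 'flash' only if flash_attn is importable.

    find_spec returns None for a top-level package that is not installed,
    so this check does not import flash_attn itself.
    """
    if prefer_flash and importlib.util.find_spec("flash_attn") is not None:
        return "flash"
    return "torch"  # assumed fallback backend


attn_config = {
    "attn_type": "multihead_attention",
    "attn_impl": pick_attn_impl(),
}
```

The resulting attn_config can then be passed to MPTConfig in place of the hard-coded dictionary.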