

Implementation:OpenGVLab InternVL MPTConfig

From Leeroopedia


Knowledge Sources
Domains Language Models, Configuration, LLaVA
Last Updated 2026-02-07 14:00 GMT

Overview

HuggingFace-style configuration class for the MPT (MosaicML Pretrained Transformer) language model, defining model architecture and initialization parameters.

Description

MPTConfig extends HuggingFace's PretrainedConfig with MPT-specific parameters. The core architectural settings are d_model (embedding dimension, default 2048), n_heads (attention heads, default 16), n_layers (transformer layers, default 24), expansion_ratio (MLP expansion ratio, default 4), and max_seq_len (maximum sequence length, default 2048). Attention is configured through a nested attn_config dictionary, which selects the attention type (multihead_attention or multiquery_attention) and the backend (torch, flash, or triton), and can enable ALiBi positional bias and prefix-LM mode. Weight initialization is configured through a separate init_config dictionary that supports several schemes (kaiming, xavier, neox, and others). The _validate_config() method enforces these constraints: d_model must be divisible by n_heads, dropout values must be valid probabilities, the attention implementation must be one of the supported backends, and positional information must come from learned position embeddings, ALiBi, or both.
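
The constraints listed for _validate_config() can be illustrated with a small stand-alone checker. This is a minimal sketch, not the library's implementation: the function name check_mpt_constraints is hypothetical, the 'alibi' key name inside attn_config is an assumption, and the exception types and messages raised by the real method may differ.

```python
from typing import Any, Dict

# Hypothetical stand-alone checker. It mirrors the constraints described
# above and is NOT the library's _validate_config().
VALID_ATTN_IMPLS = {"torch", "flash", "triton"}


def check_mpt_constraints(
    d_model: int,
    n_heads: int,
    resid_pdrop: float,
    emb_pdrop: float,
    learned_pos_emb: bool,
    attn_config: Dict[str, Any],
) -> None:
    """Raise ValueError if the settings violate one of the described constraints."""
    # Each head needs an equal, integer-sized slice of the embedding.
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")

    # Dropout values must be probabilities.
    for name, p in (("resid_pdrop", resid_pdrop), ("emb_pdrop", emb_pdrop)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")

    # Only the three backends named in the description are accepted.
    impl = attn_config.get("attn_impl")
    if impl not in VALID_ATTN_IMPLS:
        raise ValueError(f"unsupported attn_impl: {impl!r}")

    # Some source of positional information is required.
    # Assumption: ALiBi is toggled by an 'alibi' key in attn_config.
    if not learned_pos_emb and not attn_config.get("alibi", False):
        raise ValueError("enable learned_pos_emb, ALiBi, or both")


# A valid combination passes silently.
check_mpt_constraints(2048, 16, 0.0, 0.0, True, {"attn_impl": "torch"})
```

Running these checks before constructing a model surfaces configuration mistakes early, which is presumably why the configuration class validates at construction time.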

Usage

Use this configuration class when instantiating an MPT-based model for the LLaVA multimodal pipeline, or when customizing MPT architecture parameters.

Code Reference

Source Location

Signature

class MPTConfig(PretrainedConfig):
    model_type = 'mpt'
    def __init__(self, d_model=2048, n_heads=16, n_layers=24, expansion_ratio=4,
                 max_seq_len=2048, vocab_size=50368, resid_pdrop=0.0, emb_pdrop=0.0,
                 learned_pos_emb=True, attn_config=attn_config_defaults,
                 init_device='cpu', logit_scale=None, no_bias=False, verbose=0,
                 embedding_fraction=1.0, norm_type='low_precision_layernorm',
                 use_cache=False, init_config=init_config_defaults, **kwargs): ...

Import

from llava.model.language_model.mpt.configuration_mpt import MPTConfig

I/O Contract

Inputs

Name | Type | Required | Description
d_model | int | No | Embedding dimension size (default: 2048)
n_heads | int | No | Number of attention heads (default: 16)
n_layers | int | No | Number of transformer layers (default: 24)
expansion_ratio | int | No | MLP expansion ratio (default: 4)
max_seq_len | int | No | Maximum sequence length (default: 2048)
attn_config | Dict | No | Attention configuration dict (type, impl, ALiBi, etc.)
init_config | Dict | No | Weight initialization configuration dict
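
To make the relationship between these parameters concrete, the sketch below derives two sizes from the defaults. MPTShape is a hypothetical helper, not part of MPTConfig, and it assumes the conventional transformer reading of the parameters: each head receives d_model / n_heads dimensions, and the MLP hidden layer is expansion_ratio times d_model wide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MPTShape:
    """Hypothetical helper holding the documented defaults; not part of MPTConfig."""

    d_model: int = 2048
    n_heads: int = 16
    n_layers: int = 24
    expansion_ratio: int = 4

    @property
    def head_dim(self) -> int:
        # Per-head width; requires d_model to be divisible by n_heads.
        return self.d_model // self.n_heads

    @property
    def mlp_hidden_dim(self) -> int:
        # Assumed meaning of expansion_ratio: MLP hidden width relative to d_model.
        return self.expansion_ratio * self.d_model


shape = MPTShape()
print(shape.head_dim, shape.mlp_hidden_dim)  # 128 8192
```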

Outputs

Name | Type | Description
config | MPTConfig | Validated MPT configuration instance

Usage Examples

Basic Usage

from llava.model.language_model.mpt.configuration_mpt import MPTConfig

# Create config with custom settings
config = MPTConfig(
    d_model=4096,
    n_heads=32,
    n_layers=32,
    attn_config={'attn_type': 'multihead_attention', 'attn_impl': 'flash'}
)
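
The call above passes only two keys in attn_config. This page does not state how MPTConfig treats keys that are left out, so the following is only a sketch of one common approach: copy a defaults dictionary and overlay the user's values. The names ATTN_DEFAULTS and fill_defaults are hypothetical, and the default values shown are placeholders rather than the real attn_config_defaults.

```python
from typing import Any, Dict

# Placeholder defaults for illustration; the real attn_config_defaults may differ.
ATTN_DEFAULTS: Dict[str, Any] = {
    "attn_type": "multihead_attention",
    "attn_impl": "torch",
    "alibi": False,
}


def fill_defaults(user_config: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: defaults first, then the user's overrides on top."""
    merged = dict(defaults)      # copy so the shared defaults are never mutated
    merged.update(user_config)   # user-supplied keys win
    return merged


merged = fill_defaults({"attn_impl": "flash"}, ATTN_DEFAULTS)
print(merged)
```

Copying before merging matters because the signature uses a dictionary as a default argument value; mutating a shared default in place would leak settings between configuration instances.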
