Implementation:OpenGVLab InternVL InternLM2Config
| Knowledge Sources | |
|---|---|
| Domains | Model Configuration, Language Model, InternLM2 |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Defines the InternLM2Config configuration class that stores all architectural hyperparameters for the InternLM2 language model used as a backbone in InternVL.
Description
InternLM2Config extends HuggingFace's PretrainedConfig with InternLM2-specific parameters:
- vocab_size (default 103168) -- Vocabulary size for the InternLM2 tokenizer.
- hidden_size (default 4096) -- Dimension of the hidden representations.
- intermediate_size (default 11008) -- Dimension of the intermediate (feed-forward) layer inside each MLP block.
- num_hidden_layers (default 32) -- Number of transformer decoder layers.
- num_attention_heads (default 32) -- Number of attention heads, with support for Grouped Query Attention (GQA) via num_key_value_heads.
- hidden_act (default "silu") -- Name of the non-linear activation function used in the decoder; the default selects SiLU.
- rope_theta (default 10000) -- Base period for Rotary Position Embeddings.
- rope_scaling (default None) -- Optional dictionary with a type field ("linear" for linear position scaling or "dynamic" for dynamic NTK-aware scaling) and a factor field. It is validated by _rope_scaling_validation.
- attn_implementation (default "eager") -- Attention backend selection (eager vs flash_attention_2).
- bias (default True) -- Whether to use bias in linear layers.
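The rope_scaling check described in the list above can be sketched in plain Python. This is a minimal stand-in, not the repository's code: the function name validate_rope_scaling is hypothetical, and the exact bound on factor (here, a number of at least 1) is an assumption modeled on similar HuggingFace-style configs.

```python
def validate_rope_scaling(rope_scaling):
    """Hypothetical stand-in for InternLM2Config._rope_scaling_validation."""
    if rope_scaling is None:
        return  # RoPE scaling is optional

    # Expect exactly the two documented fields: type and factor.
    if not isinstance(rope_scaling, dict) or len(rope_scaling) != 2:
        raise ValueError(
            f"rope_scaling must be a dict with 'type' and 'factor', got {rope_scaling!r}"
        )

    scaling_type = rope_scaling.get("type")
    factor = rope_scaling.get("factor")

    if scaling_type not in ("linear", "dynamic"):
        raise ValueError(
            f"rope_scaling type must be 'linear' or 'dynamic', got {scaling_type!r}"
        )

    # Assumption: the factor must be a real number >= 1 (bool is rejected explicitly).
    if isinstance(factor, bool) or not isinstance(factor, (int, float)) or factor < 1:
        raise ValueError(f"rope_scaling factor must be a number >= 1, got {factor!r}")
```

Under these assumptions, {"type": "dynamic", "factor": 2.0} passes silently, while an unknown type such as "yarn" raises ValueError.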
The class sets model_type to "internlm2" and _auto_class to "AutoConfig" for HuggingFace auto-class integration. If num_key_value_heads is not specified, it defaults to num_attention_heads, which corresponds to standard multi-head attention (MHA).
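The fallback from num_key_value_heads to num_attention_heads can be illustrated with a small helper. The function resolve_kv_heads is hypothetical and exists only for this sketch; the divisibility check is an assumption about how GQA groups are usually formed, not something the configuration class is known to enforce.

```python
def resolve_kv_heads(num_attention_heads=32, num_key_value_heads=None):
    """Return (num_key_value_heads, query_heads_per_kv_head).

    Hypothetical helper mirroring the documented default: when
    num_key_value_heads is None it falls back to num_attention_heads.
    """
    if num_key_value_heads is None:
        num_key_value_heads = num_attention_heads  # standard MHA

    # Assumption: query heads are split evenly across KV heads.
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError(
            "num_attention_heads must be divisible by num_key_value_heads"
        )
    return num_key_value_heads, num_attention_heads // num_key_value_heads


print(resolve_kv_heads())       # MHA: one query head per KV head
print(resolve_kv_heads(32, 8))  # GQA: four query heads share each KV head
```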
Usage
Use this configuration class when instantiating InternLM2 models within InternVL. It is loaded automatically via AutoConfig.from_pretrained() for pretrained InternLM2-based InternVL models.
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: internvl_chat/internvl/model/internlm2/configuration_internlm2.py
- Lines: 1-150
Signature
```python
class InternLM2Config(PretrainedConfig):
    model_type = 'internlm2'
    _auto_class = 'AutoConfig'

    def __init__(self, vocab_size=103168, hidden_size=4096,
                 intermediate_size=11008, num_hidden_layers=32,
                 num_attention_heads=32, num_key_value_heads=None,
                 hidden_act='silu', max_position_embeddings=2048,
                 initializer_range=0.02, rms_norm_eps=1e-6,
                 use_cache=True, bias=True, rope_theta=10000,
                 rope_scaling=None, attn_implementation='eager',
                 **kwargs): ...
```
Import
```python
from internvl.model.internlm2.configuration_internlm2 import InternLM2Config
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
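| vocab_size | int | No | Vocabulary size (default: 103168) |
| hidden_size | int | No | Hidden dimension (default: 4096) |
| intermediate_size | int | No | MLP intermediate dimension (default: 11008) |
| num_hidden_layers | int | No | Number of transformer decoder layers (default: 32) |
| num_attention_heads | int | No | Number of attention heads (default: 32) |
| num_key_value_heads | int | No | Number of KV heads for GQA (default: None, which falls back to num_attention_heads) |
| hidden_act | str | No | Activation function name (default: "silu") |
| max_position_embeddings | int | No | Maximum number of positions (default: 2048) |
| initializer_range | float | No | Standard deviation for weight initialization (default: 0.02) |
| rms_norm_eps | float | No | Epsilon used by the RMS normalization layers (default: 1e-6) |
| use_cache | bool | No | Whether to cache key/value states during generation (default: True) |
| bias | bool | No | Whether to use bias in linear layers (default: True) |
| rope_theta | int | No | Base period for Rotary Position Embeddings (default: 10000) |
| rope_scaling | dict | No | RoPE scaling config with type and factor fields (default: None) |
| attn_implementation | str | No | Attention implementation: "eager" or "flash_attention_2" (default: "eager") |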
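To make the effect of num_key_value_heads concrete, the sketch below estimates how many key/value elements are cached per token. It assumes the common convention head_dim = hidden_size // num_attention_heads, which this page does not state explicitly, and the helper name kv_cache_elements_per_token is hypothetical.

```python
def kv_cache_elements_per_token(num_hidden_layers=32, hidden_size=4096,
                                num_attention_heads=32,
                                num_key_value_heads=None):
    """Count cached key/value elements per token across all layers."""
    if num_key_value_heads is None:
        num_key_value_heads = num_attention_heads  # documented fallback (MHA)

    # Assumption: each head has hidden_size // num_attention_heads dimensions.
    head_dim = hidden_size // num_attention_heads

    # One key vector and one value vector per KV head, per layer.
    return 2 * num_hidden_layers * num_key_value_heads * head_dim


mha = kv_cache_elements_per_token()                       # default config
gqa = kv_cache_elements_per_token(num_key_value_heads=8)  # 8 KV heads
print(mha, gqa, mha // gqa)
```

With the default values, reducing the KV heads from 32 to 8 shrinks the per-token cache by a factor of four under this assumption.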
Outputs
| Name | Type | Description |
|---|---|---|
| config | InternLM2Config | Configuration object for InternLM2 model instantiation |
Usage Examples
Basic Usage
```python
from internvl.model.internlm2.configuration_internlm2 import InternLM2Config

# Create a default config
config = InternLM2Config()

# Create with GQA (8 KV heads for 32 attention heads)
config = InternLM2Config(num_key_value_heads=8)

# Create with dynamic RoPE scaling
config = InternLM2Config(
    rope_scaling={"type": "dynamic", "factor": 2.0}
)
```
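For intuition about what rope_theta and rope_scaling control, the following sketch computes RoPE inverse frequencies and shows the two scaling modes. It is not taken from the InternVL source: the formulas follow the convention used by Llama-style implementations in HuggingFace Transformers, and it is an assumption that the InternLM2 modeling code behaves the same way. All function names are hypothetical.

```python
def rope_inv_freq(head_dim, base=10000.0):
    """Inverse frequencies 1 / base**(2i / head_dim) for each dimension pair."""
    return [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]


def linear_scaled_position(position, factor):
    """'linear' scaling: position indices are divided by the factor."""
    return position / factor


def dynamic_ntk_base(base, head_dim, seq_len, max_position_embeddings, factor):
    """'dynamic' scaling: enlarge the base only when seq_len exceeds the trained range."""
    if seq_len <= max_position_embeddings:
        return base  # within the trained context, no rescaling
    scale = (factor * seq_len / max_position_embeddings) - (factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))


head_dim = 4096 // 32  # assumed: hidden_size // num_attention_heads
freqs = rope_inv_freq(head_dim, base=10000.0)
print(len(freqs))
print(linear_scaled_position(4096, 2.0))
print(dynamic_ntk_base(10000.0, head_dim, 4096, 2048, 2.0))
```

In this sketch, dynamic scaling leaves short sequences untouched and stretches the base only for longer inputs, whereas linear scaling compresses every position uniformly.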