Environment:OpenGVLab InternVL PyTorch CUDA
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Deep_Learning, Computer_Vision |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Linux environment with CUDA-enabled NVIDIA GPU, Python >= 3.8, PyTorch >= 2.0, and HuggingFace Transformers == 4.37.2 for training and inference of InternVL multimodal models.
Description
This environment provides the core runtime for all InternVL training, fine-tuning, and evaluation workflows. It is built on PyTorch with CUDA GPU acceleration and requires a strict version of HuggingFace Transformers (4.37.2) due to internal API compatibility. The stack includes torchvision for image processing, sentencepiece for tokenization, and timm for vision model utilities. All training scripts assume a distributed GPU cluster managed via SLURM or OpenMPI, with NCCL as the communication backend.
Usage
Use this environment for all InternVL workflows including supervised fine-tuning, LoRA fine-tuning, multi-stage pretraining, preference optimization (MPO/DPO), and benchmark evaluation. It is the mandatory prerequisite for every Implementation page in this wiki.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+ recommended) | SLURM or OpenMPI launcher support required |
| Hardware | NVIDIA GPU with CUDA support | Minimum 16GB VRAM; A100/H100 preferred for Flash Attention |
| GPU Compute | Compute capability >= 7.0 | SM 80+ (Ampere/Hopper) recommended for full Flash Attention support |
| Disk | 100GB+ SSD | High IOPS needed for dataset caching and checkpoints |
| Network | High-bandwidth interconnect | NCCL backend for multi-node distributed training |
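The compute-capability thresholds in the table above can be checked programmatically. The sketch below is an assumption-labeled helper, not project code: the `(8, 0)` / `(7, 0)` thresholds are taken from the table, and the guarded `torch` probe simply reports each visible GPU.

```python
# Sketch: classify a GPU's CUDA compute capability against the thresholds in
# the requirements table. FLASH_ATTN_MIN_CAPABILITY and MIN_SUPPORTED_CAPABILITY
# are assumptions taken from the table above, not values read from the project.
FLASH_ATTN_MIN_CAPABILITY = (8, 0)   # SM 80+ (Ampere/Hopper) recommended
MIN_SUPPORTED_CAPABILITY = (7, 0)    # hard floor from the table

def capability_status(major: int, minor: int) -> str:
    """Classify a CUDA compute capability against the table's thresholds."""
    cap = (major, minor)
    if cap >= FLASH_ATTN_MIN_CAPABILITY:
        return "flash-attention-ready"
    if cap >= MIN_SUPPORTED_CAPABILITY:
        return "supported"
    return "unsupported"

if __name__ == "__main__":
    try:
        import torch
        if torch.cuda.is_available():
            for i in range(torch.cuda.device_count()):
                major, minor = torch.cuda.get_device_capability(i)
                print(f"GPU {i}: SM {major}{minor} -> {capability_status(major, minor)}")
        else:
            print("CUDA not available")
    except ImportError:
        print("PyTorch not installed")
```

Running this before launching training surfaces unsupported hardware early, rather than failing inside a Flash Attention kernel.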
Dependencies
System Packages
- `cuda-toolkit` >= 11.7 (CUDA 12.x recommended)
- `nccl` (NVIDIA Collective Communications Library)
- `git-lfs` (for downloading large model checkpoints)
Python Packages
- `torch` >= 2.0
- `torchvision` >= 0.15
- `transformers` == 4.37.2
- `tokenizers` == 0.15.1
- `sentencepiece` == 0.1.99
- `accelerate` (latest)
- `bitsandbytes` == 0.41.0
- `peft` >= 0.4.0
- `deepspeed` == 0.13.5
- `einops` (latest)
- `einops-exts` (latest)
- `timm` == 0.9.12
- `numpy` (latest)
- `scikit-learn` >= 1.2.2
- `packaging` (for version comparison)
- `Pillow` (PIL, for image loading)
Environment Variables
The following environment variables must be configured for distributed training:
- `RANK`: Global process rank (set by SLURM/launcher)
- `LOCAL_RANK`: Local GPU rank on the node
- `WORLD_SIZE`: Total number of processes
- `MASTER_ADDR`: Address of the rank-0 node
- `MASTER_PORT`: Port for distributed communication
- `LAUNCHER`: Set to `slurm` or `torchrun` (default: `slurm`)
- `TOKENIZERS_PARALLELISM`: Hardcoded to `true` by the training scripts; no manual configuration is needed
Optional:
- `SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_NODELIST`: For SLURM-managed clusters
- `OMPI_COMM_WORLD_LOCAL_RANK`: For OpenMPI-managed clusters
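The variables above feed a launcher-detection step. A minimal sketch of that logic, assuming a torchrun-then-SLURM-then-OpenMPI fallback order (the project's `dist_utils.py` may order these differently):

```python
import os

def resolve_rank_and_world_size(env: dict) -> tuple:
    """Derive (rank, world_size) from launcher-provided variables.

    Checks torchrun-style RANK/WORLD_SIZE first, then SLURM, then OpenMPI.
    The precedence here is an assumption for illustration; consult
    dist_utils.py for the project's actual fallback order.
    """
    if "RANK" in env and "WORLD_SIZE" in env:
        return int(env["RANK"]), int(env["WORLD_SIZE"])
    if "SLURM_PROCID" in env and "SLURM_NTASKS" in env:
        return int(env["SLURM_PROCID"]), int(env["SLURM_NTASKS"])
    if "OMPI_COMM_WORLD_RANK" in env and "OMPI_COMM_WORLD_SIZE" in env:
        return int(env["OMPI_COMM_WORLD_RANK"]), int(env["OMPI_COMM_WORLD_SIZE"])
    return 0, 1  # single-process fallback for local debugging

if __name__ == "__main__":
    print(resolve_rank_and_world_size(dict(os.environ)))
```

Passing the environment in as a plain dict keeps the function testable without touching the real process environment.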
Quick Install
# Install core dependencies
pip install "torch>=2.0" "torchvision>=0.15" \
    transformers==4.37.2 tokenizers==0.15.1 sentencepiece==0.1.99 \
    accelerate "peft>=0.4.0" bitsandbytes==0.41.0 \
    deepspeed==0.13.5 einops einops-exts timm==0.9.12 \
    numpy "scikit-learn>=1.2.2" packaging shortuuid
# Or install from the project directly
cd internvl_chat && pip install -e .
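After installing, the strict pins can be sanity-checked with the standard library alone. This is a hedged sketch: the `PINS` dict mirrors the pins listed in this page's pyproject excerpt and is not read from the project automatically.

```python
# Sketch: verify the hard version pins after installation, using only the
# standard library (so it works even before `packaging` is confirmed present).
from importlib import metadata

# Assumption: this pin list is copied from the pyproject.toml excerpt below.
PINS = {
    "transformers": "4.37.2",
    "tokenizers": "0.15.1",
    "sentencepiece": "0.1.99",
    "bitsandbytes": "0.41.0",
    "timm": "0.9.12",
}

def check_pins(pins, installed_lookup):
    """Return {package: status}, status in {'ok', 'missing', 'mismatch (...)'}."""
    report = {}
    for name, wanted in pins.items():
        found = installed_lookup(name)
        if found is None:
            report[name] = "missing"
        elif found == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch ({found} != {wanted})"
    return report

def installed_version(name):
    """Look up an installed distribution's version, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    for pkg, status in check_pins(PINS, installed_version).items():
        print(f"{pkg}: {status}")
```

Injecting the lookup as a callable keeps the check testable without a real site-packages tree.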
Code Evidence
Version validation from `modeling_internvl_chat.py:31-36,51`:
def version_cmp(v1, v2, op='eq'):
    import operator
    from packaging import version
    op_func = getattr(operator, op)
    return op_func(version.parse(v1), version.parse(v2))
# Line 51:
assert version_cmp(transformers.__version__, '4.37.0', 'ge')
CUDA device detection from `model/__init__.py:16`:
world_size = torch.cuda.device_count()
Distributed initialization from `dist_utils.py:48-51`:
num_gpus = torch.cuda.device_count()
torch.cuda.set_device(rank % num_gpus)
deepspeed.init_distributed(dist_backend=backend)
SLURM environment parsing from `dist_utils.py:78-104`:
proc_id = int(os.environ['SLURM_PROCID'])
ntasks = int(os.environ['SLURM_NTASKS'])
node_list = os.environ['SLURM_NODELIST']
Dependencies declared in `pyproject.toml:15-23`:
dependencies = [
    "torch>=2", "torchvision>=0.15",
    "transformers==4.37.2", "tokenizers==0.15.1", "sentencepiece==0.1.99",
    "accelerate", "peft>=0.4.0", "bitsandbytes==0.41.0",
    "deepspeed==0.13.5", "einops", "einops-exts", "timm==0.9.12",
]
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `AssertionError` at `version_cmp(transformers.__version__, '4.37.0', 'ge')` | Transformers version < 4.37.0 | `pip install transformers==4.37.2` |
| `RuntimeError: NCCL error` | Network misconfiguration in multi-node setup | Verify `MASTER_ADDR`, `MASTER_PORT`, and firewall rules |
| `CUDA out of memory` | Insufficient GPU VRAM for model size | Use DeepSpeed ZeRO Stage 3, reduce batch size, or enable gradient checkpointing |
| `ModuleNotFoundError: No module named 'packaging'` | Missing `packaging` library for version checks | `pip install packaging` |
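For the `CUDA out of memory` row, a minimal DeepSpeed ZeRO Stage 3 config fragment is sketched below. The key names follow DeepSpeed's documented schema; the `auto` values defer to the HuggingFace Trainer, and the exact configs shipped with InternVL may differ.

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": "auto" },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_clipping": "auto"
}
```

Stage 3 partitions optimizer states, gradients, and parameters across ranks, which is usually the largest single lever against OOM before reducing batch size.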
Compatibility Notes
- Transformers Version: Strictly pinned to 4.37.2. A known TODO in `configuration_internvl_chat.py:48` warns there may be bugs in transformers 4.44+.
- SLURM vs Torchrun: Scripts default to SLURM launcher. Set `LAUNCHER=torchrun` for non-SLURM clusters.
- Petrel Client: Optional cloud storage client (Ceph). Falls back to PIL for local image loading if not installed.
- PIL Settings: Training scripts set `Image.MAX_IMAGE_PIXELS = None` and `ImageFile.LOAD_TRUNCATED_IMAGES = True` to handle large/corrupted images.
Related Pages
- Implementation:OpenGVLab_InternVL_Build_Transform
- Implementation:OpenGVLab_InternVL_Concat_Pad_Data_Collator
- Implementation:OpenGVLab_InternVL_Correctness_Build_Data
- Implementation:OpenGVLab_InternVL_DPO_Concat_Pad_Data_Collator
- Implementation:OpenGVLab_InternVL_Dynamic_Preprocess
- Implementation:OpenGVLab_InternVL_Evaluate_Chat_Model
- Implementation:OpenGVLab_InternVL_Evaluate_Sh
- Implementation:OpenGVLab_InternVL_InternVLChatModel_From_Pretrained
- Implementation:OpenGVLab_InternVL_InternVisionModel_From_Pretrained
- Implementation:OpenGVLab_InternVL_Internvl_Custom2hf
- Implementation:OpenGVLab_InternVL_LazySupervisedDataset
- Implementation:OpenGVLab_InternVL_Load_Model_And_Tokenizer
- Implementation:OpenGVLab_InternVL_Merge_LoRA
- Implementation:OpenGVLab_InternVL_ModelArguments_DataTrainingArguments
- Implementation:OpenGVLab_InternVL_MultimodalDPOTrainer
- Implementation:OpenGVLab_InternVL_PackedDataset
- Implementation:OpenGVLab_InternVL_Pretrain_Main
- Implementation:OpenGVLab_InternVL_TextVQAAccuracyEvaluator
- Implementation:OpenGVLab_InternVL_Trainer_Save_Model
- Implementation:OpenGVLab_InternVL_Trainer_Train
- Implementation:OpenGVLab_InternVL_Wrap_Backbone_LoRA
- Implementation:OpenGVLab_InternVL_Wrap_LLM_LoRA