Environment:OpenGVLab InternVL PyTorch CUDA
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Deep_Learning, Computer_Vision |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Linux environment with CUDA-enabled NVIDIA GPU, Python >= 3.8, PyTorch >= 2.0, and HuggingFace Transformers == 4.37.2 for training and inference of InternVL multimodal models.
Description
This environment provides the core runtime for all InternVL training, fine-tuning, and evaluation workflows. It is built on PyTorch with CUDA GPU acceleration and requires a strict version of HuggingFace Transformers (4.37.2) due to internal API compatibility. The stack includes torchvision for image processing, sentencepiece for tokenization, and timm for vision model utilities. All training scripts assume a distributed GPU cluster managed via SLURM or OpenMPI, with NCCL as the communication backend.
Usage
Use this environment for all InternVL workflows including supervised fine-tuning, LoRA fine-tuning, multi-stage pretraining, preference optimization (MPO/DPO), and benchmark evaluation. It is the mandatory prerequisite for every Implementation page in this wiki.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+ recommended) | SLURM or OpenMPI launcher support required |
| Hardware | NVIDIA GPU with CUDA support | Minimum 16GB VRAM; A100/H100 preferred for Flash Attention |
| GPU Compute | Compute capability >= 7.0 | SM 80+ (Ampere/Hopper) recommended for full Flash Attention support |
| Disk | 100GB+ SSD | High IOPS needed for dataset caching and checkpoints |
| Network | High-bandwidth interconnect | NCCL backend for multi-node distributed training |
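The compute-capability thresholds in the table above can be checked programmatically. The sketch below is an assumption-labeled helper, not project code: the `(8, 0)` / `(7, 0)` thresholds are taken from the table, and the guarded `torch` probe simply reports each visible GPU.

```python
# Sketch: classify a GPU's CUDA compute capability against the thresholds in
# the requirements table. FLASH_ATTN_MIN_CAPABILITY and MIN_SUPPORTED_CAPABILITY
# are assumptions taken from the table above, not values read from the project.
FLASH_ATTN_MIN_CAPABILITY = (8, 0)   # SM 80+ (Ampere/Hopper) recommended
MIN_SUPPORTED_CAPABILITY = (7, 0)    # hard floor from the table

def capability_status(major: int, minor: int) -> str:
    """Classify a CUDA compute capability against the table's thresholds."""
    cap = (major, minor)
    if cap >= FLASH_ATTN_MIN_CAPABILITY:
        return "flash-attention-ready"
    if cap >= MIN_SUPPORTED_CAPABILITY:
        return "supported"
    return "unsupported"

if __name__ == "__main__":
    try:
        import torch
        if torch.cuda.is_available():
            for i in range(torch.cuda.device_count()):
                major, minor = torch.cuda.get_device_capability(i)
                print(f"GPU {i}: SM {major}{minor} -> {capability_status(major, minor)}")
        else:
            print("CUDA not available")
    except ImportError:
        print("PyTorch not installed")
```

Running this before launching training surfaces unsupported hardware early, rather than failing inside a Flash Attention kernel.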
Dependencies
System Packages
- `cuda-toolkit` >= 11.7 (CUDA 12.x recommended)
- `nccl` (NVIDIA Collective Communications Library)
- `git-lfs` (for downloading large model checkpoints)
Python Packages
- `torch` >= 2.0
- `torchvision` >= 0.15
- `transformers` == 4.37.2
- `tokenizers` == 0.15.1
- `sentencepiece` == 0.1.99
- `accelerate` (latest)
- `bitsandbytes` == 0.41.0
- `peft` >= 0.4.0
- `deepspeed` == 0.13.5
- `einops` (latest)
- `einops-exts` (latest)
- `timm` == 0.9.12
- `numpy` (latest)
- `scikit-learn` >= 1.2.2
- `packaging` (for version comparison)
- `Pillow` (PIL, for image loading)
Environment Variables
The following environment variables must be configured for distributed training:
- `RANK`: Global process rank (set by SLURM/launcher)
- `LOCAL_RANK`: Local GPU rank on the node
- `WORLD_SIZE`: Total number of processes
- `MASTER_ADDR`: Address of the rank-0 node
- `MASTER_PORT`: Port for distributed communication
- `LAUNCHER`: Set to `slurm` or `torchrun` (default: `slurm`)
- `TOKENIZERS_PARALLELISM`: Hardcoded to `true` by the training scripts; no manual configuration is needed
Optional:
- `SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_NODELIST`: For SLURM-managed clusters
- `OMPI_COMM_WORLD_LOCAL_RANK`: For OpenMPI-managed clusters
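The variables above feed a launcher-detection step. A minimal sketch of that logic, assuming a torchrun-then-SLURM-then-OpenMPI fallback order (the project's `dist_utils.py` may order these differently):

```python
import os

def resolve_rank_and_world_size(env: dict) -> tuple:
    """Derive (rank, world_size) from launcher-provided variables.

    Checks torchrun-style RANK/WORLD_SIZE first, then SLURM, then OpenMPI.
    The precedence here is an assumption for illustration; consult
    dist_utils.py for the project's actual fallback order.
    """
    if "RANK" in env and "WORLD_SIZE" in env:
        return int(env["RANK"]), int(env["WORLD_SIZE"])
    if "SLURM_PROCID" in env and "SLURM_NTASKS" in env:
        return int(env["SLURM_PROCID"]), int(env["SLURM_NTASKS"])
    if "OMPI_COMM_WORLD_RANK" in env and "OMPI_COMM_WORLD_SIZE" in env:
        return int(env["OMPI_COMM_WORLD_RANK"]), int(env["OMPI_COMM_WORLD_SIZE"])
    return 0, 1  # single-process fallback for local debugging

if __name__ == "__main__":
    print(resolve_rank_and_world_size(dict(os.environ)))
```

Passing the environment in as a plain dict keeps the function testable without touching the real process environment.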
Quick Install
# Install core dependencies
pip install "torch>=2.0" "torchvision>=0.15" \
    transformers==4.37.2 tokenizers==0.15.1 sentencepiece==0.1.99 \
    accelerate "peft>=0.4.0" bitsandbytes==0.41.0 \
    deepspeed==0.13.5 einops einops-exts timm==0.9.12 \
    numpy "scikit-learn>=1.2.2" packaging shortuuid
# Or install from the project directly
cd internvl_chat && pip install -e .
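After installing, the strict pins can be sanity-checked with the standard library alone. This is a hedged sketch: the `PINS` dict mirrors the pins listed in this page's pyproject excerpt and is not read from the project automatically.

```python
# Sketch: verify the hard version pins after installation, using only the
# standard library (so it works even before `packaging` is confirmed present).
from importlib import metadata

# Assumption: this pin list is copied from the pyproject.toml excerpt below.
PINS = {
    "transformers": "4.37.2",
    "tokenizers": "0.15.1",
    "sentencepiece": "0.1.99",
    "bitsandbytes": "0.41.0",
    "timm": "0.9.12",
}

def check_pins(pins, installed_lookup):
    """Return {package: status}, status in {'ok', 'missing', 'mismatch (...)'}."""
    report = {}
    for name, wanted in pins.items():
        found = installed_lookup(name)
        if found is None:
            report[name] = "missing"
        elif found == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch ({found} != {wanted})"
    return report

def installed_version(name):
    """Look up an installed distribution's version, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    for pkg, status in check_pins(PINS, installed_version).items():
        print(f"{pkg}: {status}")
```

Injecting the lookup as a callable keeps the check testable without a real site-packages tree.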
Code Evidence
Version validation from `modeling_internvl_chat.py:31-36,51`:
def version_cmp(v1, v2, op='eq'):
    import operator
    from packaging import version
    op_func = getattr(operator, op)
    return op_func(version.parse(v1), version.parse(v2))
# Line 51:
assert version_cmp(transformers.__version__, '4.37.0', 'ge')
CUDA device detection from `model/__init__.py:16`:
world_size = torch.cuda.device_count()
Distributed initialization from `dist_utils.py:48-51`:
num_gpus = torch.cuda.device_count()
torch.cuda.set_device(rank % num_gpus)
deepspeed.init_distributed(dist_backend=backend)
SLURM environment parsing from `dist_utils.py:78-104`:
proc_id = int(os.environ['SLURM_PROCID'])
ntasks = int(os.environ['SLURM_NTASKS'])
node_list = os.environ['SLURM_NODELIST']
Dependencies declared in `pyproject.toml:15-23`:
dependencies = [
    "torch>=2", "torchvision>=0.15",
    "transformers==4.37.2", "tokenizers==0.15.1", "sentencepiece==0.1.99",
    "accelerate", "peft>=0.4.0", "bitsandbytes==0.41.0",
    "deepspeed==0.13.5", "einops", "einops-exts", "timm==0.9.12",
]
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `AssertionError` at `version_cmp(transformers.__version__, '4.37.0', 'ge')` | Transformers version < 4.37.0 | `pip install transformers==4.37.2` |
| `RuntimeError: NCCL error` | Network misconfiguration in multi-node setup | Verify `MASTER_ADDR`, `MASTER_PORT`, and firewall rules |
| `CUDA out of memory` | Insufficient GPU VRAM for model size | Use DeepSpeed ZeRO Stage 3, reduce batch size, or enable gradient checkpointing |
| `ModuleNotFoundError: No module named 'packaging'` | Missing `packaging` library for version checks | `pip install packaging` |
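For the `CUDA out of memory` row, a minimal DeepSpeed ZeRO Stage 3 config fragment is sketched below. The key names follow DeepSpeed's documented schema; the `auto` values defer to the HuggingFace Trainer, and the exact configs shipped with InternVL may differ.

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": "auto" },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_clipping": "auto"
}
```

Stage 3 partitions optimizer states, gradients, and parameters across ranks, which is usually the largest single lever against OOM before reducing batch size.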
Compatibility Notes
- Transformers Version: Strictly pinned to 4.37.2. A known TODO in `configuration_internvl_chat.py:48` warns there may be bugs in transformers 4.44+.
- SLURM vs Torchrun: Scripts default to SLURM launcher. Set `LAUNCHER=torchrun` for non-SLURM clusters.
- Petrel Client: Optional cloud storage client (Ceph). Falls back to PIL for local image loading if not installed.
- PIL Settings: Training scripts set `Image.MAX_IMAGE_PIXELS = None` and `ImageFile.LOAD_TRUNCATED_IMAGES = True` to handle large/corrupted images.
Related Pages
- Implementation:OpenGVLab_InternVL_Build_Transform
- Implementation:OpenGVLab_InternVL_Concat_Pad_Data_Collator
- Implementation:OpenGVLab_InternVL_Correctness_Build_Data
- Implementation:OpenGVLab_InternVL_DPO_Concat_Pad_Data_Collator
- Implementation:OpenGVLab_InternVL_Dynamic_Preprocess
- Implementation:OpenGVLab_InternVL_Evaluate_Chat_Model
- Implementation:OpenGVLab_InternVL_Evaluate_Sh
- Implementation:OpenGVLab_InternVL_InternVLChatModel_From_Pretrained
- Implementation:OpenGVLab_InternVL_InternVisionModel_From_Pretrained
- Implementation:OpenGVLab_InternVL_Internvl_Custom2hf
- Implementation:OpenGVLab_InternVL_LazySupervisedDataset
- Implementation:OpenGVLab_InternVL_Load_Model_And_Tokenizer
- Implementation:OpenGVLab_InternVL_Merge_LoRA
- Implementation:OpenGVLab_InternVL_ModelArguments_DataTrainingArguments
- Implementation:OpenGVLab_InternVL_MultimodalDPOTrainer
- Implementation:OpenGVLab_InternVL_PackedDataset
- Implementation:OpenGVLab_InternVL_Pretrain_Main
- Implementation:OpenGVLab_InternVL_TextVQAAccuracyEvaluator
- Implementation:OpenGVLab_InternVL_Trainer_Save_Model
- Implementation:OpenGVLab_InternVL_Trainer_Train
- Implementation:OpenGVLab_InternVL_Wrap_Backbone_LoRA
- Implementation:OpenGVLab_InternVL_Wrap_LLM_LoRA