
Environment: OpenGVLab InternVL PyTorch CUDA

From Leeroopedia


Knowledge Sources

  • Domains: Infrastructure, Deep_Learning, Computer_Vision
  • Last Updated: 2026-02-07 14:00 GMT

Overview

A Linux environment with a CUDA-enabled NVIDIA GPU, Python >= 3.8, PyTorch >= 2.0, and HuggingFace Transformers == 4.37.2, used for training and inference of InternVL multimodal models.

Description

This environment provides the core runtime for all InternVL training, fine-tuning, and evaluation workflows. It is built on PyTorch with CUDA GPU acceleration and requires a strict version of HuggingFace Transformers (4.37.2) due to internal API compatibility. The stack includes torchvision for image processing, sentencepiece for tokenization, and timm for vision model utilities. All training scripts assume a distributed GPU cluster managed via SLURM or OpenMPI, with NCCL as the communication backend.

Usage

Use this environment for all InternVL workflows including supervised fine-tuning, LoRA fine-tuning, multi-stage pretraining, preference optimization (MPO/DPO), and benchmark evaluation. It is the mandatory prerequisite for every Implementation page in this wiki.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+ recommended) | SLURM or OpenMPI launcher support required |
| Hardware | NVIDIA GPU with CUDA support | Minimum 16GB VRAM; A100/H100 preferred for Flash Attention |
| GPU Compute | Compute capability >= 7.0 | SM 80+ (Ampere/Hopper) recommended for full Flash Attention support |
| Disk | 100GB+ SSD | High IOPS needed for dataset caching and checkpoints |
| Network | High-bandwidth interconnect | NCCL backend for multi-node distributed training |
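The compute-capability threshold above can be checked programmatically. The sketch below uses a small hypothetical helper, `flash_attention_tier` (not part of InternVL), to classify a capability tuple; on a real machine the tuple would come from PyTorch's standard `torch.cuda.get_device_capability()` call.

```python
def flash_attention_tier(major: int, minor: int) -> str:
    """Classify a CUDA compute capability against the table above.

    Thresholds mirror the requirements row: SM 7.0 is the floor,
    SM 8.0+ (Ampere/Hopper) unlocks full Flash Attention support.
    """
    cc = major * 10 + minor
    if cc < 70:
        return "unsupported"
    if cc < 80:
        return "partial"   # e.g. V100 (SM 7.0), T4 (SM 7.5)
    return "full"          # e.g. A100 (SM 8.0), H100 (SM 9.0)

# On a CUDA machine, the capability tuple would be obtained with:
#   major, minor = torch.cuda.get_device_capability(0)
```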

Dependencies

System Packages

  • `cuda-toolkit` >= 11.7 (CUDA 12.x recommended)
  • `nccl` (NVIDIA Collective Communications Library)
  • `git-lfs` (for downloading large model checkpoints)

Python Packages

  • `torch` >= 2.0
  • `torchvision` >= 0.15
  • `transformers` == 4.37.2
  • `tokenizers` == 0.15.1
  • `sentencepiece` == 0.1.99
  • `accelerate` (latest)
  • `bitsandbytes` == 0.41.0
  • `peft` >= 0.4.0
  • `deepspeed` == 0.13.5
  • `einops` (latest)
  • `einops-exts` (latest)
  • `timm` == 0.9.12
  • `numpy` (latest)
  • `scikit-learn` >= 1.2.2
  • `packaging` (for version comparison)
  • `Pillow` (PIL, for image loading)
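Because several of these packages are hard-pinned, a quick consistency check with the standard library's `importlib.metadata` can catch mismatches before a training run fails at the version assertion. `check_pins` below is a hypothetical helper, not part of InternVL; the version lookup is injectable so the logic works even against a mock environment.

```python
from importlib import metadata

# Hard pins from the list above (PyPI distribution names).
PINS = {
    "transformers": "4.37.2",
    "tokenizers": "0.15.1",
    "sentencepiece": "0.1.99",
    "bitsandbytes": "0.41.0",
    "timm": "0.9.12",
}

def check_pins(pins, get_version=metadata.version):
    """Return {package: (installed, expected)} for every mismatch.

    Missing packages are reported with installed=None. The
    `get_version` callable defaults to importlib.metadata.version
    but can be swapped out for testing.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = (None, expected)
            continue
        if installed != expected:
            problems[name] = (installed, expected)
    return problems
```

An empty result means all pins are satisfied; anything else maps the offending package to its (installed, expected) pair.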

Credentials

The following environment variables must be configured for distributed training:

  • `RANK`: Global process rank (set by SLURM/launcher)
  • `LOCAL_RANK`: Local GPU rank on the node
  • `WORLD_SIZE`: Total number of processes
  • `MASTER_ADDR`: Address of the rank-0 node
  • `MASTER_PORT`: Port for distributed communication
  • `LAUNCHER`: Set to `slurm` or `torchrun` (default: `slurm`)
  • `TOKENIZERS_PARALLELISM`: Set to `true` (hardcoded in training scripts)

Optional:

  • `SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_NODELIST`: For SLURM-managed clusters
  • `OMPI_COMM_WORLD_LOCAL_RANK`: For OpenMPI-managed clusters
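The fallback order between generic launcher variables, SLURM variables, and a single-process default can be sketched as a small pure function. `resolve_dist_env` is a hypothetical helper for illustration only; InternVL's actual resolution logic lives in `dist_utils.py`.

```python
import os

def resolve_dist_env(env=os.environ):
    """Resolve (rank, local_rank, world_size) from launcher variables.

    Checks generic torchrun-style variables first, then SLURM
    variables, then falls back to a single-process configuration.
    `env` accepts any mapping, so the logic can be exercised
    without a real cluster.
    """
    if "RANK" in env and "WORLD_SIZE" in env:       # torchrun / generic
        rank = int(env["RANK"])
        world_size = int(env["WORLD_SIZE"])
    elif "SLURM_PROCID" in env:                     # SLURM-managed cluster
        rank = int(env["SLURM_PROCID"])
        world_size = int(env["SLURM_NTASKS"])
    else:                                           # single-process fallback
        rank, world_size = 0, 1
    local_rank = int(env.get("LOCAL_RANK",
                             env.get("OMPI_COMM_WORLD_LOCAL_RANK", rank)))
    return rank, local_rank, world_size
```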

Quick Install

# Install core dependencies (quote version ranges so the shell
# does not interpret ">" as a redirect)
pip install "torch>=2.0" "torchvision>=0.15" \
    transformers==4.37.2 tokenizers==0.15.1 sentencepiece==0.1.99 \
    accelerate "peft>=0.4.0" bitsandbytes==0.41.0 \
    deepspeed==0.13.5 einops einops-exts timm==0.9.12 \
    numpy "scikit-learn>=1.2.2" packaging shortuuid

# Or install from the project directly
cd internvl_chat && pip install -e .

Code Evidence

Version validation from `modeling_internvl_chat.py:31-36,51`:

def version_cmp(v1, v2, op='eq'):
    import operator
    from packaging import version
    op_func = getattr(operator, op)
    return op_func(version.parse(v1), version.parse(v2))

# Line 51:
assert version_cmp(transformers.__version__, '4.37.0', 'ge')

CUDA device detection from `model/__init__.py:16`:

world_size = torch.cuda.device_count()

Distributed initialization from `dist_utils.py:48-51`:

num_gpus = torch.cuda.device_count()
torch.cuda.set_device(rank % num_gpus)
deepspeed.init_distributed(dist_backend=backend)

SLURM environment parsing from `dist_utils.py:78-104`:

proc_id = int(os.environ['SLURM_PROCID'])
ntasks = int(os.environ['SLURM_NTASKS'])
node_list = os.environ['SLURM_NODELIST']

Dependencies declared in `pyproject.toml:15-23`:

dependencies = [
    "torch>=2", "torchvision>=0.15",
    "transformers==4.37.2", "tokenizers==0.15.1", "sentencepiece==0.1.99",
    "accelerate", "peft>=0.4.0", "bitsandbytes==0.41.0",
    "deepspeed==0.13.5", "einops", "einops-exts", "timm==0.9.12",
]

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `AssertionError` at `version_cmp(transformers.__version__, '4.37.0', 'ge')` | Transformers version < 4.37.0 | `pip install transformers==4.37.2` |
| `RuntimeError: NCCL error` | Network misconfiguration in multi-node setup | Verify `MASTER_ADDR`, `MASTER_PORT`, and firewall rules |
| `CUDA out of memory` | Insufficient GPU VRAM for model size | Use DeepSpeed ZeRO Stage 3, reduce batch size, or enable gradient checkpointing |
| `ModuleNotFoundError: No module named 'packaging'` | Missing `packaging` library for version checks | `pip install packaging` |

Compatibility Notes

  • Transformers Version: Strictly pinned to 4.37.2. A known TODO in `configuration_internvl_chat.py:48` warns there may be bugs in transformers 4.44+.
  • SLURM vs Torchrun: Scripts default to SLURM launcher. Set `LAUNCHER=torchrun` for non-SLURM clusters.
  • Petrel Client: Optional cloud storage client (Ceph). Falls back to PIL for local image loading if not installed.
  • PIL Settings: Training scripts set `Image.MAX_IMAGE_PIXELS = None` and `ImageFile.LOAD_TRUNCATED_IMAGES = True` to handle large/corrupted images.
