
Heuristic: OpenCompass VLMEvalKit WORLD_SIZE Unset for Model Building

From Leeroopedia
Knowledge Sources
Domains Distributed_Training, Debugging
Last Updated 2026-02-14 01:30 GMT

Overview

Critical workaround: temporarily unset `WORLD_SIZE` before building VLM models to prevent HuggingFace Transformers from automatically enabling tensor parallelism, which conflicts with VLMEvalKit's multi-instance parallelism strategy.

Description

In Transformers 4.50 and later, when `device_map='auto'` is used under a `torchrun` launcher, HuggingFace Transformers detects `WORLD_SIZE` in the environment and automatically enables Tensor Parallelism (TP). However, VLMEvalKit uses `torchrun` to run multiple independent model instances (one per GPU), not to shard a single model across GPUs. This mismatch causes compatibility failures. The workaround is to temporarily remove `WORLD_SIZE` from the environment before calling the model constructor, then restore it afterward.

Usage

Apply this heuristic whenever building local VLM models in a multi-GPU torchrun environment. This affects the `infer_data()` function in both `vlmeval/inference.py` and `vlmeval/inference_video.py`, as well as `build_model_from_config()` in `run.py`.

The Insight (Rule of Thumb)

  • Action: Before calling `supported_VLM[model_name]()`, pop `WORLD_SIZE` from `os.environ`. Restore it after model creation.
  • Value: `ws_bak = os.environ.pop('WORLD_SIZE', None)` before the build; `os.environ['WORLD_SIZE'] = ws_bak` after the build, but only when `ws_bak` is not `None`.
  • Trade-off: Essentially none at runtime. The only caveat is that mutating `os.environ` is a process-global side effect: anything that reads `WORLD_SIZE` during the build window will see it unset.
  • Compatibility: Required for Transformers >= 4.50 when using `torchrun`. Harmless on older versions.
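The pop/restore pattern above can be packaged as a context manager. This is a minimal sketch, not part of VLMEvalKit itself (the helper name `world_size_unset` is hypothetical); unlike the raw pattern, the `try/finally` also restores the variable if the model constructor raises.

```python
import os
from contextlib import contextmanager


@contextmanager
def world_size_unset():
    """Temporarily remove WORLD_SIZE so Transformers does not enable TP.

    Hypothetical helper wrapping the same pop/restore pattern used in
    vlmeval/inference.py.
    """
    ws_bak = os.environ.pop('WORLD_SIZE', None)
    try:
        yield
    finally:
        # Restore only if the variable existed before entering the block.
        if ws_bak is not None:
            os.environ['WORLD_SIZE'] = ws_bak
```

A call site would then read `with world_size_unset(): model = supported_VLM[model_name](**kwargs)`.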

Reasoning

VLMEvalKit's parallelism model is data parallelism via multiple independent processes: each `torchrun` process loads a complete model copy on its assigned GPU and processes a shard of the evaluation data. This is different from tensor parallelism, where a single model is split across GPUs. When Transformers detects `WORLD_SIZE > 1` in the environment, it assumes tensor parallelism is desired and attempts to shard model layers, which fails because VLMEvalKit expects each process to hold the complete model. According to the comment in the source code, the fix was added on 2025-06-05.
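To make the data-parallel scheme concrete: each rank holds the full model and evaluates a disjoint slice of the dataset. The sketch below is purely illustrative (the function name and strided-slicing strategy are assumptions, not VLMEvalKit's actual sharding code).

```python
def assign_shard(num_items, rank, world_size):
    """Data-parallel shard assignment: rank r takes every world_size-th
    item starting at index r, so shards are disjoint and cover all items.

    Illustrative sketch only; VLMEvalKit's real partitioning may differ.
    """
    return list(range(rank, num_items, world_size))
```

With `world_size=4`, rank 0 evaluates items 0, 4, 8, ... while rank 3 evaluates items 3, 7, 11, ...; no item is processed twice, and every rank needs the complete model to run inference on its slice.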

Code Evidence

From `vlmeval/inference.py:119-126`:

# (25.06.05) In newer version of transformers (after 4.50), with device_map='auto' and torchrun launcher,
# Transformers automatically adopt TP parallelism, which leads to compatibility problems with VLMEvalKit
# (In VLMEvalKit, we use torchrun to launch multiple model instances on a single node).
# To bypass this problem, we unset `WORLD_SIZE` before building the model to not use TP parallel.
ws_bak = os.environ.pop('WORLD_SIZE', None)
model = supported_VLM[model_name](**kwargs) if isinstance(model, str) else model
if ws_bak:
    os.environ['WORLD_SIZE'] = ws_bak

Same pattern in `vlmeval/inference_video.py:103-110`:

# (25.06.05) In newer version of transformers (after 4.50), with device_map='auto' and torchrun launcher,
# Transformers automatically adopt TP parallelism, which leads to compatibility problems with VLMEvalKit
ws_bak = os.environ.pop('WORLD_SIZE', None)
model = supported_VLM[model_name](**kwargs) if isinstance(model, str) else model
if ws_bak:
    os.environ['WORLD_SIZE'] = ws_bak

Same pattern in `run.py:52-73` (`build_model_from_config`):

def build_model_from_config(cfg, model_name, use_vllm=False):
    import vlmeval.api
    import vlmeval.vlm
    ws_bak = os.environ.pop('WORLD_SIZE', None)
    ...
    if ws_bak:
        os.environ['WORLD_SIZE'] = ws_bak
    return model
