
Heuristic: OpenCompass VLMEvalKit SKIP_ERR For Graceful Inference

From Leeroopedia
Knowledge Sources
Domains Debugging, Optimization
Last Updated 2026-02-14 01:30 GMT

Overview

Error-tolerant inference mode using the `SKIP_ERR` environment variable to catch and log RuntimeErrors during model generation instead of crashing the entire evaluation run.

Description

When evaluating a VLM across thousands of samples, a single CUDA out-of-memory error or model inference failure can crash the entire run. Setting `SKIP_ERR=1` wraps each model generation call in a try/except block that catches `RuntimeError`, synchronizes CUDA state, logs the error, and records a failure message for that sample. This allows the evaluation to continue for remaining samples, with failed samples clearly marked in the output.
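The pattern can be sketched in isolation. This is a minimal, self-contained illustration of the catch-log-and-continue behavior, not VLMEvalKit's actual code; `flaky_generate` is a hypothetical stand-in for `model.generate`, and the CUDA synchronization step is omitted since no GPU is involved here.

```python
import warnings

FAIL_MSG = 'Failed to obtain answer'

def flaky_generate(sample):
    # Hypothetical stand-in for model.generate: fails on one input.
    if sample == 'bad':
        raise RuntimeError('CUDA out of memory')
    return f'answer for {sample}'

def tolerant_generate(sample):
    # Mirrors the SKIP_ERR=1 branch: catch RuntimeError, log a warning,
    # record a failure string, and continue instead of crashing the run.
    try:
        return flaky_generate(sample)
    except RuntimeError as err:
        warnings.warn(f'{type(err)} {str(err)}')
        return f'{FAIL_MSG}: {type(err)} {str(err)}'

responses = [tolerant_generate(s) for s in ['a', 'bad', 'b']]
```

After the loop, two samples carry real answers and the failed one carries the `FAIL_MSG` string, so the run completes with the failure clearly marked.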

Usage

Use this heuristic when running large-scale evaluations where occasional failures are acceptable, or when debugging a model that intermittently fails on certain inputs (e.g., very large images, unusual aspect ratios). Do not use it when you need to understand why specific samples fail, as the error is caught and logged rather than raised.

The Insight (Rule of Thumb)

  • Action: Set `export SKIP_ERR=1` before running evaluation.
  • Value: Failed samples get a response string of `"Failed to obtain answer: {error_type} {error_message}"`.
  • Trade-off: Prevents crashes at the cost of missing predictions for some samples. Failed samples will show as incorrect in evaluation.
  • Compatibility: Works with both image inference (`vlmeval/inference.py`) and video inference (`vlmeval/inference_video.py`).
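One subtlety worth noting: the guard in the code evidence below compares the raw environment string against `'1'`, so only that exact value enables the mode. A quick sketch of the check (`skip_err_enabled` is an illustrative helper, not a VLMEvalKit function):

```python
import os

def skip_err_enabled() -> bool:
    # Same comparison as VLMEvalKit's guard: only the exact string '1'
    # enables error-tolerant mode; 'true', 'yes', etc. do not.
    return os.environ.get('SKIP_ERR', False) == '1'

os.environ['SKIP_ERR'] = '1'
print(skip_err_enabled())   # True

os.environ['SKIP_ERR'] = 'true'
print(skip_err_enabled())   # False
```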

Reasoning

In production evaluation of 100+ benchmarks across multiple models, even well-tested models can fail on edge cases (extremely large images, malformed inputs, CUDA memory pressure). Without `SKIP_ERR`, a single failure after hours of computation forces a complete restart. The `torch.cuda.synchronize()` call after catching the error ensures CUDA state is clean before processing the next sample, preventing cascading failures.
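Because failed samples carry a fixed prefix, they can be triaged after the run and selectively re-evaluated. A sketch of that post-hoc filtering, assuming predictions have been loaded into a plain dict keyed by sample index (the `predictions` data here is illustrative, not VLMEvalKit's actual output format):

```python
FAIL_MSG = 'Failed to obtain answer'

# Hypothetical loaded predictions: index -> recorded response string.
predictions = {
    0: 'The answer is B.',
    1: "Failed to obtain answer: <class 'RuntimeError'> CUDA out of memory",
    2: 'The answer is D.',
}

# Failed samples are identifiable by the FAIL_MSG prefix.
failed = {idx: resp for idx, resp in predictions.items()
          if resp.startswith(FAIL_MSG)}

print(f'{len(failed)} / {len(predictions)} samples failed')
```

The indices in `failed` give exactly the subset to inspect or rerun, which limits the cost of a failure to those samples rather than the whole benchmark.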

Code Evidence

From `vlmeval/inference.py:157-168`:

# If `SKIP_ERR` flag is set, the model will skip the generation if error is encountered
if os.environ.get('SKIP_ERR', False) == '1':
    FAIL_MSG = 'Failed to obtain answer'
    try:
        response = model.generate(message=struct, dataset=dataset_name)
    except RuntimeError as err:
        torch.cuda.synchronize()
        warnings.warn(f'{type(err)} {str(err)}')
        response = f'{FAIL_MSG}: {type(err)} {str(err)}'
else:
    response = model.generate(message=struct, dataset=dataset_name)
torch.cuda.empty_cache()

Same pattern in `vlmeval/inference_video.py:181-192`:

# If `SKIP_ERR` flag is set, the model will skip the generation if error is encountered
if os.environ.get('SKIP_ERR', False) == '1':
    FAIL_MSG = 'Failed to obtain answer'
    try:
        response = model.generate(message=struct, dataset=dataset_name)
    except RuntimeError as err:
        torch.cuda.synchronize()
        warnings.warn(f'{type(err)} {str(err)}')
        response = f'{FAIL_MSG}: {type(err)} {str(err)}'
else:
    response = model.generate(message=struct, dataset=dataset_name)
torch.cuda.empty_cache()
