
Implementation:Vllm project Vllm Env Override

From Leeroopedia


Knowledge Sources
Domains: Configuration, Runtime_Patching, PyTorch_Integration
Last Updated: 2026-02-08 00:00 GMT

Overview

Sets critical environment variables and applies PyTorch monkeypatches at import time to configure torch inductor settings and work around known torch 2.9 bugs affecting vLLM.

Description

env_override.py is an initialization module whose side effects run at import time. It sets two environment variables: PYTORCH_NVML_BASED_CUDA_CHECK, so that torch.cuda.is_available() does not unintentionally initialize CUDA, and TORCHINDUCTOR_COMPILE_THREADS, to avoid multi-threaded inductor compilation issues. It also sets torch._inductor.config.compile_threads = 1 so that compilation stays single-threaded. In addition, it monkeypatches three PyTorch 2.9.0 inductor methods to fix compilation bugs:

  • memory_plan_reuse_patched: Fixes a memory planning issue in PythonWrapperCodegen affecting piecewise compilation output correctness.
  • get_graph_partition_signature_patched: Fixes inductor partition signatures for operators with in-place mutations (e.g., vllm.unified_attention_with_output).
  • should_partition_patched: Prevents assertion errors when inductor nodes lack origin_node fields for custom partitioned ops.
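
The version-gated patching pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not vLLM's actual code: FakeWrapperCodegen stands in for the real inductor class, parse_release and apply_patch_if_affected are hypothetical helper names, and the patched method body is a placeholder.

```python
import re


def parse_release(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '2.9.0+cu128'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


class FakeWrapperCodegen:
    """Hypothetical stand-in for the class whose method gets replaced."""

    def memory_plan_reuse(self) -> str:
        return "original"


def memory_plan_reuse_patched(self) -> str:
    # Placeholder body; a real patch would carry the corrected logic.
    return "patched"


def apply_patch_if_affected(torch_version: str, target_cls: type) -> bool:
    """Replace the method only on the affected release (assumed: 2.9.0)."""
    if parse_release(torch_version) != (2, 9, 0):
        return False
    # Assigning on the class affects existing and future instances alike.
    target_cls.memory_plan_reuse = memory_plan_reuse_patched
    return True
```

Gating on the exact release keeps the replacement from shadowing an upstream fix once a later PyTorch version ships one, which is why such patches need to be revisited on every upgrade.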

A fourth patch (_patch_get_raw_stream_if_needed) works around a TorchInductor autotune bug in torch 2.9.0/2.9.1 where get_raw_stream() is used but not defined.
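
Why defining a name in builtins can rescue generated code that uses an undefined function: Python looks up a global name in the module namespace first and falls back to builtins. The sketch below demonstrates that lookup order only. fake_get_raw_stream is a stand-in rather than the real torch function, and run_generated and install_fallback are hypothetical names introduced for illustration.

```python
import builtins

# Simulates generated code that calls a name it never defines or imports.
GENERATED_SOURCE = "stream = get_raw_stream(0)"


def run_generated(source: str) -> dict:
    """Execute source in a fresh namespace and return that namespace."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace


def fake_get_raw_stream(device_index: int) -> int:
    # Stand-in: returns a dummy handle instead of a real CUDA stream.
    return 1000 + device_index


def install_fallback() -> None:
    """Make the name resolvable everywhere via the builtins fallback."""
    if not hasattr(builtins, "get_raw_stream"):
        builtins.get_raw_stream = fake_get_raw_stream
```

Before install_fallback() is called, run_generated(GENERATED_SOURCE) raises NameError; afterwards the same source runs because the lookup falls through to builtins. The trade-off is that the name becomes visible process-wide, so this kind of workaround is best limited to the affected versions.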

Usage

This module is imported early in vLLM's initialization chain (via vllm/__init__.py) to ensure patches take effect before any torch inductor compilation occurs. Developers generally do not interact with this file directly, but understanding its patches is important when debugging torch compilation issues or upgrading PyTorch versions.
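
When debugging or planning a PyTorch upgrade, it can help to write down which patches are expected for a given torch version. The sketch below encodes only the gating stated on this page (inductor patches on 2.9.0, the get_raw_stream workaround on 2.9.0/2.9.1). The function expected_patches and both constants are hypothetical names, and how the real module compares versions is not shown here.

```python
import re

# Version gates as stated on this page; treat them as assumptions to re-check
# against the installed vLLM source.
INDUCTOR_PATCH_RELEASES = {(2, 9, 0)}
RAW_STREAM_PATCH_RELEASES = {(2, 9, 0), (2, 9, 1)}


def expected_patches(torch_version: str) -> list[str]:
    """List the patch targets expected to be active for a torch version."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    release = tuple(int(part) for part in match.groups())

    patches: list[str] = []
    if release in INDUCTOR_PATCH_RELEASES:
        patches.append("PythonWrapperCodegen.memory_plan_reuse")
        patches.append("GraphLowering._update_scheduler")
    if release in RAW_STREAM_PATCH_RELEASES:
        patches.append("builtins.get_raw_stream")
    return patches
```

For example, expected_patches("2.9.1") lists only the get_raw_stream workaround, while a version outside both sets yields an empty list.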

Code Reference

Source Location

Signature

def memory_plan_reuse_patched(self) -> None: ...
def get_graph_partition_signature_patched(
    self, partitions, skip_cudagraphs: list[bool]
) -> list[GraphPartitionSignature]: ...
def should_partition_patched(self, node, should_log: bool = False) -> bool: ...
def _update_scheduler_patched(self) -> None: ...
def _patch_get_raw_stream_if_needed() -> None: ...

Import

# Automatically imported at vLLM startup via vllm/__init__.py
# Side effects execute at import time:
import vllm.env_override

I/O Contract

Inputs

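Name | Type | Required | Description
--- | --- | --- | ---
torch version | runtime | Yes | The installed PyTorch version. The inductor method patches are gated to torch 2.9.0; the get_raw_stream workaround applies to torch 2.9.0/2.9.1.
torch._inductor | module | Yes | PyTorch inductor module whose methods are monkeypatched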

Outputs

Name | Type | Description
--- | --- | ---
PYTORCH_NVML_BASED_CUDA_CHECK | env var | Set to "1" to avoid unintentional CUDA initialization from torch.cuda.is_available()
TORCHINDUCTOR_COMPILE_THREADS | env var | Set to "1" to prevent multi-threaded inductor compilation issues
torch._inductor.config.compile_threads | int | Set to 1 for single-threaded compilation
PythonWrapperCodegen.memory_plan_reuse | method | Patched method on torch 2.9.0 for correct memory planning
GraphLowering._update_scheduler | method | Patched method on torch 2.9.0 that installs should_partition and get_graph_partition_signature patches
builtins.get_raw_stream | function | Workaround for missing get_raw_stream in torch 2.9.0/2.9.1 autotune

Usage Examples

# This module is not used directly. It is imported for its side effects.
# In vllm/__init__.py:
import vllm.env_override  # noqa: F401

# After import, the following are set:
import os
assert os.environ["PYTORCH_NVML_BASED_CUDA_CHECK"] == "1"
assert os.environ["TORCHINDUCTOR_COMPILE_THREADS"] == "1"

# On torch 2.9.0, the inductor patches are applied:
# - PythonWrapperCodegen.memory_plan_reuse is patched
# - GraphLowering._update_scheduler is patched
# On torch 2.9.0/2.9.1:
# - builtins.get_raw_stream is defined (if CUDA available)
