
Implementation:Norrrrrrr lyn WAInjectBench force single gpu

From Leeroopedia
Knowledge Sources
Domains GPU_Computing, Deep_Learning
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool for consolidating the LLaVA model onto a single GPU, with accelerate hook removal and LoRA device alignment, provided by the WAInjectBench train/llava-ft module.

Description

The force_single_gpu function in train/llava-ft.py performs four operations:

  1. Calls remove_accelerate_hooks(model.model) to strip HuggingFace accelerate dispatch hooks
  2. Moves model.model to cuda:{gpu_id}
  3. Calls align_lora_child_modules_devices(model.model) to ensure LoRA A/B matrices match their base weight devices
  4. Clears the stale hf_device_map attribute (deleting it, or resetting it to an empty dict if deletion fails)
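Steps 1 and 3 call two repo helpers whose bodies are not shown on this page. A minimal sketch of what they plausibly do, assuming accelerate stores its dispatch hooks as `_hf_hook` / `_old_forward` attributes and the adapters use PEFT-style `lora_A` / `lora_B` submodules (both are assumptions, not confirmed by the source):

```python
import torch
import torch.nn as nn


def remove_accelerate_hooks(module: nn.Module) -> None:
    """Sketch: strip accelerate dispatch hooks from every submodule.
    Assumes hooks live in `_hf_hook` and the patched forward saved the
    original in `_old_forward`, as accelerate does."""
    for sub in module.modules():
        if hasattr(sub, "_old_forward"):
            # Restore the unpatched forward before dropping the hook
            sub.forward = sub._old_forward
            delattr(sub, "_old_forward")
        if hasattr(sub, "_hf_hook"):
            delattr(sub, "_hf_hook")


def align_lora_child_modules_devices(module: nn.Module) -> None:
    """Sketch: move each LoRA A/B submodule to the device of its
    parent's base weight, so adapter matmuls never cross devices."""
    for sub in module.modules():
        base_weight = getattr(sub, "weight", None)
        if base_weight is None:
            continue
        for name in ("lora_A", "lora_B"):
            lora = getattr(sub, name, None)
            if lora is not None:
                lora.to(base_weight.device)
```

Keeping hook removal separate from the `.to(dev)` move matters: moving a module that still carries a dispatch hook can leave the hook's cached device mapping pointing at the old placement.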

The caller in main() wraps this call in a try/except for CUDA out-of-memory errors, falling back to try_redispatch_auto, which uses accelerate's infer_auto_device_map for multi-GPU placement.
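The fallback helper is not reproduced on this page; a sketch of the shape it could take, using accelerate's public infer_auto_device_map and dispatch_model APIs (the real WAInjectBench implementation may differ):

```python
import torch
import torch.nn as nn


def try_redispatch_auto(model: nn.Module) -> None:
    """Sketch of the OOM fallback: let accelerate shard model.model
    across all visible GPUs instead of forcing a single device."""
    from accelerate import dispatch_model, infer_auto_device_map  # lazy import

    torch.cuda.empty_cache()  # release memory from the failed single-GPU move
    device_map = infer_auto_device_map(
        model.model,
        # keep transformer blocks intact if the model declares them
        no_split_module_classes=getattr(model.model, "_no_split_modules", None),
    )
    dispatch_model(model.model, device_map=device_map)
    print(f"[INFO] Re-dispatched with auto device map: {device_map}")
```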

Usage

Called from main() after LoRA injection when device_mode="single" (the default).

Code Reference

Source Location

Signature

def force_single_gpu(model: nn.Module, gpu_id: int):
    dev = torch.device(f"cuda:{gpu_id}")
    torch.cuda.set_device(dev)

    remove_accelerate_hooks(model.model)

    model.model.to(dev)

    align_lora_child_modules_devices(model.model)

    if hasattr(model.model, "hf_device_map"):
        try:
            delattr(model.model, "hf_device_map")
        except Exception:
            model.model.hf_device_map = {}

    head_dev = next(model.parameters()).device
    if head_dev != dev:
        model.to(dev)

    print(f"[INFO] Forced the whole model to {dev} (single GPU mode).")
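After the call returns, a quick sanity check can confirm that no parameter or buffer was left behind on another device. This helper is hypothetical (not part of the repo), shown only to illustrate the post-condition:

```python
import torch
import torch.nn as nn


def assert_single_device(model: nn.Module, gpu_id: int = 0) -> None:
    """Hypothetical check: every parameter and buffer must sit on
    cuda:{gpu_id} after force_single_gpu completes."""
    expected = torch.device(f"cuda:{gpu_id}")
    tensors = list(model.named_parameters()) + list(model.named_buffers())
    for name, t in tensors:
        assert t.device == expected, f"{name} is on {t.device}, expected {expected}"
```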

Import

import torch
import torch.nn as nn

I/O Contract

Inputs

Name Type Required Description
model nn.Module Yes LlavaYesnoToken model with LoRA adapters
gpu_id int Yes Target GPU ID (default 0, from --gpu_id)

Outputs

Name Type Description
model (in-place) nn.Module Model fully placed on cuda:{gpu_id} with hooks removed and LoRA aligned

Usage Examples

Placing Model on GPU

model = LlavaYesnoToken("llava-hf/llava-1.5-7b-hf", dtype=torch.bfloat16)
model = try_wrap_lora(model, lora_r=8, lora_alpha=32, lora_dropout=0.05)

try:
    force_single_gpu(model, gpu_id=0)
except RuntimeError as e:
    if "out of memory" in str(e).lower():
        try_redispatch_auto(model)  # Fallback to multi-GPU
    else:
        raise

Related Pages

Implements Principle

Requires Environment
