Implementation: WAInjectBench force_single_gpu
| Knowledge Sources | |
|---|---|
| Domains | GPU_Computing, Deep_Learning |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete utility for consolidating the LLaVA model onto a single GPU, removing accelerate dispatch hooks and aligning LoRA adapter devices, provided by the WAInjectBench train/llava-ft module.
Description
The force_single_gpu function in train/llava-ft.py performs four operations:
- Calls `remove_accelerate_hooks(model.model)` to strip HuggingFace accelerate dispatch hooks
- Moves `model.model` to `cuda:{gpu_id}`
- Calls `align_lora_child_modules_devices(model.model)` to ensure LoRA A/B matrices match their base weight devices
- Clears the `hf_device_map` attribute
The caller in main() wraps this call in a try/except for CUDA out-of-memory errors, falling back to try_redispatch_auto, which uses accelerate's infer_auto_device_map for multi-GPU placement.
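remove_accelerate_hooks itself is not reproduced on this page. As a hedged illustration only, the conventional way to strip accelerate's dispatch hooks is to walk the module tree, delete the `_hf_hook` attribute, and restore the saved original forward. The function below is a hypothetical sketch of that pattern, not the WAInjectBench implementation (accelerate's own `remove_hook_from_module` performs the official equivalent):

```python
import torch.nn as nn

def remove_accelerate_hooks(root: nn.Module) -> int:
    # Hypothetical sketch: accelerate attaches dispatch hooks as an
    # `_hf_hook` attribute and stashes the wrapped forward in `_old_forward`.
    # Deleting the hook and restoring the forward detaches the module from
    # accelerate's device dispatch. Returns the number of hooks removed.
    removed = 0
    for module in root.modules():
        if hasattr(module, "_hf_hook"):
            delattr(module, "_hf_hook")
            removed += 1
        if hasattr(module, "_old_forward"):
            module.forward = module._old_forward
            delattr(module, "_old_forward")
    return removed
```

Once the hooks are gone, a plain `.to(device)` moves all weights without accelerate re-dispatching them.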
Usage
Called from main() after LoRA injection when device_mode="single" (the default).
Code Reference
Source Location
- Repository: WAInjectBench
- File: train/llava-ft.py (L170-192)
Signature
```python
def force_single_gpu(model: nn.Module, gpu_id: int):
    dev = torch.device(f"cuda:{gpu_id}")
    torch.cuda.set_device(dev)
    remove_accelerate_hooks(model.model)
    model.model.to(dev)
    align_lora_child_modules_devices(model.model)
    if hasattr(model.model, "hf_device_map"):
        try:
            delattr(model.model, "hf_device_map")
        except Exception:
            model.model.hf_device_map = {}
    head_dev = next(model.parameters()).device
    if head_dev != dev:
        model.to(dev)
    print(f"[INFO] Forced the whole model to {dev} (single GPU mode).")
```
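align_lora_child_modules_devices is called here but not shown on this page. As an assumption-laden sketch only (not the WAInjectBench source), such a helper typically walks the tree and moves any PEFT-style lora_A/lora_B children onto the device of their parent's base weight, so adapter matmuls never cross devices:

```python
import torch
import torch.nn as nn

def align_lora_child_modules_devices(root: nn.Module) -> int:
    # Hypothetical sketch: for every module that owns a base `weight`
    # tensor, relocate any attached lora_A / lora_B submodules onto the
    # base weight's device. Returns how many children were moved.
    moved = 0
    for module in root.modules():
        base = getattr(module, "weight", None)
        if not isinstance(base, torch.Tensor):
            continue
        for name in ("lora_A", "lora_B"):
            child = getattr(module, name, None)
            if isinstance(child, nn.Module):
                if any(p.device != base.device for p in child.parameters()):
                    child.to(base.device)
                    moved += 1
    return moved
```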
Import
```python
import torch
import torch.nn as nn
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | nn.Module | Yes | LlavaYesnoToken model with LoRA adapters |
| gpu_id | int | Yes | Target GPU ID (default 0, from --gpu_id) |
Outputs
| Name | Type | Description |
|---|---|---|
| model (in-place) | nn.Module | Model fully placed on cuda:{gpu_id} with hooks removed and LoRA aligned |
Usage Examples
Placing Model on GPU
```python
model = LlavaYesnoToken("llava-hf/llava-1.5-7b-hf", dtype=torch.bfloat16)
model = try_wrap_lora(model, lora_r=8, lora_alpha=32, lora_dropout=0.05)
try:
    force_single_gpu(model, gpu_id=0)
except RuntimeError as e:
    if "out of memory" in str(e).lower():
        try_redispatch_auto(model)  # Fallback to multi-GPU
    else:
        raise
```
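The except clause above keys off the error message rather than the exception type, since CUDA OOM has historically surfaced as a plain RuntimeError. A hypothetical helper distilling that check (the name is illustrative, not from the source):

```python
def is_cuda_oom(err: BaseException) -> bool:
    # Message-based OOM check, mirroring the
    # `"out of memory" in str(e).lower()` test in the usage example above.
    # Newer PyTorch also raises the dedicated torch.cuda.OutOfMemoryError,
    # which subclasses RuntimeError, so this check still catches it.
    return "out of memory" in str(err).lower()
```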