Implementation: WAInjectBench force_single_gpu
| Knowledge Sources | |
|---|---|
| Domains | GPU_Computing, Deep_Learning |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete utility for consolidating the LLaVA model onto a single GPU, removing accelerate dispatch hooks and aligning LoRA adapter devices, provided by the WAInjectBench train/llava-ft module.
Description
The force_single_gpu function in train/llava-ft.py performs four operations:
- Calls `remove_accelerate_hooks(model.model)` to strip HuggingFace accelerate dispatch hooks
- Moves `model.model` to `cuda:{gpu_id}`
- Calls `align_lora_child_modules_devices(model.model)` to ensure LoRA A/B matrices match their base weight devices
- Clears the `hf_device_map` attribute
The caller in main() wraps this call in a try/except for CUDA out-of-memory errors, falling back to try_redispatch_auto, which uses accelerate's infer_auto_device_map for multi-GPU placement.
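remove_accelerate_hooks itself is not reproduced on this page. As a hedged illustration only, the conventional way to strip accelerate's dispatch hooks is to walk the module tree, delete the `_hf_hook` attribute, and restore the saved original forward. The function below is a hypothetical sketch of that pattern, not the WAInjectBench implementation (accelerate's own `remove_hook_from_module` performs the official equivalent):

```python
import torch.nn as nn

def remove_accelerate_hooks(root: nn.Module) -> int:
    # Hypothetical sketch: accelerate attaches dispatch hooks as an
    # `_hf_hook` attribute and stashes the wrapped forward in `_old_forward`.
    # Deleting the hook and restoring the forward detaches the module from
    # accelerate's device dispatch. Returns the number of hooks removed.
    removed = 0
    for module in root.modules():
        if hasattr(module, "_hf_hook"):
            delattr(module, "_hf_hook")
            removed += 1
        if hasattr(module, "_old_forward"):
            module.forward = module._old_forward
            delattr(module, "_old_forward")
    return removed
```

Once the hooks are gone, a plain `.to(device)` moves all weights without accelerate re-dispatching them.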
Usage
Called from main() after LoRA injection when device_mode="single" (the default).
Code Reference
Source Location
- Repository: WAInjectBench
- File: train/llava-ft.py (L170-192)
Signature
```python
def force_single_gpu(model: nn.Module, gpu_id: int):
    dev = torch.device(f"cuda:{gpu_id}")
    torch.cuda.set_device(dev)
    remove_accelerate_hooks(model.model)
    model.model.to(dev)
    align_lora_child_modules_devices(model.model)
    if hasattr(model.model, "hf_device_map"):
        try:
            delattr(model.model, "hf_device_map")
        except Exception:
            model.model.hf_device_map = {}
    head_dev = next(model.parameters()).device
    if head_dev != dev:
        model.to(dev)
    print(f"[INFO] Forced the whole model to {dev} (single GPU mode).")
```
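align_lora_child_modules_devices is called here but not shown on this page. As an assumption-laden sketch only (not the WAInjectBench source), such a helper typically walks the tree and moves any PEFT-style lora_A/lora_B children onto the device of their parent's base weight, so adapter matmuls never cross devices:

```python
import torch
import torch.nn as nn

def align_lora_child_modules_devices(root: nn.Module) -> int:
    # Hypothetical sketch: for every module that owns a base `weight`
    # tensor, relocate any attached lora_A / lora_B submodules onto the
    # base weight's device. Returns how many children were moved.
    moved = 0
    for module in root.modules():
        base = getattr(module, "weight", None)
        if not isinstance(base, torch.Tensor):
            continue
        for name in ("lora_A", "lora_B"):
            child = getattr(module, name, None)
            if isinstance(child, nn.Module):
                if any(p.device != base.device for p in child.parameters()):
                    child.to(base.device)
                    moved += 1
    return moved
```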
Import
```python
import torch
import torch.nn as nn
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | nn.Module | Yes | LlavaYesnoToken model with LoRA adapters |
| gpu_id | int | Yes | Target GPU ID (default 0, from --gpu_id) |
Outputs
| Name | Type | Description |
|---|---|---|
| model (in-place) | nn.Module | Model fully placed on cuda:{gpu_id} with hooks removed and LoRA aligned |
Usage Examples
Placing Model on GPU
```python
model = LlavaYesnoToken("llava-hf/llava-1.5-7b-hf", dtype=torch.bfloat16)
model = try_wrap_lora(model, lora_r=8, lora_alpha=32, lora_dropout=0.05)
try:
    force_single_gpu(model, gpu_id=0)
except RuntimeError as e:
    if "out of memory" in str(e).lower():
        try_redispatch_auto(model)  # Fallback to multi-GPU
    else:
        raise
```
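The except clause above keys off the error message rather than the exception type, since CUDA OOM has historically surfaced as a plain RuntimeError. A hypothetical helper distilling that check (the name is illustrative, not from the source):

```python
def is_cuda_oom(err: BaseException) -> bool:
    # Message-based OOM check, mirroring the
    # `"out of memory" in str(e).lower()` test in the usage example above.
    # Newer PyTorch also raises the dedicated torch.cuda.OutOfMemoryError,
    # which subclasses RuntimeError, so this check still catches it.
    return "out of memory" in str(err).lower()
```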