Principle:Norrrrrrr lyn WAInjectBench Environment Setup
| Knowledge Sources | |
|---|---|
| Domains | Environment_Management, GPU_Computing |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A reproducible environment provisioning strategy that ensures all dependencies, hardware drivers, and runtime configurations are consistently available before executing detection or training workloads.
Description
Environment setup is the foundational step in any ML pipeline and a prerequisite for reproducible execution. It involves two complementary actions: (1) creating a virtual environment with pinned package versions from a declarative specification file, and (2) configuring runtime variables such as GPU device visibility. Without a reproducible environment, experiments may produce inconsistent results because of version drift or hardware misconfiguration.
In the WAInjectBench project, a single environment.yml file specifies all Python packages (PyTorch, Transformers, sentence-transformers, open_clip, scikit-learn, spaCy, etc.) needed across both text and image detection pipelines, as well as training workflows.
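One way to confirm that an activated environment matches its specification is to compare installed package versions against the expected pins. The sketch below is a minimal illustration, not part of the project: `find_mismatches` is a hypothetical helper, and the version strings are placeholders, since the real pins live in environment.yml.

```python
from importlib import metadata


def find_mismatches(pins, get_version=metadata.version):
    """Return {name: (expected, installed_or_None)} for each unsatisfied pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Placeholder pins for illustration only; they are NOT the project's versions.
    example_pins = {"torch": "0.0.0", "transformers": "0.0.0"}
    for name, (expected, installed) in find_mismatches(example_pins).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package is installed at exactly the expected version. The `get_version` parameter exists only so the lookup can be replaced in tests.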
Usage
Apply this principle at the very start of any workflow execution, before importing any ML libraries. It is the prerequisite for every other step in the text detection, image detection, embedding classifier training, and LLaVA fine-tuning workflows.
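The ordering requirement matters because CUDA-backed libraries read GPU visibility when they initialize, so the variable has to be set first. A minimal sketch, assuming a hypothetical helper named `select_gpus` and an illustrative GPU index of 0:

```python
import os


def select_gpus(gpu_ids, environ=os.environ):
    """Set CUDA_VISIBLE_DEVICES to a comma-separated list of GPU indices."""
    value = ",".join(str(i) for i in gpu_ids)
    environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Run this before the first import of any CUDA-backed library in the process.
select_gpus([0])  # index 0 is an illustrative choice, not a project default

# Framework imports would come only after this point, for example:
# import torch
```

Inside the process, the selected devices are renumbered starting from 0, so code that refers to the first GPU keeps working regardless of which physical device was chosen.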
Theoretical Basis
Environment reproducibility rests on two pillars:
1. Declarative dependency specification: A manifest file (e.g., conda YAML, pip requirements) pins exact package versions so that another machine on a compatible platform can recreate the same software stack.
2. Hardware resource isolation: Setting CUDA_VISIBLE_DEVICES at process start restricts which GPUs a process can see, preventing contention in multi-user or multi-experiment setups.
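# Pseudocode for environment setup
# Recreate the pinned software stack from the declarative spec
create_virtual_env(spec_file="environment.yml")
activate_env()
# Restrict GPU visibility before any CUDA-backed library is imported
set_env_var("CUDA_VISIBLE_DEVICES", selected_gpu_id)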
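The steps above can be sketched in Python under some assumptions: the manifest is a conda YAML file, conda is on the PATH, and the workload is launched as a child process. Because `conda activate` changes shell state, this sketch swaps in `conda run` to execute a script inside the environment. The helper names, the environment name, and the script name are all hypothetical.

```python
import os
import subprocess


def build_create_command(spec_file="environment.yml"):
    """Command that creates a conda environment from a YAML spec."""
    return ["conda", "env", "create", "--file", spec_file]


def build_run_command(env_name, script):
    """Command that runs a Python script inside a named conda environment."""
    return ["conda", "run", "--name", env_name, "python", script]


def gpu_env(gpu_id, base_env=None):
    """Copy the environment and restrict the child process to one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def provision_and_run(env_name, script, gpu_id,
                      spec_file="environment.yml", runner=subprocess.run):
    """Create the environment, then launch the script with GPU isolation."""
    runner(build_create_command(spec_file), check=True)
    runner(build_run_command(env_name, script), env=gpu_env(gpu_id), check=True)
```

A call such as `provision_and_run("my-env", "detect.py", gpu_id=0)` would create the environment and then run the script with only GPU 0 visible; both names in that call are placeholders. Passing the variable through `env=` scopes it to the child process, which keeps concurrent experiments from overwriting each other's GPU selection.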