Environment: FMInference FlexLLMGen NVMe Disk
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Storage |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
NVMe SSD storage environment providing a high-bandwidth offload directory for FlexLLMGen's disk-tier tensor storage, required when offloading weights or KV cache to disk.
Description
FlexLLMGen's three-tier offloading system (GPU, CPU, disk) requires a fast local storage device when the disk tier is used. The system stores tensors as memory-mapped NumPy files (np.lib.format.open_memmap) in an offload directory. Asynchronous copy threads (default: 4) handle data movement between disk and GPU/CPU. For maximum throughput, NVMe SSDs are recommended. AWS instances use a single NVMe drive formatted as XFS, while GCP instances stripe 4 NVMe drives into a RAID-0 logical volume using LVM for higher aggregate bandwidth.
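The memmap pattern described above can be sketched in a few lines. This is a minimal illustration, not FlexLLMGen code; the directory and file name here are placeholders:

```python
import os
import tempfile

import numpy as np

# Create a disk-backed tensor the same way FlexLLMGen's disk tier does:
# np.lib.format.open_memmap writes a standard .npy header, then memory-maps
# the data region so reads and writes go through the page cache to disk.
offload_dir = tempfile.mkdtemp()  # stand-in for the real offload directory
path = os.path.join(offload_dir, "t_0.npy")

arr = np.lib.format.open_memmap(path, mode="w+", shape=(4, 8), dtype=np.float16)
arr[:] = 1.0   # written through to the backing file
arr.flush()

# Reopen lazily: mmap_mode maps the file instead of reading it all into RAM.
reloaded = np.load(path, mmap_mode="r")
print(reloaded.shape, reloaded.dtype)  # (4, 8) float16
```

Because the file is a standard `.npy`, any NumPy-aware tool can inspect the offloaded tensors in place without loading them fully into memory.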
Usage
Use this environment when offloading any tensors to disk, i.e., when --percent allocates less than 100% to GPU+CPU combined for weights, cache, or activations. This is essential for running models like OPT-175B that exceed available GPU and CPU memory. The offload directory defaults to ~/flexllmgen_offload_dir.
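The six `--percent` values are GPU/CPU percentages for weights, KV cache, and activations; whatever each pair leaves short of 100 spills to disk. A small helper (hypothetical, for illustration only, not part of FlexLLMGen) makes the implied disk shares explicit:

```python
def disk_shares(percent):
    """Given the six --percent values
    (weight_gpu, weight_cpu, cache_gpu, cache_cpu, act_gpu, act_cpu),
    return the implied disk percentage for (weights, cache, activations).
    Hypothetical helper for illustration only.
    """
    w_g, w_c, c_g, c_c, a_g, a_c = percent
    return (100 - w_g - w_c, 100 - c_g - c_c, 100 - a_g - a_c)

# "100 0 100 0 100 0": everything fits on GPU, so no disk tier is needed.
print(disk_shares([100, 0, 100, 0, 100, 0]))   # (0, 0, 0)

# Half the weights spill to disk: the NVMe offload directory is required.
print(disk_shares([0, 50, 0, 100, 0, 100]))    # (50, 0, 0)
```

If any of the three returned shares is nonzero, FlexLLMGen will create tensors under the offload directory and the NVMe setup below applies.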
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Scripts target Ubuntu with xfs filesystem |
| Storage | NVMe SSD | AWS: /dev/nvme1n1; GCP: 4x NVMe (/dev/nvme0n1-4) |
| Filesystem | XFS | Both AWS and GCP scripts format as XFS |
| Disk Space | 50GB+ (model-dependent) | OPT-175B weights alone are ~350GB; the SSD used in benchmarks was 1.5TB |
| Permissions | Root (sudo) | Mount scripts require root for mkfs, mount, pvcreate, lvcreate |
Dependencies
System Packages
- `xfsprogs` (for `mkfs -t xfs`)
- `lvm2` (GCP only: for `pvcreate`, `vgcreate`, `lvcreate`)
- `util-linux` (for `mount`)
Python Packages
- `numpy` (for `np.lib.format.open_memmap` memory-mapped file I/O)
Credentials
No credentials required. The offload directory must be world-readable/writable (`chmod a+rw`).
Quick Install
```bash
# AWS: Mount single NVMe drive
sudo bash scripts/mount_nvme_aws.sh ~/

# GCP: Create RAID-0 from 4 NVMe drives and mount
sudo bash scripts/mount_nvme_gcp.sh ~/
```
Code Evidence
AWS NVMe mount script from scripts/mount_nvme_aws.sh:13-17:
```bash
mkfs -t xfs -f /dev/nvme1n1
rm -rf $1flexllmgen_offload_dir
mkdir $1flexllmgen_offload_dir
mount /dev/nvme1n1 $1flexllmgen_offload_dir
chmod a+rw $1flexllmgen_offload_dir
```
GCP RAID-0 NVMe striping from scripts/mount_nvme_gcp.sh:13-22:
```bash
pvcreate /dev/nvme0n1 /dev/nvme0n2 /dev/nvme0n3 /dev/nvme0n4
pvs
vgcreate striped_vol_group /dev/nvme0n1 /dev/nvme0n2 /dev/nvme0n3 /dev/nvme0n4
vgs
lvcreate -i 4 -I 1m -l 100%VG -nstriped_logical_volume striped_vol_group
lvs
mkdir -p $1flexllmgen_offload_dir
mkfs -t xfs -f /dev/striped_vol_group/striped_logical_volume
mount /dev/striped_vol_group/striped_logical_volume $1flexllmgen_offload_dir
chmod a+rw $1flexllmgen_offload_dir
```
Disk tensor storage using memory-mapped NumPy files in flexllmgen/pytorch_backend.py:656-661:
```python
def allocate(self, shape, dtype, pin_memory=None, name=None):
    name = name or TorchTensor.next_name()
    path = os.path.join(self.path, name)
    np.lib.format.open_memmap(path, mode="w+", shape=shape, dtype=dtype)
    return TorchTensor(shape, np_dtype_to_torch_dtype[dtype],
                       path, self, name=name)
```
Async copy thread pool in flexllmgen/pytorch_backend.py:624-647:
```python
class TorchDisk:
    def __init__(self, path, mem_capacity=None, cuda_id=0, num_copy_threads=4):
        self.path = os.path.abspath(os.path.expanduser(path))
        ...
        self.copy_queue = queue.Queue()
        self.copy_threads = [
            threading.Thread(
                target=copy_worker_func, args=(self.copy_queue, cuda_id)
            ) for _ in range(num_copy_threads)
        ]
```
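The queue-plus-worker pattern above can be sketched generically. This is a simplified stand-in for `copy_worker_func` (which in FlexLLMGen moves tensor data between tiers on a CUDA device), using plain list copies so the structure is visible:

```python
import queue
import threading

def copy_worker(q):
    """Simplified stand-in for copy_worker_func: drain copy jobs until a
    None sentinel arrives, then exit."""
    while True:
        item = q.get()
        if item is None:       # sentinel: shut this worker down
            q.task_done()
            break
        dst, src = item
        dst[:] = src           # the "copy" between tiers
        q.task_done()

copy_queue = queue.Queue()
workers = [threading.Thread(target=copy_worker, args=(copy_queue,))
           for _ in range(4)]  # matches the default num_copy_threads=4
for t in workers:
    t.start()

dst, src = [0, 0, 0], [1, 2, 3]
copy_queue.put((dst, src))
copy_queue.join()              # block until every queued copy is done
for _ in workers:
    copy_queue.put(None)       # one sentinel per worker
for t in workers:
    t.join()
print(dst)  # [1, 2, 3]
```

The shared `queue.Queue` gives the threads back-pressure and ordering for free; `join()` lets the caller overlap compute with I/O and synchronize only when the copied tensors are actually needed.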
Default offload directory from flexllmgen/flex_opt.py:1281:
```python
parser.add_argument("--offload-dir", type=str, default="~/flexllmgen_offload_dir",
    help="The directory to offload tensors. ")
```
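Since the default contains `~`, the path must be expanded before use; `TorchDisk.__init__` shown earlier does this with `os.path.expanduser`/`os.path.abspath`. The equivalent normalization in isolation:

```python
import os

offload_dir = "~/flexllmgen_offload_dir"   # the argparse default shown above
# Resolve the shell-style home prefix and make the path absolute,
# mirroring what TorchDisk.__init__ does before creating the directory.
path = os.path.abspath(os.path.expanduser(offload_dir))
print(path)
```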
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `AssertionError` on `os.path.isdir(self.path)` | Offload directory path exists but is not a directory | Remove the file at that path or choose a different `--offload-dir` |
| `OSError: [Errno 28] No space left on device` | NVMe drive is full | Use a larger NVMe SSD or reduce the model size |
| Permission denied on mount | Script not run as root | Run with `sudo bash scripts/mount_nvme_aws.sh ~/` |
Compatibility Notes
- AWS vs GCP: AWS uses a single NVMe device; GCP stripes 4 NVMe devices into RAID-0 for higher aggregate bandwidth.
- RAID-0 stripe size: The GCP script uses a 1MB stripe size (`-I 1m`), which suits large sequential tensor reads and writes.
- Offload directory name: `flexllmgen_offload_dir` is always appended to the mount path argument.
- No disk required for small models: If `--percent` keeps all weights and cache on GPU+CPU (e.g., `100 0 100 0 100 0`), no NVMe setup is needed.