Environment: FMInference FlexLLMGen NVMe Disk
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Storage |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
NVMe SSD storage environment providing a high-bandwidth offload directory for FlexLLMGen's disk-tier tensor storage, required when offloading weights or KV cache to disk.
Description
FlexLLMGen's three-tier offloading system (GPU, CPU, disk) requires a fast local storage device when the disk tier is used. The system stores tensors as memory-mapped NumPy files (np.lib.format.open_memmap) in an offload directory. Asynchronous copy threads (default: 4) handle data movement between disk and GPU/CPU. For maximum throughput, NVMe SSDs are recommended. AWS instances use a single NVMe drive formatted as XFS, while GCP instances stripe 4 NVMe drives into a RAID-0 logical volume using LVM for higher aggregate bandwidth.
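The memmap pattern described above can be sketched in a few lines. This is a minimal illustration, not FlexLLMGen code; the directory and file name here are placeholders:

```python
import os
import tempfile

import numpy as np

# Create a disk-backed tensor the same way FlexLLMGen's disk tier does:
# np.lib.format.open_memmap writes a standard .npy header, then memory-maps
# the data region so reads and writes go through the page cache to disk.
offload_dir = tempfile.mkdtemp()  # stand-in for the real offload directory
path = os.path.join(offload_dir, "t_0.npy")

arr = np.lib.format.open_memmap(path, mode="w+", shape=(4, 8), dtype=np.float16)
arr[:] = 1.0   # written through to the backing file
arr.flush()

# Reopen lazily: mmap_mode maps the file instead of reading it all into RAM.
reloaded = np.load(path, mmap_mode="r")
print(reloaded.shape, reloaded.dtype)  # (4, 8) float16
```

Because the file is a standard `.npy`, any NumPy-aware tool can inspect the offloaded tensors in place without loading them fully into memory.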
Usage
Use this environment when offloading any tensors to disk, i.e., when --percent allocates less than 100% to GPU+CPU combined for weights, cache, or activations. This is essential for running models like OPT-175B that exceed available GPU and CPU memory. The offload directory defaults to ~/flexllmgen_offload_dir.
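The six `--percent` values are GPU/CPU percentages for weights, KV cache, and activations; whatever each pair leaves short of 100 spills to disk. A small helper (hypothetical, for illustration only, not part of FlexLLMGen) makes the implied disk shares explicit:

```python
def disk_shares(percent):
    """Given the six --percent values
    (weight_gpu, weight_cpu, cache_gpu, cache_cpu, act_gpu, act_cpu),
    return the implied disk percentage for (weights, cache, activations).
    Hypothetical helper for illustration only.
    """
    w_g, w_c, c_g, c_c, a_g, a_c = percent
    return (100 - w_g - w_c, 100 - c_g - c_c, 100 - a_g - a_c)

# "100 0 100 0 100 0": everything fits on GPU, so no disk tier is needed.
print(disk_shares([100, 0, 100, 0, 100, 0]))   # (0, 0, 0)

# Half the weights spill to disk: the NVMe offload directory is required.
print(disk_shares([0, 50, 0, 100, 0, 100]))    # (50, 0, 0)
```

If any of the three returned shares is nonzero, FlexLLMGen will create tensors under the offload directory and the NVMe setup below applies.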
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Scripts target Ubuntu with xfs filesystem |
| Storage | NVMe SSD | AWS: /dev/nvme1n1; GCP: 4x NVMe (/dev/nvme0n1-4) |
| Filesystem | XFS | Both AWS and GCP scripts format as XFS |
| Disk Space | 50GB+ (model-dependent) | OPT-175B weights alone are ~350GB; the SSD used in benchmarks was 1.5TB |
| Permissions | Root (sudo) | Mount scripts require root for mkfs, mount, pvcreate, lvcreate |
Dependencies
System Packages
- `xfsprogs` (for `mkfs -t xfs`)
- `lvm2` (GCP only: for `pvcreate`, `vgcreate`, `lvcreate`)
- `util-linux` (for `mount`)
Python Packages
- `numpy` (for `np.lib.format.open_memmap` memory-mapped file I/O)
Credentials
No credentials required. The offload directory must be world-readable/writable (`chmod a+rw`).
Quick Install
```bash
# AWS: Mount single NVMe drive
sudo bash scripts/mount_nvme_aws.sh ~/

# GCP: Create RAID-0 from 4 NVMe drives and mount
sudo bash scripts/mount_nvme_gcp.sh ~/
```
Code Evidence
AWS NVMe mount script from scripts/mount_nvme_aws.sh:13-17:
```bash
mkfs -t xfs -f /dev/nvme1n1
rm -rf $1flexllmgen_offload_dir
mkdir $1flexllmgen_offload_dir
mount /dev/nvme1n1 $1flexllmgen_offload_dir
chmod a+rw $1flexllmgen_offload_dir
```
GCP RAID-0 NVMe striping from scripts/mount_nvme_gcp.sh:13-22:
```bash
pvcreate /dev/nvme0n1 /dev/nvme0n2 /dev/nvme0n3 /dev/nvme0n4
pvs
vgcreate striped_vol_group /dev/nvme0n1 /dev/nvme0n2 /dev/nvme0n3 /dev/nvme0n4
vgs
lvcreate -i 4 -I 1m -l 100%VG -nstriped_logical_volume striped_vol_group
lvs
mkdir -p $1flexllmgen_offload_dir
mkfs -t xfs -f /dev/striped_vol_group/striped_logical_volume
mount /dev/striped_vol_group/striped_logical_volume $1flexllmgen_offload_dir
chmod a+rw $1flexllmgen_offload_dir
```
Disk tensor storage using memory-mapped NumPy files in flexllmgen/pytorch_backend.py:656-661:
```python
def allocate(self, shape, dtype, pin_memory=None, name=None):
    name = name or TorchTensor.next_name()
    path = os.path.join(self.path, name)
    np.lib.format.open_memmap(path, mode="w+", shape=shape, dtype=dtype)
    return TorchTensor(shape, np_dtype_to_torch_dtype[dtype],
                       path, self, name=name)
```
Async copy thread pool in flexllmgen/pytorch_backend.py:624-647:
```python
class TorchDisk:
    def __init__(self, path, mem_capacity=None, cuda_id=0, num_copy_threads=4):
        self.path = os.path.abspath(os.path.expanduser(path))
        ...
        self.copy_queue = queue.Queue()
        self.copy_threads = [
            threading.Thread(
                target=copy_worker_func, args=(self.copy_queue, cuda_id)
            ) for _ in range(num_copy_threads)
        ]
```
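The queue-plus-worker pattern above can be sketched generically. This is a simplified stand-in for `copy_worker_func` (which in FlexLLMGen moves tensor data between tiers on a CUDA device), using plain list copies so the structure is visible:

```python
import queue
import threading

def copy_worker(q):
    """Simplified stand-in for copy_worker_func: drain copy jobs until a
    None sentinel arrives, then exit."""
    while True:
        item = q.get()
        if item is None:       # sentinel: shut this worker down
            q.task_done()
            break
        dst, src = item
        dst[:] = src           # the "copy" between tiers
        q.task_done()

copy_queue = queue.Queue()
workers = [threading.Thread(target=copy_worker, args=(copy_queue,))
           for _ in range(4)]  # matches the default num_copy_threads=4
for t in workers:
    t.start()

dst, src = [0, 0, 0], [1, 2, 3]
copy_queue.put((dst, src))
copy_queue.join()              # block until every queued copy is done
for _ in workers:
    copy_queue.put(None)       # one sentinel per worker
for t in workers:
    t.join()
print(dst)  # [1, 2, 3]
```

The shared `queue.Queue` gives the threads back-pressure and ordering for free; `join()` lets the caller overlap compute with I/O and synchronize only when the copied tensors are actually needed.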
Default offload directory from flexllmgen/flex_opt.py:1281:
```python
parser.add_argument("--offload-dir", type=str, default="~/flexllmgen_offload_dir",
    help="The directory to offload tensors. ")
```
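Since the default contains `~`, the path must be expanded before use; `TorchDisk.__init__` shown earlier does this with `os.path.expanduser`/`os.path.abspath`. The equivalent normalization in isolation:

```python
import os

offload_dir = "~/flexllmgen_offload_dir"   # the argparse default shown above
# Resolve the shell-style home prefix and make the path absolute,
# mirroring what TorchDisk.__init__ does before creating the directory.
path = os.path.abspath(os.path.expanduser(offload_dir))
print(path)
```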
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `AssertionError` on `os.path.isdir(self.path)` | Offload directory path exists but is not a directory | Remove the file at that path or choose a different `--offload-dir` |
| `OSError: [Errno 28] No space left on device` | NVMe drive is full | Use a larger NVMe SSD or reduce the model size |
| Permission denied on mount | Script not run as root | Run with `sudo bash scripts/mount_nvme_aws.sh ~/` |
Compatibility Notes
- AWS vs GCP: AWS uses a single NVMe device; GCP stripes 4 NVMe devices into RAID-0 for higher aggregate bandwidth.
- RAID-0 stripe size: The GCP script uses a 1MB stripe size (`-I 1m`), which suits large sequential tensor reads and writes.
- Offload directory name: `flexllmgen_offload_dir` is always appended to the mount path argument.
- No disk required for small models: If `--percent` keeps all weights and cache on GPU+CPU (e.g., `100 0 100 0 100 0`), no NVMe setup is needed.