
Environment:FMInference FlexLLMGen NVMe Disk

From Leeroopedia


Knowledge Sources
Domains Infrastructure, Storage
Last Updated 2026-02-09 12:00 GMT

Overview

An NVMe SSD storage environment that provides a high-bandwidth offload directory for FlexLLMGen's disk-tier tensor storage. It is required whenever weights or KV cache are offloaded to disk.

Description

FlexLLMGen's three-tier offloading system (GPU, CPU, disk) requires a fast local storage device when the disk tier is used. The system stores tensors as memory-mapped NumPy files (np.lib.format.open_memmap) in an offload directory. Asynchronous copy threads (default: 4) handle data movement between disk and GPU/CPU. For maximum throughput, NVMe SSDs are recommended. AWS instances use a single NVMe drive formatted as XFS, while GCP instances stripe 4 NVMe drives into a RAID-0 logical volume using LVM for higher aggregate bandwidth.
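The memory-mapped storage mechanism described above can be sketched directly with NumPy. The directory and file names below are illustrative, not FlexLLMGen's actual naming scheme:

```python
import os
import tempfile

import numpy as np

# Sketch of the disk tier's storage mechanism: each tensor lives as a
# memory-mapped .npy file, so reads and writes go through the OS page cache
# and hit the NVMe device lazily. Paths and names here are illustrative.
offload_dir = tempfile.mkdtemp()
path = os.path.join(offload_dir, "t_0.npy")

# mode="w+" creates the backing file on disk.
arr = np.lib.format.open_memmap(path, mode="w+", shape=(4, 256), dtype=np.float16)
arr[:] = 1.0        # writes land in the mapped file
arr.flush()         # force dirty pages out to the device

# Reopen read-only, as a consumer would after the producer is done.
arr2 = np.lib.format.open_memmap(path, mode="r")
print(arr2.shape, float(arr2.sum()))  # (4, 256) 1024.0
```

Because the file is memory-mapped rather than read eagerly, only the pages actually touched are pulled from disk, which is why the aggregate bandwidth of the NVMe device dominates throughput.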

Usage

Use this environment when offloading any tensors to disk, i.e., when --percent allocates less than 100% to GPU+CPU combined for weights, cache, or activations. This is essential for running models like OPT-175B that exceed available GPU and CPU memory. The offload directory defaults to ~/flexllmgen_offload_dir.
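As a concrete illustration, a disk-offloaded run might look like the following. The split is an example only; the six --percent numbers are the weight GPU/CPU, cache GPU/CPU, and activation GPU/CPU percentages, with the remainder of each pair spilling to disk:

```shell
# Example: half the weights on CPU, the rest on disk, the KV cache
# entirely on disk, and activations on CPU (illustrative split).
python3 -m flexllmgen.flex_opt \
  --model facebook/opt-175b \
  --percent 0 50 0 0 0 100 \
  --offload-dir ~/flexllmgen_offload_dir
```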

System Requirements

  • OS: Linux. Scripts target Ubuntu with the XFS filesystem.
  • Storage: NVMe SSD. AWS: /dev/nvme1n1; GCP: 4x NVMe (/dev/nvme0n1-4).
  • Filesystem: XFS. Both the AWS and GCP scripts format as XFS.
  • Disk Space: 50GB+ minimum, model-dependent. OPT-175B weights alone are ~350GB; the SSD used in benchmarks was 1.5TB.
  • Permissions: root (sudo). Mount scripts require root for mkfs, mount, pvcreate, and lvcreate.

Dependencies

System Packages

  • xfsprogs (for mkfs -t xfs)
  • lvm2 (GCP only: for pvcreate, vgcreate, lvcreate)
  • util-linux (for mount)

Python Packages

  • numpy (for np.lib.format.open_memmap memory-mapped file I/O)

Credentials

No credentials required. The offload directory must be world-readable/writable (chmod a+rw).

Quick Install

# AWS: Mount single NVMe drive
sudo bash scripts/mount_nvme_aws.sh ~/

# GCP: Create RAID-0 from 4 NVMe drives and mount
sudo bash scripts/mount_nvme_gcp.sh ~/

Code Evidence

AWS NVMe mount script from scripts/mount_nvme_aws.sh:13-17:

mkfs -t xfs -f /dev/nvme1n1
rm -rf $1flexllmgen_offload_dir
mkdir $1flexllmgen_offload_dir
mount /dev/nvme1n1 $1flexllmgen_offload_dir
chmod a+rw $1flexllmgen_offload_dir

GCP RAID-0 NVMe striping from scripts/mount_nvme_gcp.sh:13-22:

pvcreate /dev/nvme0n1 /dev/nvme0n2 /dev/nvme0n3 /dev/nvme0n4
pvs
vgcreate striped_vol_group /dev/nvme0n1 /dev/nvme0n2 /dev/nvme0n3 /dev/nvme0n4
vgs
lvcreate -i 4 -I 1m -l 100%VG -nstriped_logical_volume striped_vol_group
lvs
mkdir -p $1flexllmgen_offload_dir
mkfs -t xfs -f /dev/striped_vol_group/striped_logical_volume
mount /dev/striped_vol_group/striped_logical_volume $1flexllmgen_offload_dir
chmod a+rw $1flexllmgen_offload_dir

Disk tensor storage using memory-mapped NumPy files in flexllmgen/pytorch_backend.py:656-661:

def allocate(self, shape, dtype, pin_memory=None, name=None):
    name = name or TorchTensor.next_name()
    path = os.path.join(self.path, name)
    np.lib.format.open_memmap(path, mode="w+", shape=shape, dtype=dtype)
    return TorchTensor(shape, np_dtype_to_torch_dtype[dtype],
                       path, self, name=name)

Async copy thread pool in flexllmgen/pytorch_backend.py:624-647:

class TorchDisk:
    def __init__(self, path, mem_capacity=None, cuda_id=0, num_copy_threads=4):
        self.path = os.path.abspath(os.path.expanduser(path))
        ...
        self.copy_queue = queue.Queue()
        self.copy_threads = [
            threading.Thread(
                target=copy_worker_func, args=(self.copy_queue, cuda_id)
            ) for _ in range(num_copy_threads)
        ]

Default offload directory from flexllmgen/flex_opt.py:1281:

parser.add_argument("--offload-dir", type=str, default="~/flexllmgen_offload_dir",
    help="The directory to offload tensors. ")
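The "~" in the default value is expanded at runtime (TorchDisk.__init__ calls os.path.expanduser); a minimal sketch of that normalization:

```python
import os

# The backend normalizes the user-supplied path before use, mirroring
# TorchDisk.__init__: expand "~" and make the path absolute.
raw = "~/flexllmgen_offload_dir"
path = os.path.abspath(os.path.expanduser(raw))
print("~" in path)          # False: the tilde has been expanded
print(os.path.isabs(path))  # True
```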

Common Errors

  • AssertionError on os.path.isdir(self.path): the offload directory path exists but is not a directory. Remove the file at that path or choose a different --offload-dir.
  • OSError: [Errno 28] No space left on device: the NVMe drive is full. Use a larger NVMe SSD or reduce the model size.
  • Permission denied on mount: the script was not run as root. Run with sudo bash scripts/mount_nvme_aws.sh ~/.
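The first error can be reproduced and fixed in a few lines (paths here are hypothetical):

```python
import os
import tempfile

# Reproduce the failure mode: a plain file sitting where the offload
# directory should be, which trips the backend's isdir assertion.
base = tempfile.mkdtemp()
offload = os.path.join(base, "flexllmgen_offload_dir")
open(offload, "w").close()
print(os.path.isdir(offload))   # False: this is what raises the AssertionError

# Remedy: remove the stray file and create a real directory.
os.remove(offload)
os.makedirs(offload)
print(os.path.isdir(offload))   # True
```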

Compatibility Notes

  • AWS vs GCP: AWS uses a single NVMe device; GCP stripes 4 NVMe devices into RAID-0 for higher aggregate bandwidth.
  • RAID-0 stripe size: GCP script uses 1MB stripe size (-I 1m) which is optimized for large sequential tensor reads/writes.
  • Offload directory name: the mount scripts always append flexllmgen_offload_dir to the mount path argument.
  • No disk required for small models: If --percent keeps all weights and cache on GPU+CPU (e.g., 100 0 100 0 100 0), no NVMe setup is needed.
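The last point can be checked mechanically. This helper is hypothetical (not part of FlexLLMGen) and decides from the six --percent values whether any tensors spill to disk:

```python
# Hypothetical helper: given the six --percent values (weight GPU/CPU,
# cache GPU/CPU, activation GPU/CPU), report whether the disk tier is
# used at all, i.e., whether any pair sums to less than 100%.
def disk_needed(percent):
    w_gpu, w_cpu, c_gpu, c_cpu, a_gpu, a_cpu = percent
    return (w_gpu + w_cpu < 100) or (c_gpu + c_cpu < 100) or (a_gpu + a_cpu < 100)

print(disk_needed([100, 0, 100, 0, 100, 0]))  # False: everything fits on GPU
print(disk_needed([0, 50, 0, 0, 0, 100]))     # True: weights and cache spill to disk
```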
