Implementation:FMInference FlexLLMGen ExecutionEnv Create
| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen |
| Domains | System_Initialization, Hardware_Abstraction |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Class method in the FlexLLMGen library that initializes the three-tier (GPU/CPU/disk) hardware environment used by all downstream components.
Description
ExecutionEnv.create() is a classmethod that instantiates device handles for the three-tier memory hierarchy:
- TorchDevice("cuda:0") -- GPU device handle for CUDA operations.
- TorchDevice("cpu") -- CPU device handle for system DRAM operations.
- TorchDisk(offload_dir) -- Disk device handle for NVMe SSD-backed storage.
- TorchMixedDevice([gpu, cpu, disk]) -- Combined device handle for tensors split across tiers.
The method returns a frozen dataclass with four attributes: .gpu, .cpu, .disk, and .mixed. Because the dataclass is frozen, device assignments are immutable after creation.
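The immutability guarantee follows from standard Python dataclass semantics. A minimal sketch (using a stand-in class, not the real ExecutionEnv) shows that reassigning a device handle after creation raises dataclasses.FrozenInstanceError:

```python
import dataclasses
from typing import Any

# Illustrative stand-in for ExecutionEnv's frozen dataclass;
# field names mirror the real class, but this is not FlexLLMGen code.
@dataclasses.dataclass(frozen=True)
class FrozenEnvSketch:
    gpu: Any = None
    cpu: Any = None
    disk: Any = None
    mixed: Any = None

env = FrozenEnvSketch(gpu="cuda:0", cpu="cpu")
try:
    env.gpu = "cuda:1"  # attempted reassignment after creation
except dataclasses.FrozenInstanceError:
    print("device assignments are immutable")
```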
Usage
Call ExecutionEnv.create(offload_dir) at the start of any FlexLLMGen workflow, before creating Policy or OptLM. The offload_dir parameter should point to a directory on a fast NVMe SSD for optimal disk offloading performance.
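Whether a missing offload directory is created automatically can vary by version, so resolving and creating it up front is a safe pattern. A hedged sketch (the directory name is an example, not a requirement):

```python
import os

# Resolve "~" explicitly and ensure the directory exists before
# passing it to ExecutionEnv.create(). The path here is illustrative.
offload_dir = os.path.expanduser("~/flexllmgen_offload_dir")
os.makedirs(offload_dir, exist_ok=True)  # no-op if it already exists

# env = ExecutionEnv.create(offload_dir)  # requires a CUDA-capable install
```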
Code Reference
| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | flexllmgen/utils.py |
| Lines | 34-49 |
Signature:
@dataclasses.dataclass(frozen=True)
class ExecutionEnv:
    gpu: Any = None
    cpu: Any = None
    disk: Any = None
    mixed: Any = None

    @classmethod
    def create(cls, offload_dir):
        from flexllmgen.pytorch_backend import TorchDevice, TorchDisk, TorchMixedDevice
        gpu = TorchDevice("cuda:0")
        cpu = TorchDevice("cpu")
        disk = TorchDisk(offload_dir)
        return cls(gpu=gpu, cpu=cpu, disk=disk, mixed=TorchMixedDevice([gpu, cpu, disk]))
Import:
from flexllmgen.utils import ExecutionEnv
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| offload_dir | str | Yes | Path to NVMe-mounted directory for disk offloading |
Outputs
| Output | Type | Description |
|---|---|---|
| ExecutionEnv | frozen dataclass | Container for all device handles |
| .gpu | TorchDevice | CUDA:0 device for GPU operations |
| .cpu | TorchDevice | CPU device for DRAM operations |
| .disk | TorchDisk | Disk device for NVMe offloading |
| .mixed | TorchMixedDevice | Mixed device for split tensors across all tiers |
Usage Examples
Example 1: Basic environment initialization
from flexllmgen.utils import ExecutionEnv
env = ExecutionEnv.create("~/flexllmgen_offload_dir")
# env.gpu - CUDA device for GPU operations
# env.cpu - CPU device for CPU operations
# env.disk - Disk device for NVMe offloading
# env.mixed - Mixed device for split tensors
Example 2: Full workflow with environment initialization
from flexllmgen.utils import ExecutionEnv
from flexllmgen.opt_config import get_opt_config
from flexllmgen.flex_opt import Policy, OptLM
from flexllmgen.compression import CompressionConfig
# Step 1: Initialize execution environment
env = ExecutionEnv.create("/mnt/nvme/flexllmgen_offload")
# Step 2: Configure policy
policy = Policy(
    gpu_batch_size=2,
    num_gpu_batches=4,
    w_gpu_percent=0,
    w_cpu_percent=50,
    cache_gpu_percent=0,
    cache_cpu_percent=50,
    act_gpu_percent=100,
    act_cpu_percent=0,
    overlap=True,
    sep_layer=False,
    pin_weight=True,
    cpu_cache_compute=False,
    attn_sparsity=1.0,
    compress_weight=False,
    comp_weight_config=CompressionConfig(
        num_bits=4, group_size=64, group_dim=0, symmetric=False, enabled=False),
    compress_cache=False,
    comp_cache_config=CompressionConfig(
        num_bits=4, group_size=64, group_dim=2, symmetric=False, enabled=False),
)
# Step 3: Resolve model config
opt_config = get_opt_config("facebook/opt-30b")
# Step 4: Create model with environment
model = OptLM(opt_config, env, "/path/to/opt-30b", policy)