Implementation:FMInference FlexLLMGen ExecutionEnv Create
| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen |
| Domains | System_Initialization, Hardware_Abstraction |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Class method in the FlexLLMGen library that initializes the three-tier (GPU/CPU/disk) hardware environment used by all downstream components.
Description
ExecutionEnv.create() is a classmethod that instantiates device handles for the three-tier memory hierarchy:
- TorchDevice("cuda:0") -- GPU device handle for CUDA operations.
- TorchDevice("cpu") -- CPU device handle for system DRAM operations.
- TorchDisk(offload_dir) -- Disk device handle for NVMe SSD-backed storage.
- TorchMixedDevice([gpu, cpu, disk]) -- Combined device handle for tensors split across tiers.
The method returns a frozen dataclass with four attributes: .gpu, .cpu, .disk, and .mixed. Because the dataclass is frozen, device assignments are immutable after creation.
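The immutability guarantee follows from standard Python dataclass semantics. A minimal sketch (using a stand-in class, not the real ExecutionEnv) shows that reassigning a device handle after creation raises dataclasses.FrozenInstanceError:

```python
import dataclasses
from typing import Any

# Illustrative stand-in for ExecutionEnv's frozen dataclass;
# field names mirror the real class, but this is not FlexLLMGen code.
@dataclasses.dataclass(frozen=True)
class FrozenEnvSketch:
    gpu: Any = None
    cpu: Any = None
    disk: Any = None
    mixed: Any = None

env = FrozenEnvSketch(gpu="cuda:0", cpu="cpu")
try:
    env.gpu = "cuda:1"  # attempted reassignment after creation
except dataclasses.FrozenInstanceError:
    print("device assignments are immutable")
```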
Usage
Call ExecutionEnv.create(offload_dir) at the start of any FlexLLMGen workflow, before creating Policy or OptLM. The offload_dir parameter should point to a directory on a fast NVMe SSD for optimal disk offloading performance.
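Whether a missing offload directory is created automatically can vary by version, so resolving and creating it up front is a safe pattern. A hedged sketch (the directory name is an example, not a requirement):

```python
import os

# Resolve "~" explicitly and ensure the directory exists before
# passing it to ExecutionEnv.create(). The path here is illustrative.
offload_dir = os.path.expanduser("~/flexllmgen_offload_dir")
os.makedirs(offload_dir, exist_ok=True)  # no-op if it already exists

# env = ExecutionEnv.create(offload_dir)  # requires a CUDA-capable install
```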
Code Reference
| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | flexllmgen/utils.py |
| Lines | 34-49 |
Signature:
@dataclasses.dataclass(frozen=True)
class ExecutionEnv:
    gpu: Any = None
    cpu: Any = None
    disk: Any = None
    mixed: Any = None

    @classmethod
    def create(cls, offload_dir):
        from flexllmgen.pytorch_backend import TorchDevice, TorchDisk, TorchMixedDevice
        gpu = TorchDevice("cuda:0")
        cpu = TorchDevice("cpu")
        disk = TorchDisk(offload_dir)
        return cls(gpu=gpu, cpu=cpu, disk=disk, mixed=TorchMixedDevice([gpu, cpu, disk]))
Import:
from flexllmgen.utils import ExecutionEnv
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| offload_dir | str | Yes | Path to NVMe-mounted directory for disk offloading |
Outputs
| Output | Type | Description |
|---|---|---|
| ExecutionEnv | frozen dataclass | Container for all device handles |
| .gpu | TorchDevice | CUDA:0 device for GPU operations |
| .cpu | TorchDevice | CPU device for DRAM operations |
| .disk | TorchDisk | Disk device for NVMe offloading |
| .mixed | TorchMixedDevice | Mixed device for split tensors across all tiers |
Usage Examples
Example 1: Basic environment initialization
from flexllmgen.utils import ExecutionEnv
env = ExecutionEnv.create("~/flexllmgen_offload_dir")
# env.gpu - CUDA device for GPU operations
# env.cpu - CPU device for CPU operations
# env.disk - Disk device for NVMe offloading
# env.mixed - Mixed device for split tensors
Example 2: Full workflow with environment initialization
from flexllmgen.utils import ExecutionEnv
from flexllmgen.opt_config import get_opt_config
from flexllmgen.flex_opt import Policy, OptLM
from flexllmgen.compression import CompressionConfig
# Step 1: Initialize execution environment
env = ExecutionEnv.create("/mnt/nvme/flexllmgen_offload")
# Step 2: Configure policy
policy = Policy(
    gpu_batch_size=2,
    num_gpu_batches=4,
    w_gpu_percent=0,
    w_cpu_percent=50,
    cache_gpu_percent=0,
    cache_cpu_percent=50,
    act_gpu_percent=100,
    act_cpu_percent=0,
    overlap=True,
    sep_layer=False,
    pin_weight=True,
    cpu_cache_compute=False,
    attn_sparsity=1.0,
    compress_weight=False,
    comp_weight_config=CompressionConfig(
        num_bits=4, group_size=64, group_dim=0, symmetric=False, enabled=False),
    compress_cache=False,
    comp_cache_config=CompressionConfig(
        num_bits=4, group_size=64, group_dim=2, symmetric=False, enabled=False),
)
# Step 3: Resolve model config
opt_config = get_opt_config("facebook/opt-30b")
# Step 4: Create model with environment
model = OptLM(opt_config, env, "/path/to/opt-30b", policy)