
Implementation:FMInference FlexLLMGen ExecutionEnv Create

From Leeroopedia


Field         Value
Sources       Repo: FlexLLMGen
Domains       System_Initialization, Hardware_Abstraction
Last Updated  2026-02-09 00:00 GMT

Overview

ExecutionEnv.create() is the concrete entry point for initializing the three-tier (GPU, CPU, disk) hardware environment provided by the FlexLLMGen library.

Description

ExecutionEnv.create() is a classmethod that instantiates device handles for the three-tier memory hierarchy:

  • TorchDevice("cuda:0") -- GPU device handle for CUDA operations.
  • TorchDevice("cpu") -- CPU device handle for system DRAM operations.
  • TorchDisk(offload_dir) -- Disk device handle for NVMe SSD-backed storage.
  • TorchMixedDevice([gpu, cpu, disk]) -- Combined device handle for tensors split across tiers.

The method returns a frozen dataclass with four attributes: .gpu, .cpu, .disk, and .mixed. Because the dataclass is declared with frozen=True, device assignments are immutable: reassigning any attribute after creation raises dataclasses.FrozenInstanceError.
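The immutability can be illustrated with a minimal standalone sketch that requires no FlexLLMGen install; the class below is a stand-in that mirrors the real ExecutionEnv's field layout, and the string handles are placeholders for the actual device objects:

```python
import dataclasses
from typing import Any

@dataclasses.dataclass(frozen=True)
class ExecutionEnvSketch:
    # Stand-in for flexllmgen.utils.ExecutionEnv; field names match the real class.
    gpu: Any = None
    cpu: Any = None
    disk: Any = None
    mixed: Any = None

env = ExecutionEnvSketch(gpu="cuda:0-handle", cpu="cpu-handle")

# Reads work as usual.
print(env.gpu)  # cuda:0-handle

# Writes are rejected: frozen=True makes every field read-only after __init__.
try:
    env.gpu = "other-handle"
except dataclasses.FrozenInstanceError:
    print("reassignment blocked")
```

The same behavior applies to the real ExecutionEnv: once created, the device handles cannot be swapped out; build a new environment instead.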

Usage

Call ExecutionEnv.create(offload_dir) at the start of any FlexLLMGen workflow, before creating Policy or OptLM. The offload_dir parameter should point to a directory on a fast NVMe SSD for optimal disk offloading performance.
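Before calling create(), it is safest to resolve and create the offload directory yourself; whether TorchDisk expands "~" or creates missing directories may vary by version, so the following is a defensive preparation sketch (the path is illustrative, not a FlexLLMGen default):

```python
import os

# Illustrative path; substitute your own NVMe mount point.
offload_dir = os.path.abspath(os.path.expanduser("~/flexllmgen_offload_dir"))
os.makedirs(offload_dir, exist_ok=True)  # idempotent: no error if it already exists

# Then hand the resolved absolute path to FlexLLMGen:
# env = ExecutionEnv.create(offload_dir)
```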

Code Reference

Field       Value
Repository  FlexLLMGen
File        flexllmgen/utils.py
Lines       34-49

Signature:

@dataclasses.dataclass(frozen=True)
class ExecutionEnv:
    gpu: Any = None
    cpu: Any = None
    disk: Any = None
    mixed: Any = None

    @classmethod
    def create(cls, offload_dir):
        from flexllmgen.pytorch_backend import TorchDevice, TorchDisk, TorchMixedDevice
        gpu = TorchDevice("cuda:0")
        cpu = TorchDevice("cpu")
        disk = TorchDisk(offload_dir)
        return cls(gpu=gpu, cpu=cpu, disk=disk, mixed=TorchMixedDevice([gpu, cpu, disk]))

Import:

from flexllmgen.utils import ExecutionEnv

I/O Contract

Inputs

Parameter    Type  Required  Description
offload_dir  str   Yes       Path to an NVMe-mounted directory for disk offloading

Outputs

Output        Type              Description
ExecutionEnv  frozen dataclass  Container for all device handles
.gpu          TorchDevice       cuda:0 device for GPU operations
.cpu          TorchDevice       CPU device for DRAM operations
.disk         TorchDisk         Disk device for NVMe offloading
.mixed        TorchMixedDevice  Mixed device for tensors split across all tiers

Usage Examples

Example 1: Basic environment initialization

from flexllmgen.utils import ExecutionEnv

env = ExecutionEnv.create("~/flexllmgen_offload_dir")

# env.gpu   - CUDA device for GPU operations
# env.cpu   - CPU device for CPU operations
# env.disk  - Disk device for NVMe offloading
# env.mixed - Mixed device for split tensors

Example 2: Full workflow with environment initialization

from flexllmgen.utils import ExecutionEnv
from flexllmgen.opt_config import get_opt_config
from flexllmgen.flex_opt import Policy, OptLM
from flexllmgen.compression import CompressionConfig

# Step 1: Initialize execution environment
env = ExecutionEnv.create("/mnt/nvme/flexllmgen_offload")

# Step 2: Configure policy
policy = Policy(
    gpu_batch_size=2,
    num_gpu_batches=4,
    w_gpu_percent=0,
    w_cpu_percent=50,
    cache_gpu_percent=0,
    cache_cpu_percent=50,
    act_gpu_percent=100,
    act_cpu_percent=0,
    overlap=True,
    sep_layer=False,
    pin_weight=True,
    cpu_cache_compute=False,
    attn_sparsity=1.0,
    compress_weight=False,
    comp_weight_config=CompressionConfig(num_bits=4, group_size=64, group_dim=0, symmetric=False, enabled=False),
    compress_cache=False,
    comp_cache_config=CompressionConfig(num_bits=4, group_size=64, group_dim=2, symmetric=False, enabled=False),
)

# Step 3: Resolve model config
opt_config = get_opt_config("facebook/opt-30b")

# Step 4: Create model with environment
model = OptLM(opt_config, env, "/path/to/opt-30b", policy)
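In the policy above, the GPU and CPU percentages for weights and KV cache leave an implicit remainder on disk. A quick arithmetic check, assuming the convention that disk receives 100 minus the GPU and CPU shares (as in the FlexLLMGen offloading design), makes the placement explicit:

```python
def disk_percent(gpu_percent: float, cpu_percent: float) -> float:
    """Remainder of a tensor's placement after GPU and CPU shares (assumed convention)."""
    remainder = 100.0 - gpu_percent - cpu_percent
    assert remainder >= 0, "GPU + CPU shares must not exceed 100%"
    return remainder

# Matches the Policy in Example 2: weights and cache each split 0% GPU / 50% CPU.
print(disk_percent(0, 50))   # 50.0 -> half the weights (and half the cache) live on disk
print(disk_percent(100, 0))  # 0.0  -> activations stay entirely on GPU
```

With this policy, the disk device created by ExecutionEnv.create() is therefore on the critical path for both weights and cache, which is why offload_dir should sit on a fast NVMe SSD.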
