Implementation: Alibaba ROLL Platform
| Knowledge Sources | |
|---|---|
| Domains | Hardware_Abstraction, Distributed_Computing |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
Abstract base class for hardware platform abstraction, standardizing device operations and environment configuration across NVIDIA, AMD, Ascend, and other accelerator platforms.
Description
The Platform class provides a unified interface for hardware-specific operations required by the mcore_adapter distributed training framework. It abstracts differences between accelerator platforms (NVIDIA CUDA, AMD ROCm, Huawei Ascend NPU) behind a common API.
Class attributes define platform metadata:
- device_name: Human-readable name (e.g., "NVIDIA", "AMD", "ASCEND")
- device_type: PyTorch module name (e.g., "cuda", "npu")
- dispatch_key: PyTorch dispatch key (e.g., "CUDA", "PrivateUse1")
- ray_device_key: Ray accelerator key (e.g., "GPU", "NPU")
- device_control_env_var: Visibility control variable (e.g., "CUDA_VISIBLE_DEVICES")
- ray_experimental_noset: Ray experimental flag for device visibility
- communication_backend: Distributed backend (e.g., "nccl", "hccl")
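As a sketch of how these attributes fit together, a CUDA subclass might populate them roughly as follows. This is a hypothetical illustration assembled from the values listed above; the subclass name and the `ray_experimental_noset` value are assumptions, not copied from the source.

```python
# Hypothetical sketch: a base class declaring the metadata slots and a
# CUDA subclass filling them in. Names and values are illustrative.
class Platform:
    device_name: str = ""
    device_type: str = ""
    dispatch_key: str = ""
    ray_device_key: str = ""
    device_control_env_var: str = ""
    ray_experimental_noset: str = ""
    communication_backend: str = ""

class CudaPlatform(Platform):
    device_name = "NVIDIA"
    device_type = "cuda"
    dispatch_key = "CUDA"
    ray_device_key = "GPU"
    device_control_env_var = "CUDA_VISIBLE_DEVICES"
    # Assumed flag name, following Ray's RAY_EXPERIMENTAL_NOSET_* convention.
    ray_experimental_noset = "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES"
    communication_backend = "nccl"
```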
Lazy attribute delegation via __getattr__ (lines 70-92): When an attribute is not found on the Platform instance, it automatically delegates to torch.<device_type> (e.g., torch.cuda). This allows calling platform-specific PyTorch APIs like device_count(), set_device(), synchronize() transparently through the Platform instance.
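The delegation mechanism can be sketched as below. In the real class the target is always `torch.<device_type>`; here the module name is parameterized (and defaulted to a stdlib module) only so the sketch runs without `torch` installed.

```python
import importlib

class DelegatingPlatform:
    # In mcore_adapter this would name a torch submodule such as
    # "torch.cuda"; "math" is used here only so the sketch runs anywhere.
    delegate_module = "math"

    def __getattr__(self, key):
        # Invoked only when normal attribute lookup fails; forward the
        # lookup to the delegate module, returning None if absent.
        module = importlib.import_module(self.delegate_module)
        return getattr(module, key, None)

p = DelegatingPlatform()
print(p.sqrt(16))  # -> 4.0, resolved on the delegate module, not the class
```

With `delegate_module = "torch.cuda"`, the same mechanism makes calls like `platform.device_count()` or `platform.synchronize()` resolve to the corresponding `torch.cuda` functions.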
Platform detection methods: is_cuda(), is_npu(), and is_rocm() return False by default; subclasses override the appropriate method to return True.
Abstract methods that subclasses must implement:
- clear_cublas_workspaces: Release cached low-level library workspaces (e.g., cuBLAS) so their memory can be reclaimed
- set_allocator_settings: Configure platform-specific memory allocators
- get_custom_env_vars: Return platform-specific environment variables
- get_vllm_worker_class: Specify the vLLM Ray worker class
- get_vllm_run_time_env_vars: Generate runtime env vars for vLLM execution
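A subclass for a new accelerator would override these hooks. The sketch below is a hypothetical NPU implementation: the method names match the documented interface, but the bodies and the environment-variable names (`HCCL_CONNECT_TIMEOUT`, `ASCEND_RT_VISIBLE_DEVICES`) are illustrative assumptions.

```python
# Hypothetical subclass sketch; bodies and values are illustrative only.
class AscendPlatform:
    @classmethod
    def clear_cublas_workspaces(cls) -> None:
        pass  # no cuBLAS on NPU; a CUDA subclass would free workspaces here

    @classmethod
    def set_allocator_settings(cls, env: str) -> None:
        pass  # e.g. hand the allocator config string to the NPU runtime

    @classmethod
    def get_custom_env_vars(cls) -> dict:
        # Platform-specific variables injected into worker environments
        # (assumed example value).
        return {"HCCL_CONNECT_TIMEOUT": "300"}

    @classmethod
    def get_vllm_run_time_env_vars(cls, gpu_rank: str) -> dict:
        # Restrict the vLLM worker to the given device (assumed variable).
        return {"ASCEND_RT_VISIBLE_DEVICES": gpu_rank}
```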
Utility methods:
- update_env_vars_for_visible_devices: Sets device visibility env vars and Ray experimental flags
- get_visible_gpus: Parses the visibility env var to return currently visible device IDs
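The core of these two utilities is joining and splitting the comma-separated visibility variable. A minimal sketch, written as free functions for brevity (the real versions are classmethods on Platform, use the subclass's `device_control_env_var`, and also handle the Ray experimental flag):

```python
import os

def update_env_vars_for_visible_devices(env_vars, gpu_ranks,
                                        control_var="CUDA_VISIBLE_DEVICES"):
    # Write the comma-separated device-ID list into the visibility variable.
    env_vars[control_var] = ",".join(str(r) for r in gpu_ranks)

def get_visible_gpus(control_var="CUDA_VISIBLE_DEVICES", env=os.environ):
    # Parse the visibility variable back into a list of device-ID strings;
    # an unset or empty variable yields an empty list.
    value = env.get(control_var, "")
    return value.split(",") if value else []
```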
Usage
Do not instantiate Platform directly. Instead, use the current_platform singleton from mcore_adapter.platforms which auto-detects the hardware platform. Subclass Platform when adding support for a new accelerator type. Access device-specific PyTorch APIs through the platform instance for cross-platform compatibility.
Code Reference
Source Location
- Repository: Alibaba_ROLL
- File: mcore_adapter/src/mcore_adapter/platforms/platform.py
- Lines: 1-179
Signature
class Platform:
    device_name: str
    device_type: str
    dispatch_key: str
    ray_device_key: str
    device_control_env_var: str
    ray_experimental_noset: str
    communication_backend: str

    def __getattr__(self, key: str) -> Any: ...

    @classmethod
    def is_cuda(cls) -> bool: ...
    @classmethod
    def is_npu(cls) -> bool: ...
    @classmethod
    def is_rocm(cls) -> bool: ...

    @classmethod
    def clear_cublas_workspaces(cls) -> None: ...
    @classmethod
    def set_allocator_settings(cls, env: str) -> None: ...
    @classmethod
    def get_custom_env_vars(cls) -> dict: ...
    @classmethod
    def update_env_vars_for_visible_devices(cls, env_vars: dict, gpu_ranks: list) -> None: ...
    @classmethod
    def get_visible_gpus(cls) -> list: ...
    @classmethod
    def get_vllm_worker_class(cls): ...
    @classmethod
    def get_vllm_run_time_env_vars(cls, gpu_rank: str) -> dict: ...
Import
from mcore_adapter.platforms.platform import Platform
# Or use the auto-detected singleton:
from mcore_adapter.platforms import current_platform
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| key | str | Yes (for __getattr__) | Attribute name to look up on torch.<device_type> |
| env | str | Yes (for set_allocator_settings) | Allocator configuration string |
| env_vars | dict | Yes (for update_env_vars_for_visible_devices) | Dictionary of environment variables to update |
| gpu_ranks | list | Yes (for update_env_vars_for_visible_devices) | List of device IDs to make visible |
| gpu_rank | str | Yes (for get_vllm_run_time_env_vars) | GPU rank for vLLM runtime configuration |
Outputs
| Name | Type | Description |
|---|---|---|
| (__getattr__) | Any or None | The requested attribute from torch.<device_type>, or None if not found |
| (is_cuda / is_npu / is_rocm) | bool | Platform identification flags (False by default) |
| (get_custom_env_vars) | dict | Platform-specific environment variable key-value pairs |
| (get_visible_gpus) | list | List of currently visible device ID strings |
| (get_vllm_worker_class) | type | The vLLM WorkerWrapper class for this platform |
| (get_vllm_run_time_env_vars) | dict | Runtime environment variables for vLLM execution |
Usage Examples
from mcore_adapter.platforms import current_platform
# Access device count through lazy delegation
num_devices = current_platform.device_count()
# Check platform type
if current_platform.is_cuda():
    print("Running on NVIDIA CUDA")
# Get visible devices
visible = current_platform.get_visible_gpus()
print(f"Visible devices: {visible}")
# Set device visibility for a subprocess
env_vars = {}
current_platform.update_env_vars_for_visible_devices(env_vars, [0, 1])
# env_vars now contains {"CUDA_VISIBLE_DEVICES": "0,1", ...}