
Implementation:Alibaba ROLL Platform



Knowledge Sources
Domains Hardware_Abstraction, Distributed_Computing
Last Updated 2026-02-07 20:00 GMT

Overview

Abstract base class for hardware platform abstraction, standardizing device operations and environment configuration across NVIDIA, AMD, Ascend, and other accelerator platforms.

Description

The Platform class provides a unified interface for hardware-specific operations required by the mcore_adapter distributed training framework. It abstracts differences between accelerator platforms (NVIDIA CUDA, AMD ROCm, Huawei Ascend NPU) behind a common API.

Class attributes define platform metadata (an illustrative subclass sketch follows the list):

  • device_name: Human-readable name (e.g., "NVIDIA", "AMD", "ASCEND")
  • device_type: PyTorch module name (e.g., "cuda", "npu")
  • dispatch_key: PyTorch dispatch key (e.g., "CUDA", "PrivateUse1")
  • ray_device_key: Ray accelerator key (e.g., "GPU", "NPU")
  • device_control_env_var: Visibility control variable (e.g., "CUDA_VISIBLE_DEVICES")
  • ray_experimental_noset: Ray experimental flag that prevents Ray from overwriting the device visibility variable
  • communication_backend: Distributed backend (e.g., "nccl", "hccl")
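
For concreteness, here is how a hypothetical CUDA subclass might declare these values. The values are illustrative; consult the ROLL source for the actual definitions.

class CudaPlatform(Platform):
    # Illustrative metadata for an NVIDIA platform; the real values live
    # in mcore_adapter's platform modules.
    device_name = "NVIDIA"
    device_type = "cuda"  # attribute lookups delegate to torch.cuda
    dispatch_key = "CUDA"
    ray_device_key = "GPU"
    device_control_env_var = "CUDA_VISIBLE_DEVICES"
    ray_experimental_noset = "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES"
    communication_backend = "nccl"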

Lazy attribute delegation via __getattr__ (lines 70-92): when an attribute is not found on the Platform instance, the lookup automatically delegates to torch.<device_type> (e.g., torch.cuda). This makes platform-specific PyTorch APIs such as device_count(), set_device(), and synchronize() transparently callable through the Platform instance.
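
The delegation itself takes only a few lines. The following is a minimal sketch, assuming a subclass has set device_type; the actual implementation in platform.py may differ in detail.

import importlib
from typing import Any, Optional

class Platform:
    device_type: str  # set by subclasses, e.g. "cuda" or "npu"

    def __getattr__(self, key: str) -> Optional[Any]:
        # __getattr__ fires only when normal attribute lookup fails, so
        # class attributes and regular methods are unaffected.
        torch = importlib.import_module("torch")
        device_module = getattr(torch, self.device_type, None)
        if device_module is None:
            return None
        # Per the I/O contract below, a missing attribute yields None
        # rather than raising AttributeError.
        return getattr(device_module, key, None)

With this in place, current_platform.device_count() resolves to torch.cuda.device_count() on an NVIDIA system.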

Platform detection methods: is_cuda(), is_npu(), and is_rocm() return False by default; subclasses override the appropriate method to return True.

Abstract methods that subclasses must implement (a sketch follows this list):

  • clear_cublas_workspaces: Release or reuse low-level library workspaces
  • set_allocator_settings: Configure platform-specific memory allocators
  • get_custom_env_vars: Return platform-specific environment variables
  • get_vllm_worker_class: Specify the vLLM Ray worker class
  • get_vllm_run_time_env_vars: Generate runtime env vars for vLLM execution
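
Continuing the hypothetical CudaPlatform sketch from above, a subclass overrides its detection method and fills in the platform hooks. The bodies below are simplified placeholders, not ROLL's actual logic, and the internal torch calls are version-dependent.

import torch

class CudaPlatform(Platform):
    @classmethod
    def is_cuda(cls) -> bool:
        return True  # the base class answers False to every is_*() check

    @classmethod
    def clear_cublas_workspaces(cls) -> None:
        # Placeholder: recent PyTorch builds expose this internal hook.
        torch._C._cuda_clearCublasWorkspaces()

    @classmethod
    def set_allocator_settings(cls, env: str) -> None:
        # Placeholder: forward the config string to the caching allocator.
        torch.cuda.memory._set_allocator_settings(env)

    @classmethod
    def get_custom_env_vars(cls) -> dict:
        # Placeholder: NCCL/CUDA tuning variables would be returned here.
        return {}

    @classmethod
    def get_vllm_run_time_env_vars(cls, gpu_rank: str) -> dict:
        # Placeholder: pin the vLLM worker to the given device.
        return {"CUDA_VISIBLE_DEVICES": gpu_rank}

get_vllm_worker_class is omitted here because the appropriate worker class depends on the installed vLLM version.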

Utility methods (sketched below):

  • update_env_vars_for_visible_devices: Sets device visibility env vars and Ray experimental flags
  • get_visible_gpus: Parses the visibility env var to return currently visible device IDs
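
Both utilities can be expressed in terms of the class attributes described earlier. A sketch, with behavior inferred from the descriptions on this page rather than copied from the source:

import os

class Platform:
    device_control_env_var: str  # e.g. "CUDA_VISIBLE_DEVICES"
    ray_experimental_noset: str

    @classmethod
    def update_env_vars_for_visible_devices(cls, env_vars: dict, gpu_ranks: list) -> None:
        # Expose only the requested devices to the child process, and set
        # the Ray flag so Ray does not overwrite the visibility variable.
        env_vars[cls.device_control_env_var] = ",".join(str(r) for r in gpu_ranks)
        env_vars[cls.ray_experimental_noset] = "1"

    @classmethod
    def get_visible_gpus(cls) -> list:
        # An empty or unset visibility variable means no explicit restriction.
        value = os.environ.get(cls.device_control_env_var, "")
        return value.split(",") if value else []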

Usage

Do not instantiate Platform directly. Instead, use the current_platform singleton from mcore_adapter.platforms, which auto-detects the hardware platform. Subclass Platform when adding support for a new accelerator type. Access device-specific PyTorch APIs through the platform instance for cross-platform compatibility.
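
The detection logic behind current_platform is not documented on this page. A plausible sketch of such a singleton, assuming the hypothetical CudaPlatform above plus an analogous AscendPlatform:

import importlib.util

def _detect_platform() -> Platform:
    # Hypothetical detection order: prefer Ascend NPUs when torch_npu is
    # installed, otherwise fall back to CUDA.
    if importlib.util.find_spec("torch_npu") is not None:
        return AscendPlatform()
    import torch
    if torch.cuda.is_available():
        return CudaPlatform()
    raise RuntimeError("No supported accelerator platform detected")

current_platform = _detect_platform()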

Code Reference

Source Location

mcore_adapter/platforms/platform.py

Signature

class Platform:
    device_name: str
    device_type: str
    dispatch_key: str
    ray_device_key: str
    device_control_env_var: str
    ray_experimental_noset: str
    communication_backend: str

    def __getattr__(self, key: str) -> Any: ...

    @classmethod
    def is_cuda(cls) -> bool: ...
    @classmethod
    def is_npu(cls) -> bool: ...
    @classmethod
    def is_rocm(cls) -> bool: ...
    @classmethod
    def clear_cublas_workspaces(cls) -> None: ...
    @classmethod
    def set_allocator_settings(cls, env: str) -> None: ...
    @classmethod
    def get_custom_env_vars(cls) -> dict: ...
    @classmethod
    def update_env_vars_for_visible_devices(cls, env_vars: dict, gpu_ranks: list) -> None: ...
    @classmethod
    def get_visible_gpus(cls) -> list: ...
    @classmethod
    def get_vllm_worker_class(cls): ...
    @classmethod
    def get_vllm_run_time_env_vars(cls, gpu_rank: str) -> dict: ...

Import

from mcore_adapter.platforms.platform import Platform
# Or use the auto-detected singleton:
from mcore_adapter.platforms import current_platform

I/O Contract

Inputs

  • key (str, required by __getattr__): Attribute name to look up on torch.<device_type>
  • env (str, required by set_allocator_settings): Allocator configuration string
  • env_vars (dict, required by update_env_vars_for_visible_devices): Dictionary of environment variables to update
  • gpu_ranks (list, required by update_env_vars_for_visible_devices): List of device IDs to make visible
  • gpu_rank (str, required by get_vllm_run_time_env_vars): GPU rank for vLLM runtime configuration

Outputs

  • __getattr__ → Any or None: The requested attribute from torch.<device_type>, or None if not found
  • is_cuda / is_npu / is_rocm → bool: Platform identification flags (False by default on the base class)
  • get_custom_env_vars → dict: Platform-specific environment variable key-value pairs
  • get_visible_gpus → list: Currently visible device IDs as strings
  • get_vllm_worker_class → type: The vLLM WorkerWrapper class for this platform
  • get_vllm_run_time_env_vars → dict: Runtime environment variables for vLLM execution

Usage Examples

from mcore_adapter.platforms import current_platform

# Access device count through lazy delegation
num_devices = current_platform.device_count()

# Check platform type
if current_platform.is_cuda():
    print("Running on NVIDIA CUDA")

# Get visible devices
visible = current_platform.get_visible_gpus()
print(f"Visible devices: {visible}")

# Set device visibility for a subprocess
env_vars = {}
current_platform.update_env_vars_for_visible_devices(env_vars, [0, 1])
# env_vars now contains {"CUDA_VISIBLE_DEVICES": "0,1", ...}
