
Implementation:Alibaba ROLL Platform



Knowledge Sources
Domains Hardware_Abstraction, Distributed_Computing
Last Updated 2026-02-07 20:00 GMT

Overview

Abstract base class for hardware platform abstraction, standardizing device operations and environment configuration across NVIDIA, AMD, Ascend, and other accelerator platforms.

Description

The Platform class provides a unified interface for hardware-specific operations required by the mcore_adapter distributed training framework. It abstracts differences between accelerator platforms (NVIDIA CUDA, AMD ROCm, Huawei Ascend NPU) behind a common API.

Class attributes define platform metadata (an illustrative subclass sketch follows the list):

  • device_name: Human-readable name (e.g., "NVIDIA", "AMD", "ASCEND")
  • device_type: PyTorch module name (e.g., "cuda", "npu")
  • dispatch_key: PyTorch dispatch key (e.g., "CUDA", "PrivateUse1")
  • ray_device_key: Ray accelerator key (e.g., "GPU", "NPU")
  • device_control_env_var: Visibility control variable (e.g., "CUDA_VISIBLE_DEVICES")
  • ray_experimental_noset: Ray experimental flag that prevents Ray from overwriting the device visibility variable
  • communication_backend: Distributed backend (e.g., "nccl", "hccl")
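
For concreteness, here is how a hypothetical CUDA subclass might declare these values. The values are illustrative; consult the ROLL source for the actual definitions.

class CudaPlatform(Platform):
    # Illustrative metadata for an NVIDIA platform; the real values live
    # in mcore_adapter's platform modules.
    device_name = "NVIDIA"
    device_type = "cuda"  # attribute lookups delegate to torch.cuda
    dispatch_key = "CUDA"
    ray_device_key = "GPU"
    device_control_env_var = "CUDA_VISIBLE_DEVICES"
    ray_experimental_noset = "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES"
    communication_backend = "nccl"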

Lazy attribute delegation via __getattr__ (lines 70-92): when an attribute is not found on the Platform instance, the lookup automatically delegates to torch.<device_type> (e.g., torch.cuda). This makes platform-specific PyTorch APIs such as device_count(), set_device(), and synchronize() transparently callable through the Platform instance.
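
The delegation itself takes only a few lines. The following is a minimal sketch, assuming a subclass has set device_type; the actual implementation in platform.py may differ in detail.

import importlib
from typing import Any, Optional

class Platform:
    device_type: str  # set by subclasses, e.g. "cuda" or "npu"

    def __getattr__(self, key: str) -> Optional[Any]:
        # __getattr__ fires only when normal attribute lookup fails, so
        # class attributes and regular methods are unaffected.
        torch = importlib.import_module("torch")
        device_module = getattr(torch, self.device_type, None)
        if device_module is None:
            return None
        # Per the I/O contract below, a missing attribute yields None
        # rather than raising AttributeError.
        return getattr(device_module, key, None)

With this in place, current_platform.device_count() resolves to torch.cuda.device_count() on an NVIDIA system.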

Platform detection methods: is_cuda(), is_npu(), and is_rocm() return False by default; subclasses override the appropriate method to return True.

Abstract methods that subclasses must implement (a sketch follows this list):

  • clear_cublas_workspaces: Release or reuse low-level library workspaces
  • set_allocator_settings: Configure platform-specific memory allocators
  • get_custom_env_vars: Return platform-specific environment variables
  • get_vllm_worker_class: Specify the vLLM Ray worker class
  • get_vllm_run_time_env_vars: Generate runtime env vars for vLLM execution
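
Continuing the hypothetical CudaPlatform sketch from above, a subclass overrides its detection method and fills in the platform hooks. The bodies below are simplified placeholders, not ROLL's actual logic, and the internal torch calls are version-dependent.

import torch

class CudaPlatform(Platform):
    @classmethod
    def is_cuda(cls) -> bool:
        return True  # the base class answers False to every is_*() check

    @classmethod
    def clear_cublas_workspaces(cls) -> None:
        # Placeholder: recent PyTorch builds expose this internal hook.
        torch._C._cuda_clearCublasWorkspaces()

    @classmethod
    def set_allocator_settings(cls, env: str) -> None:
        # Placeholder: forward the config string to the caching allocator.
        torch.cuda.memory._set_allocator_settings(env)

    @classmethod
    def get_custom_env_vars(cls) -> dict:
        # Placeholder: NCCL/CUDA tuning variables would be returned here.
        return {}

    @classmethod
    def get_vllm_run_time_env_vars(cls, gpu_rank: str) -> dict:
        # Placeholder: pin the vLLM worker to the given device.
        return {"CUDA_VISIBLE_DEVICES": gpu_rank}

get_vllm_worker_class is omitted here because the appropriate worker class depends on the installed vLLM version.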

Utility methods (sketched below):

  • update_env_vars_for_visible_devices: Sets device visibility env vars and Ray experimental flags
  • get_visible_gpus: Parses the visibility env var to return currently visible device IDs
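
Both utilities can be expressed in terms of the class attributes described earlier. A sketch, with behavior inferred from the descriptions on this page rather than copied from the source:

import os

class Platform:
    device_control_env_var: str  # e.g. "CUDA_VISIBLE_DEVICES"
    ray_experimental_noset: str

    @classmethod
    def update_env_vars_for_visible_devices(cls, env_vars: dict, gpu_ranks: list) -> None:
        # Expose only the requested devices to the child process, and set
        # the Ray flag so Ray does not overwrite the visibility variable.
        env_vars[cls.device_control_env_var] = ",".join(str(r) for r in gpu_ranks)
        env_vars[cls.ray_experimental_noset] = "1"

    @classmethod
    def get_visible_gpus(cls) -> list:
        # An empty or unset visibility variable means no explicit restriction.
        value = os.environ.get(cls.device_control_env_var, "")
        return value.split(",") if value else []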

Usage

Do not instantiate Platform directly. Instead, use the current_platform singleton from mcore_adapter.platforms, which auto-detects the hardware platform. Subclass Platform when adding support for a new accelerator type. Access device-specific PyTorch APIs through the platform instance for cross-platform compatibility.
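
The detection logic behind current_platform is not documented on this page. A plausible sketch of such a singleton, assuming the hypothetical CudaPlatform above plus an analogous AscendPlatform:

import importlib.util

def _detect_platform() -> Platform:
    # Hypothetical detection order: prefer Ascend NPUs when torch_npu is
    # installed, otherwise fall back to CUDA.
    if importlib.util.find_spec("torch_npu") is not None:
        return AscendPlatform()
    import torch
    if torch.cuda.is_available():
        return CudaPlatform()
    raise RuntimeError("No supported accelerator platform detected")

current_platform = _detect_platform()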

Code Reference

Source Location

mcore_adapter/platforms/platform.py

Signature

class Platform:
    device_name: str
    device_type: str
    dispatch_key: str
    ray_device_key: str
    device_control_env_var: str
    ray_experimental_noset: str
    communication_backend: str

    def __getattr__(self, key: str) -> Any: ...

    @classmethod
    def is_cuda(cls) -> bool: ...
    @classmethod
    def is_npu(cls) -> bool: ...
    @classmethod
    def is_rocm(cls) -> bool: ...
    @classmethod
    def clear_cublas_workspaces(cls) -> None: ...
    @classmethod
    def set_allocator_settings(cls, env: str) -> None: ...
    @classmethod
    def get_custom_env_vars(cls) -> dict: ...
    @classmethod
    def update_env_vars_for_visible_devices(cls, env_vars: dict, gpu_ranks: list) -> None: ...
    @classmethod
    def get_visible_gpus(cls) -> list: ...
    @classmethod
    def get_vllm_worker_class(cls): ...
    @classmethod
    def get_vllm_run_time_env_vars(cls, gpu_rank: str) -> dict: ...

Import

from mcore_adapter.platforms.platform import Platform
# Or use the auto-detected singleton:
from mcore_adapter.platforms import current_platform

I/O Contract

Inputs

  • key (str, required by __getattr__): Attribute name to look up on torch.<device_type>
  • env (str, required by set_allocator_settings): Allocator configuration string
  • env_vars (dict, required by update_env_vars_for_visible_devices): Dictionary of environment variables to update
  • gpu_ranks (list, required by update_env_vars_for_visible_devices): List of device IDs to make visible
  • gpu_rank (str, required by get_vllm_run_time_env_vars): GPU rank for vLLM runtime configuration

Outputs

  • __getattr__ → Any or None: The requested attribute from torch.<device_type>, or None if not found
  • is_cuda / is_npu / is_rocm → bool: Platform identification flags (False by default on the base class)
  • get_custom_env_vars → dict: Platform-specific environment variable key-value pairs
  • get_visible_gpus → list: Currently visible device IDs as strings
  • get_vllm_worker_class → type: The vLLM WorkerWrapper class for this platform
  • get_vllm_run_time_env_vars → dict: Runtime environment variables for vLLM execution

Usage Examples

from mcore_adapter.platforms import current_platform

# Access device count through lazy delegation
num_devices = current_platform.device_count()

# Check platform type
if current_platform.is_cuda():
    print("Running on NVIDIA CUDA")

# Get visible devices
visible = current_platform.get_visible_gpus()
print(f"Visible devices: {visible}")

# Set device visibility for a subprocess
env_vars = {}
current_platform.update_env_vars_for_visible_devices(env_vars, [0, 1])
# env_vars now contains {"CUDA_VISIBLE_DEVICES": "0,1", ...}
