

Implementation:Deepspeedai DeepSpeed MPS Accelerator

From Leeroopedia


Knowledge Sources
Domains: Accelerator, Apple Metal Backend
Last Updated: 2026-02-09 00:00 GMT

Overview

Apple Metal Performance Shaders (MPS) backend providing limited DeepSpeed support on Apple Silicon Macs.

Description

The MPS_Accelerator class implements the DeepSpeedAccelerator interface for Apple Metal Performance Shaders (MPS) on Apple Silicon. It is the most constrained accelerator implementation: it has no communication backend (single device only), no stream or event support (the corresponding properties return None), no graph operations, and no BF16 or FP16 support (supported_dtypes() reports only torch.float). All tensor type properties return None, and many memory management methods are no-ops. The device is always mps:0, with a device count of 1. Because MPS events are not supported, timing relies on host timers. All op builders resolve to NotImplementedBuilder, so no custom DeepSpeed ops are available. Despite these limitations, the class provides functional single-device training for development and experimentation on Apple hardware.
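The constraints described above can be probed at run time through the accelerator interface. The following is a minimal sketch, not DeepSpeed code: FakeMPSAccelerator is a hypothetical stand-in that mirrors only the return values documented on this page, so the example runs without PyTorch or DeepSpeed, and capability_report is an illustrative helper name.

```python
class FakeMPSAccelerator:
    """Hypothetical stand-in mirroring the values documented on this page."""

    Stream = None  # no stream support
    Event = None   # no event support

    def device_count(self):
        return 1

    def communication_backend_name(self):
        return None

    def is_fp16_supported(self):
        return False

    def is_bf16_supported(self):
        return False

    def use_host_timers(self):
        return True


def capability_report(accel):
    """Collect the feature flags a training script typically branches on."""
    return {
        "distributed": accel.communication_backend_name() is not None,
        "streams": accel.Stream is not None,
        "events": accel.Event is not None,
        "fp16": accel.is_fp16_supported(),
        "bf16": accel.is_bf16_supported(),
        "host_timers": accel.use_host_timers(),
        "devices": accel.device_count(),
    }


report = capability_report(FakeMPSAccelerator())
print(report)
```

In a real script the same helper could presumably be given the object returned by get_accelerator() instead of the stand-in.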

Usage

Use this accelerator for development and small-scale experimentation on Apple Silicon Macs. It is limited to single-device FP32 training and is not recommended for production or distributed training.
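Because only FP32 is reported as supported, a configuration written for another backend may need its mixed-precision sections switched off before it is reused on MPS. The sketch below assumes a DeepSpeed-style config dictionary with "fp16" and "bf16" sections that each carry an "enabled" key; force_fp32 is a hypothetical helper, not part of DeepSpeed.

```python
import copy


def force_fp32(ds_config):
    """Return a copy of a DeepSpeed-style config dict with fp16/bf16 disabled."""
    cfg = copy.deepcopy(ds_config)  # leave the caller's config untouched
    for key in ("fp16", "bf16"):
        section = cfg.get(key)
        if isinstance(section, dict) and section.get("enabled"):
            section["enabled"] = False
    return cfg


# Illustrative config; the values are made up for the example.
original = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "bf16": {"enabled": False},
}
fp32_config = force_fp32(original)
print(fp32_config)
```

Returning a copy rather than mutating in place keeps the original config usable on backends that do support reduced precision.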

Code Reference

Source Location

Signature

import torch

from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator


class MPS_Accelerator(DeepSpeedAccelerator):
    def __init__(self):
        self._name = "mps"
        self._communication_backend_name = None  # No distributed support
        self._compile_backend = "inductor"

    def is_synchronized_device(self):
        return False

    def use_host_timers(self):
        return True  # Event timers not supported

    def device_name(self, device_index=None):
        if device_index is None:
            return "mps"
        return f"mps:{device_index}"

    def device_count(self):
        return 1  # Always single device

    def current_device(self):
        return torch.device("mps", index=0)

    # Stream/Event support - all return None
    @property
    def Stream(self):
        return None

    @property
    def Event(self):
        return None

    def current_stream(self, device_index=None):
        return None

    # Limited precision support
    def is_bf16_supported(self):
        return False

    def is_fp16_supported(self):
        return False

    def supported_dtypes(self):
        return [torch.float]  # FP32 only

    # Tensor types - all return None
    @property
    def BFloat16Tensor(self):
        return None

    @property
    def HalfTensor(self):
        return None

    # No graph support
    def create_graph(self):
        return None

    def get_op_builder(self, class_name):
        from deepspeed.ops.op_builder.cpu import NotImplementedBuilder
        return NotImplementedBuilder
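Since Stream and current_stream() return None in the signature above, calling code that is shared across backends has to guard before entering a stream context. A minimal sketch of such a guard follows. NoStreamAccelerator is a hypothetical stand-in, stream_context is an illustrative name, and the branch for stream-capable backends assumes the accelerator exposes a stream(stream) context-manager method.

```python
import contextlib


class NoStreamAccelerator:
    """Hypothetical stand-in: Stream is None, as in the signature above."""

    Stream = None

    def current_stream(self, device_index=None):
        return None


def stream_context(accel):
    """Return a context manager that is safe to enter on any backend."""
    if accel.Stream is None:
        # No stream support: fall back to a no-op context.
        return contextlib.nullcontext()
    # Assumption: stream-capable accelerators provide stream(stream).
    return accel.stream(accel.Stream())


ctx = stream_context(NoStreamAccelerator())
ran = False
with ctx:
    ran = True  # work runs on the default queue
print(type(ctx).__name__, ran)
```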

Import

from deepspeed.accelerator.mps_accelerator import MPS_Accelerator

I/O Contract

Inputs

Name | Type | Required | Description
device_index | int | Optional | Only formats the string returned by device_name(); the device itself is always index 0
seed | int | Required | Random seed for MPS RNG

Outputs

Name | Type | Description
device | torch.device | Always mps:0
device_count | int | Always 1
memory_allocated | int | Current allocated memory
communication_backend | None | No distributed support
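With a device count of 1 and no communication backend, a multi-process launch cannot work on this accelerator. The sketch below shows one way a script might fail early. It assumes the launcher sets the conventional WORLD_SIZE environment variable; check_launch is a hypothetical helper, not a DeepSpeed function.

```python
def check_launch(backend_name, env):
    """Return the world size, raising if it needs a backend that is absent."""
    world_size = int(env.get("WORLD_SIZE", "1"))
    if backend_name is None and world_size > 1:
        raise RuntimeError(
            f"WORLD_SIZE={world_size} requested, but the accelerator "
            "reports no communication backend"
        )
    return world_size


# Single-process run: accepted even without a backend.
print(check_launch(None, {}))

# Multi-process run without a backend: rejected.
try:
    check_launch(None, {"WORLD_SIZE": "2"})
except RuntimeError as err:
    print(f"rejected: {err}")
```

In practice the first argument would come from communication_backend_name() and the second from os.environ.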

Usage Examples

# Set MPS accelerator
import os
os.environ['DS_ACCELERATOR'] = 'mps'

from deepspeed.accelerator import get_accelerator
accelerator = get_accelerator()

print(f"Device: {accelerator.device_name()}")  # 'mps'
print(f"Backend: {accelerator.communication_backend_name()}")  # None

# Single device only
print(f"Device count: {accelerator.device_count()}")  # 1
print(f"Current device: {accelerator.current_device_name()}")  # 'mps:0'

# Limited precision support
print(f"FP16: {accelerator.is_fp16_supported()}")  # False
print(f"BF16: {accelerator.is_bf16_supported()}")  # False
print(f"Supported dtypes: {accelerator.supported_dtypes()}")  # [torch.float]

# Memory operations (limited)
allocated = accelerator.memory_allocated()
print(f"Memory allocated: {allocated}")

# No stream/event support
stream = accelerator.Stream  # None
event = accelerator.Event  # None

# No graph support
graph = accelerator.create_graph()  # None

# No custom ops
builder = accelerator.get_op_builder('TransformerBuilder')
print(f"Builder type: {builder.__name__}")  # 'NotImplementedBuilder'

# Basic synchronization works
accelerator.synchronize()
accelerator.empty_cache()
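Because event timers are unavailable, timing on this backend falls back to the host clock, and the device should be synchronized before each reading so that queued work is included. The following is a minimal sketch of that pattern, not DeepSpeed's own timer implementation. HostTimer is a hypothetical class, and fake_synchronize stands in for accelerator.synchronize so the example runs without an MPS device.

```python
import time


class HostTimer:
    """Wall-clock timer that synchronizes the device before each reading."""

    def __init__(self, synchronize):
        self._synchronize = synchronize
        self._start = None
        self.elapsed = 0.0

    def __enter__(self):
        self._synchronize()  # drain pending work before starting the clock
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._synchronize()  # wait for the timed work to finish
        self.elapsed = time.perf_counter() - self._start
        return False  # do not swallow exceptions


sync_calls = []


def fake_synchronize():
    # Stand-in for accelerator.synchronize; records each call.
    sync_calls.append(1)


with HostTimer(fake_synchronize) as timer:
    total = sum(range(100_000))

print(f"elapsed: {timer.elapsed:.6f} s")
```

On a real device, passing accelerator.synchronize as the callable would presumably give timings that include the queued MPS work.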
