Implementation:Deepspeedai DeepSpeed MPS Accelerator
| Knowledge Sources | |
|---|---|
| Domains | Accelerator, Apple Metal Backend |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Apple Metal Performance Shaders (MPS) backend providing limited DeepSpeed support on Apple Silicon Macs.
Description
The MPS_Accelerator class implements the DeepSpeedAccelerator interface for Apple Metal Performance Shaders on Apple Silicon. It is the most constrained accelerator implementation, with these notable limitations:
- No communication backend, so it is single-device only.
- No stream or event support: Stream, Event, and current_stream() return None.
- No graph operations.
- No BF16 or FP16 support; supported_dtypes() reports only torch.float.
- All tensor type properties return None.
- Many memory management methods are no-ops.
The device is always mps:0 with a count of 1. Timing relies on host timers because MPS events are not supported. All op builders resolve to NotImplementedBuilder, so no custom DeepSpeed ops are available. Despite these limitations, the class provides functional single-device training for development and experimentation on Apple hardware.
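Because both reduced-precision probes report False, calling code that selects a dtype by querying the accelerator ends up on FP32. The sketch below illustrates that fallback. StubMPSAccelerator and pick_training_dtype are hypothetical names introduced here so the example runs without DeepSpeed or PyTorch installed; the stub only mirrors the return values described above, and dtype names are plain strings rather than torch dtypes.

```python
class StubMPSAccelerator:
    """Hypothetical stand-in that mirrors the documented return values."""

    def is_bf16_supported(self) -> bool:
        return False

    def is_fp16_supported(self) -> bool:
        return False

    def supported_dtypes(self) -> list:
        # The real class returns [torch.float]; a string keeps this sketch dependency-free.
        return ["float32"]


def pick_training_dtype(accelerator) -> str:
    """Prefer bf16, then fp16, otherwise fall back to fp32 (hypothetical helper)."""
    if accelerator.is_bf16_supported():
        return "bfloat16"
    if accelerator.is_fp16_supported():
        return "float16"
    return "float32"


dtype_name = pick_training_dtype(StubMPSAccelerator())
print(dtype_name)
```

With the real accelerator, the same helper would presumably be called with the object returned by get_accelerator().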
Usage
Use this accelerator for development and small-scale experimentation on Apple Silicon Macs. It is limited to single-device FP32 training and is not recommended for production or distributed training.
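A configuration consistent with these constraints might look like the sketch below. It is an assumption rather than a tested recipe: the page does not show a config, the keys are standard DeepSpeed configuration fields, and the batch size is purely illustrative. Whether a given ZeRO stage works on MPS is not covered here, so the sketch leaves ZeRO disabled.

```python
import json

# Assumed FP32, single-device settings; numeric values are illustrative only.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": False},        # FP16 is reported as unsupported
    "bf16": {"enabled": False},        # BF16 is reported as unsupported
    "zero_optimization": {"stage": 0},  # assumption: keep ZeRO off
}

print(json.dumps(ds_config, indent=2))
```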
Code Reference
Source Location
- Repository: DeepSpeed
- File: accelerator/mps_accelerator.py
Signature
class MPS_Accelerator(DeepSpeedAccelerator):

    def __init__(self):
        self._name = "mps"
        self._communication_backend_name = None  # No distributed support
        self._compile_backend = "inductor"

    def is_synchronized_device(self):
        return False

    def use_host_timers(self):
        return True  # Event timers not supported

    def device_name(self, device_index=None):
        if device_index is None:
            return "mps"
        return f"mps:{device_index}"

    def device_count(self):
        return 1  # Always single device

    def current_device(self):
        return torch.device("mps", index=0)

    # Stream/Event support - all return None
    @property
    def Stream(self):
        return None

    @property
    def Event(self):
        return None

    def current_stream(self, device_index=None):
        return None

    # Limited precision support
    def is_bf16_supported(self):
        return False

    def is_fp16_supported(self):
        return False

    def supported_dtypes(self):
        return [torch.float]  # FP32 only

    # Tensor types - all return None
    @property
    def BFloat16Tensor(self):
        return None

    @property
    def HalfTensor(self):
        return None

    # No graph support
    def create_graph(self):
        return None

    def get_op_builder(self, class_name):
        from deepspeed.ops.op_builder.cpu import NotImplementedBuilder
        return NotImplementedBuilder
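Since current_stream() returns None, accelerator-agnostic code that wraps work in a stream context needs a fallback. One possible guard is sketched below. StubMPSAccelerator and stream_scope are hypothetical names, and the accelerator.stream(...) call in the non-None branch is an assumption about how a stream-capable backend would be used; that branch is never reached with the stub.

```python
from contextlib import nullcontext


class StubMPSAccelerator:
    """Hypothetical stand-in: stream accessors return None, as in the signature above."""

    Stream = None
    Event = None

    def current_stream(self, device_index=None):
        return None


def stream_scope(accelerator):
    """Return a stream context manager, or a no-op context when no stream exists."""
    stream = accelerator.current_stream()
    if stream is None:
        return nullcontext()
    # Assumption: stream-capable backends expose stream(...) as a context manager factory.
    return accelerator.stream(stream)


with stream_scope(StubMPSAccelerator()) as scope:
    ran = True  # work placed here runs without any explicit stream

print(ran, scope)
```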
Import
from deepspeed.accelerator.mps_accelerator import MPS_Accelerator
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| device_index | int | Optional | Only used by device_name() to format the returned string; the device itself is always index 0 |
| seed | int | Required | Random seed for MPS RNG |
Outputs
| Name | Type | Description |
|---|---|---|
| device | torch.device | Always mps:0 |
| device_count | int | Always 1 |
| memory_allocated | int | Currently allocated memory, in bytes |
| communication_backend | None | No distributed support |
Usage Examples
# Select the MPS accelerator before DeepSpeed resolves one
import os
os.environ['DS_ACCELERATOR'] = 'mps'

from deepspeed.accelerator import get_accelerator

accelerator = get_accelerator()
print(f"Device: {accelerator.device_name()}")  # 'mps'
print(f"Backend: {accelerator.communication_backend_name()}")  # None

# Single device only
print(f"Device count: {accelerator.device_count()}")  # 1
print(f"Current device: {accelerator.current_device_name()}")  # 'mps:0'

# Limited precision support
print(f"FP16: {accelerator.is_fp16_supported()}")  # False
print(f"BF16: {accelerator.is_bf16_supported()}")  # False
print(f"Supported dtypes: {accelerator.supported_dtypes()}")  # [torch.float]

# Memory operations (limited)
allocated = accelerator.memory_allocated()
print(f"Memory allocated: {allocated}")

# No stream/event support
stream = accelerator.Stream  # None
event = accelerator.Event  # None

# No graph support
graph = accelerator.create_graph()  # None

# No custom ops
builder = accelerator.get_op_builder('TransformerBuilder')
print(f"Builder type: {builder.__name__}")  # 'NotImplementedBuilder'

# Basic synchronization and cache release work
accelerator.synchronize()
accelerator.empty_cache()
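Because use_host_timers() returns True, timing on this backend falls back to wall-clock measurement. The following is a minimal sketch of that idea, not DeepSpeed's own timer: HostTimer is a hypothetical class, and the sync argument stands in for a call such as accelerator.synchronize so that queued device work is included in the reading.

```python
import time


class HostTimer:
    """Wall-clock timer (hypothetical); `sync` runs before each clock reading."""

    def __init__(self, sync=None):
        self._sync = sync or (lambda: None)
        self._start = None
        self.elapsed = 0.0

    def __enter__(self):
        self._sync()  # drain pending work before starting the clock
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._sync()  # wait for the timed work to finish
        self.elapsed = time.perf_counter() - self._start
        return False  # do not suppress exceptions


calls = []
with HostTimer(sync=lambda: calls.append("sync")) as timer:
    sum(range(10_000))  # stand-in for a training step

print(f"step took {timer.elapsed:.6f}s after {len(calls)} sync calls")
```

With the real accelerator, the timer would presumably be constructed as HostTimer(sync=accelerator.synchronize).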
Related Pages
- Abstract Accelerator - Base interface
- Real Accelerator - Accelerator selection
- CPU Accelerator - Alternative for development