Implementation:Deepspeedai DeepSpeed Abstract Accelerator
| Knowledge Sources | |
|---|---|
| Domains | Accelerator, Hardware Abstraction |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Abstract base class defining the hardware accelerator interface that all DeepSpeed backends must implement.
Description
The DeepSpeedAccelerator abstract base class defines the contract that every hardware backend must satisfy to integrate with DeepSpeed. It uses Python's abc module to declare more than 90 abstract methods, grouped into the following categories: device management, random number generation, streams and events, memory management, data type support, graph operations, tensor operations, op builders, and environment configuration. Because callers depend only on this interface, DeepSpeed can support multiple hardware platforms (CUDA, CPU, XPU, NPU, HPU, MLU, MPS, SDAA) through a single unified API.
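The idea of one interface serving several backends can be illustrated with a small, self-contained sketch. MiniAccelerator, FakeGpu, FakeCpu, and describe are hypothetical stand-ins written for this example, not DeepSpeed classes; only the method names device_name, device_count, and is_bf16_supported are taken from the interface described above.

```python
import abc


class MiniAccelerator(abc.ABC):
    """Hypothetical, heavily reduced stand-in for DeepSpeedAccelerator."""

    @abc.abstractmethod
    def device_name(self, device_index=None): ...

    @abc.abstractmethod
    def device_count(self): ...

    @abc.abstractmethod
    def is_bf16_supported(self): ...


class FakeGpu(MiniAccelerator):
    """Toy backend pretending to expose two bf16-capable devices."""

    def device_name(self, device_index=None):
        return "gpu" if device_index is None else f"gpu:{device_index}"

    def device_count(self):
        return 2

    def is_bf16_supported(self):
        return True


class FakeCpu(MiniAccelerator):
    """Toy backend pretending to expose a single device without bf16."""

    def device_name(self, device_index=None):
        return "cpu"

    def device_count(self):
        return 1

    def is_bf16_supported(self):
        return False


def describe(acc):
    """Backend-agnostic caller: relies only on the shared interface."""
    names = [acc.device_name(i) for i in range(acc.device_count())]
    dtype = "bf16" if acc.is_bf16_supported() else "fp32"
    return names, dtype
```

The describe function never checks which backend it received, which is the property that lets the same training code run on different hardware.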
Usage
Use this class as the base when implementing a new hardware accelerator backend for DeepSpeed. All concrete accelerator implementations must inherit from this class and provide implementations for every abstract method.
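The requirement to implement every abstract method is enforced by Python itself: a subclass that leaves any abstract method unimplemented raises TypeError on instantiation. The sketch below shows this with a hypothetical two-method interface (TinyInterface is not part of DeepSpeed) and a helper that lists what is still missing, which may be handy while porting a new backend.

```python
import abc


class TinyInterface(abc.ABC):
    """Hypothetical two-method interface used only for this illustration."""

    @abc.abstractmethod
    def device_count(self): ...

    @abc.abstractmethod
    def empty_cache(self): ...


class Incomplete(TinyInterface):
    # empty_cache is deliberately left unimplemented
    def device_count(self):
        return 1


class Complete(Incomplete):
    def empty_cache(self):
        return None


def missing_methods(cls):
    """Return the names of abstract methods that cls has not implemented yet."""
    return sorted(getattr(cls, "__abstractmethods__", frozenset()))


def can_instantiate(cls):
    """Return True if cls() succeeds, False if the ABC machinery rejects it."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

The same missing_methods helper should work on any ABC-derived class, so it could be pointed at a partially written accelerator subclass to see which methods remain.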
Code Reference
Source Location
- Repository: DeepSpeed
- File: accelerator/abstract_accelerator.py
Signature
from abc import ABC
import abc

class DeepSpeedAccelerator(ABC):

    def __init__(self):
        self._name = None
        self._communication_backend_name = None
        self._compile_backend = None

    # Device APIs
    @abc.abstractmethod
    def device_name(self, device_index): ...
    @abc.abstractmethod
    def set_device(self, device_index): ...
    @abc.abstractmethod
    def current_device(self): ...
    @abc.abstractmethod
    def device_count(self): ...
    @abc.abstractmethod
    def synchronize(self, device_index=None): ...

    # RNG APIs
    @abc.abstractmethod
    def manual_seed(self, seed): ...
    @abc.abstractmethod
    def get_rng_state(self, device_index=None): ...
    @abc.abstractmethod
    def set_rng_state(self, new_state, device_index=None): ...

    # Memory management
    @abc.abstractmethod
    def memory_allocated(self, device_index=None): ...
    @abc.abstractmethod
    def total_memory(self, device_index=None): ...
    @abc.abstractmethod
    def available_memory(self, device_index=None): ...
    @abc.abstractmethod
    def empty_cache(self): ...

    # Data types
    @abc.abstractmethod
    def is_bf16_supported(self): ...
    @abc.abstractmethod
    def is_fp16_supported(self): ...

    # Graph operations
    @abc.abstractmethod
    def create_graph(self): ...
    @abc.abstractmethod
    def capture_to_graph(self, graph, pool=None, stream=None): ...
    @abc.abstractmethod
    def replay_graph(self, graph): ...

    # Op builder APIs
    @abc.abstractmethod
    def create_op_builder(self, class_name): ...
    @abc.abstractmethod
    def get_op_builder(self, class_name): ...
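The three graph methods in the signature are meant to be used together: create a graph, capture work into it, then replay it. The sketch below mimics that flow with a toy backend that records Python callables instead of device kernels. It assumes capture_to_graph is used as a context manager; RecordingBackend, its launch helper, and demo are hypothetical and exist only for this illustration.

```python
import contextlib


class RecordingBackend:
    """Hypothetical toy backend: a 'graph' is just a list of recorded callables."""

    def __init__(self):
        self._capturing = None  # graph currently being captured, if any

    def create_graph(self):
        return []

    @contextlib.contextmanager
    def capture_to_graph(self, graph, pool=None, stream=None):
        # pool and stream are accepted to match the signature but unused here
        self._capturing = graph
        try:
            yield graph
        finally:
            self._capturing = None

    def launch(self, fn):
        """Hypothetical helper: record fn while capturing, otherwise run it."""
        if self._capturing is not None:
            self._capturing.append(fn)
        else:
            fn()

    def replay_graph(self, graph):
        for fn in graph:
            fn()


def demo():
    acc = RecordingBackend()
    log = []
    graph = acc.create_graph()
    with acc.capture_to_graph(graph):
        acc.launch(lambda: log.append("step"))  # recorded, not executed
    ran_during_capture = len(log)
    acc.replay_graph(graph)
    acc.replay_graph(graph)
    return ran_during_capture, log
```

In this toy model nothing executes during capture and each replay runs the recorded work once, which is the behaviour the capture-and-replay pattern is intended to provide.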
Import
from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| device_index | int | Optional for most methods (defaults to None); required by device_name and set_device | Device index for multi-device operations |
| seed | int | Required | Random seed for RNG initialization |
| new_state | Tensor | Required | RNG state to restore |
| class_name | str | Required | Name of op builder class to instantiate |
Outputs
| Name | Type | Description |
|---|---|---|
| device | torch.device | PyTorch device object |
| memory_bytes | int | Memory allocation in bytes |
| is_supported | bool | Whether a feature is supported |
| op_builder | OpBuilder | Operation builder instance |
| graph | Graph | Computation graph object |
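As a rough illustration of how a caller might consume the memory-related outputs, the sketch below combines total_memory, available_memory, and memory_allocated into one report. It is duck-typed, so it needs only an object exposing those three methods; StubAccelerator and its byte counts are made up for the example and do not describe any real device.

```python
GIB = 1024 ** 3


def memory_report(acc, device_index=None):
    """Collect byte counts from an accelerator-like object into one dict."""
    total = acc.total_memory(device_index)
    available = acc.available_memory(device_index)
    allocated = acc.memory_allocated(device_index)
    return {
        "total_bytes": total,
        "available_bytes": available,
        "allocated_bytes": allocated,
        # guard against a backend that reports zero total memory
        "free_fraction": available / total if total else 0.0,
    }


class StubAccelerator:
    """Hypothetical stand-in returning fixed, made-up byte counts."""

    def total_memory(self, device_index=None):
        return 8 * GIB

    def available_memory(self, device_index=None):
        return 6 * GIB

    def memory_allocated(self, device_index=None):
        return 1 * GIB


report = memory_report(StubAccelerator())
```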
Usage Examples
# Implementing a custom accelerator
import torch
from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator

class MyCustomAccelerator(DeepSpeedAccelerator):

    def __init__(self):
        super().__init__()
        self._name = 'custom'
        self._communication_backend_name = 'nccl'
        self._compile_backend = 'inductor'

    def device_name(self, device_index=None):
        # Return the bare device type when no index is given
        if device_index is None:
            return 'custom'
        return f'custom:{device_index}'

    def device_count(self):
        # torch.custom is a placeholder for the backend's own torch extension module
        return torch.custom.device_count()

    def is_bf16_supported(self):
        return True

    # ... implement all other abstract methods ...
Related Pages
- Real Accelerator - Factory for accelerator selection
- CUDA Accelerator - NVIDIA GPU implementation
- CPU Accelerator - CPU implementation