Implementation:Deepspeedai DeepSpeed SDAA Accelerator
| Knowledge Sources | |
|---|---|
| Domains | Accelerator, Tecorigin Backend |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
The Tecorigin SDAA (Smart Data Acceleration Architecture) accelerator backend enables DeepSpeed training on Tecorigin hardware.
Description
The SDAA_Accelerator class implements the DeepSpeedAccelerator interface for Tecorigin SDAA AI accelerators. It wraps the torch.sdaa APIs provided by the torch_sdaa extension, uses tccl (Tecorigin Collective Communication Library) as the communication backend, and uses inductor as the compile backend. All standard device, memory, RNG, and stream/event operations delegate to their torch.sdaa equivalents. FP16 is always reported as supported, while BF16 support is checked via torch.sdaa.is_bf16_supported(). Graph operations are not supported: create_graph() returns None and capture_to_graph() returns a no-op context. Device visibility is controlled through SDAA_VISIBLE_DEVICES, and the NCCL, LD_LIBRARY, and PATH environment variables are exported. Op builders are lazily loaded from op_builder.sdaa using inspect.getmembers. The file is dual-licensed under Apache-2.0 (Microsoft) and BSD 3-Clause (Tecorigin).
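The lazy op-builder loading described above can be illustrated with a minimal, standalone sketch. It is not the DeepSpeed implementation: `LazyBuilderRegistry`, `make_fake_builder_module`, and the builder class names are hypothetical stand-ins, and a synthetic module replaces op_builder.sdaa so the example runs without torch_sdaa. Only the `class_dict = None` starting state and the use of `inspect.getmembers` are taken from the page.

```python
import inspect
import types


def make_fake_builder_module():
    """Build a stand-in module (hypothetical); real code would import op_builder.sdaa."""
    mod = types.ModuleType("fake_op_builder")

    class FusedAdamBuilder:  # hypothetical builder class
        pass

    class CPUAdamBuilder:  # hypothetical builder class
        pass

    mod.FusedAdamBuilder = FusedAdamBuilder
    mod.CPUAdamBuilder = CPUAdamBuilder
    mod.VERSION = "0.1"  # non-class attribute, skipped by inspect.isclass
    return mod


class LazyBuilderRegistry:
    """Collects classes from a module the first time a builder is requested."""

    def __init__(self, module_factory):
        self._module_factory = module_factory
        self.class_dict = None  # stays None until the first lookup

    def _lazy_init(self):
        if self.class_dict is not None:
            return
        module = self._module_factory()
        # getmembers returns (name, object) pairs; isclass keeps only classes.
        self.class_dict = dict(inspect.getmembers(module, inspect.isclass))

    def get_op_builder(self, class_name):
        self._lazy_init()
        return self.class_dict.get(class_name)


registry = LazyBuilderRegistry(make_fake_builder_module)
print(registry.class_dict)  # None: nothing has been loaded yet
print(registry.get_op_builder("FusedAdamBuilder").__name__)
print(sorted(registry.class_dict))
```

Deferring the import this way keeps construction of the accelerator object cheap and avoids loading op builders that are never requested.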
Usage
Use this backend when training on Tecorigin SDAA accelerators. It requires torch_sdaa to be installed. Set DS_ACCELERATOR=sdaa to select the backend explicitly.
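One way to apply these two requirements together is to check for torch_sdaa before setting the environment variable. The sketch below is an assumption-laden illustration: `select_sdaa_backend` is a hypothetical helper, not part of DeepSpeed, and it only confirms that a module named torch_sdaa is importable, not that any hardware is present.

```python
import importlib.util
import os


def select_sdaa_backend(env=None):
    """Set DS_ACCELERATOR=sdaa if torch_sdaa is importable (hypothetical helper).

    `env` defaults to os.environ; pass a plain dict to try it without
    touching the real process environment.
    """
    env = os.environ if env is None else env
    # find_spec returns None when no top-level module of that name is installed.
    if importlib.util.find_spec("torch_sdaa") is None:
        return False
    env["DS_ACCELERATOR"] = "sdaa"
    return True


scratch_env = {}
selected = select_sdaa_backend(scratch_env)
print(f"SDAA selected: {selected}, env: {scratch_env}")
```

The variable should be set before the accelerator is first requested from DeepSpeed, as in the usage example on this page.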
Code Reference
Source Location
- Repository: DeepSpeed
- File: accelerator/sdaa_accelerator.py
Signature
class SDAA_Accelerator(DeepSpeedAccelerator):

    def __init__(self):
        self._name = 'sdaa'
        self._communication_backend_name = 'tccl'
        self._compile_backend = "inductor"
        self.class_dict = None

    def is_synchronized_device(self):
        return False

    def device_name(self, device_index=None):
        if device_index is None:
            return 'sdaa'
        return f'sdaa:{device_index}'

    def device(self, device_index=None):
        return torch.sdaa.device(device_index)

    def synchronize(self, device_index=None):
        return torch.sdaa.synchronize(device_index)

    def is_bf16_supported(self):
        return torch.sdaa.is_bf16_supported()

    def is_fp16_supported(self):
        return True

    def is_triton_supported(self):
        return False

    def create_graph(self):
        return None

    def capture_to_graph(self, graph, pool=None, stream=None):
        from deepspeed.runtime.utils import noop_context
        return noop_context()

    def visible_devices_envs(self):
        return ['SDAA_VISIBLE_DEVICES']
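Because create_graph() returns None and capture_to_graph() returns a no-op context, calling code written against the graph API still runs eagerly on this backend. The sketch below illustrates that behaviour with stand-ins: `GraphlessAccelerator` and `run_step` are hypothetical names, and `contextlib.nullcontext` substitutes for DeepSpeed's `noop_context` so the example runs without DeepSpeed installed.

```python
import contextlib


class GraphlessAccelerator:
    """Stand-in mimicking only the graph-related methods in the signature (hypothetical)."""

    def create_graph(self):
        return None

    def capture_to_graph(self, graph, pool=None, stream=None):
        # nullcontext is an assumed substitute for deepspeed.runtime.utils.noop_context.
        return contextlib.nullcontext()


def run_step(accelerator, step_fn):
    """Run step_fn inside the capture context and report whether a graph exists."""
    graph = accelerator.create_graph()
    with accelerator.capture_to_graph(graph):
        result = step_fn()  # executes eagerly when the context is a no-op
    return result, graph is not None


result, captured = run_step(GraphlessAccelerator(), lambda: 2 + 3)
print(result, captured)
```

Code that later needs to replay a captured graph should check for a None graph first and fall back to calling the step function directly.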
Import
from deepspeed.accelerator.sdaa_accelerator import SDAA_Accelerator
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| device_index | int | Optional | SDAA device index |
| seed | int | Required | Random seed for SDAA RNG |
Outputs
| Name | Type | Description |
|---|---|---|
| device | torch.sdaa.device | SDAA device object returned by torch.sdaa.device(device_index) |
| device_count | int | Number of SDAA devices |
| memory_bytes | int | SDAA memory in bytes |
| communication_backend | str | Always 'tccl' |
Usage Examples
# Set SDAA accelerator
import os
os.environ['DS_ACCELERATOR'] = 'sdaa'
from deepspeed.accelerator import get_accelerator
accelerator = get_accelerator()
print(f"Device: {accelerator.device_name()}") # 'sdaa'
print(f"Backend: {accelerator.communication_backend_name()}") # 'tccl'
# Device management
print(f"Device count: {accelerator.device_count()}")
accelerator.set_device(0)
print(f"Current device: {accelerator.current_device_name()}")
# Precision support
print(f"FP16: {accelerator.is_fp16_supported()}") # True
print(f"BF16: {accelerator.is_bf16_supported()}") # Depends on hardware
print(f"Triton: {accelerator.is_triton_supported()}") # False
# Memory operations
total = accelerator.total_memory(0)
allocated = accelerator.memory_allocated(0)
available = accelerator.available_memory(0)
print(f"Memory: {allocated}/{total} bytes")
# Streams (model and input_tensor are assumed to be defined elsewhere)
stream = accelerator.Stream()
with accelerator.stream(stream):
    output = model(input_tensor)
# Note: Graph operations not supported
graph = accelerator.create_graph() # Returns None
print(f"Graphs supported: {graph is not None}") # False
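The raw byte counts from the memory calls in the example can be hard to read, so a small formatting helper may be useful. The following is a sketch only: `memory_summary` and `FakeAccelerator` are hypothetical, and the fake object returns fixed numbers in place of a real accelerator's total_memory() and memory_allocated() so the code runs without SDAA hardware.

```python
GIB = 1024 ** 3


def memory_summary(accelerator, device_index=0):
    """Format allocated/total memory in GiB with a usage percentage (hypothetical helper)."""
    total = accelerator.total_memory(device_index)
    allocated = accelerator.memory_allocated(device_index)
    percent = 100.0 * allocated / total if total else 0.0
    return f"{allocated / GIB:.2f}/{total / GIB:.2f} GiB ({percent:.1f}%)"


class FakeAccelerator:
    """Stand-in with fixed values; a real accelerator object would query the device."""

    def total_memory(self, device_index=None):
        return 16 * GIB

    def memory_allocated(self, device_index=None):
        return 4 * GIB


print(memory_summary(FakeAccelerator()))  # 4.00/16.00 GiB (25.0%)
```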
Related Pages
- Abstract Accelerator - Base interface
- Real Accelerator - Accelerator selection
- MLU Accelerator - Cambricon alternative