Implementation:Deepspeedai DeepSpeed SDAA Accelerator

Knowledge Sources	DeepSpeed
Domains	Accelerator, Tecorigin Backend
Last Updated	2026-02-09 00:00 GMT

Overview

The Tecorigin SDAA (Smart Data Acceleration Architecture) accelerator backend enables DeepSpeed training on Tecorigin hardware.

Description

The SDAA_Accelerator class implements the DeepSpeedAccelerator interface for Tecorigin SDAA AI accelerators. It wraps the torch.sdaa APIs provided by the torch_sdaa extension, uses tccl (Tecorigin Collective Communication Library) as the communication backend, and uses inductor as the compile backend. All standard device, memory, RNG, and stream/event operations delegate to their torch.sdaa equivalents.

FP16 is always reported as supported, while BF16 support is queried at runtime via torch.sdaa.is_bf16_supported(). Triton is reported as unsupported. Graph operations are not supported: create_graph() returns None and capture_to_graph() returns a no-op context. Device visibility is controlled through SDAA_VISIBLE_DEVICES, and the accelerator exports the NCCL, LD_LIBRARY, and PATH environment variables. Op builders are loaded lazily from op_builder.sdaa using inspect.getmembers. The file is dual-licensed under Apache-2.0 (Microsoft) and BSD 3-Clause (Tecorigin).
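The lazy op-builder loading described above can be illustrated with a small, self-contained sketch. It is not the code from sdaa_accelerator.py: the LazyBuilderRegistry class, the fake_ops stand-in module, and the "name ends in Builder" filter are assumptions made for illustration. Only the use of inspect.getmembers and a class_dict that starts as None come from this page.

```python
import inspect
import types


class LazyBuilderRegistry:
    """Hypothetical helper: collects builder classes from a module on first use."""

    def __init__(self, module):
        self._module = module
        # Starts empty, like class_dict = None in SDAA_Accelerator.__init__
        self.class_dict = None

    def _lazy_init(self):
        # Scan the module only once, the first time a builder is requested
        if self.class_dict is not None:
            return
        self.class_dict = {
            name: cls
            for name, cls in inspect.getmembers(self._module, inspect.isclass)
            if name.endswith("Builder")  # assumed naming convention
        }

    def get(self, class_name):
        self._lazy_init()
        return self.class_dict.get(class_name)


# Stand-in module playing the role of op_builder.sdaa (assumption for the demo)
fake_ops = types.ModuleType("fake_ops")


class FusedAdamBuilder:
    pass


class NotAnOp:
    pass


fake_ops.FusedAdamBuilder = FusedAdamBuilder
fake_ops.NotAnOp = NotAnOp

registry = LazyBuilderRegistry(fake_ops)
```

Deferring the scan means importing the accelerator does not import every op builder up front; the cost is paid on the first lookup only.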

Usage

Use this backend when training on Tecorigin SDAA accelerators. It requires the torch_sdaa extension to be installed. Set DS_ACCELERATOR=sdaa to select this backend explicitly.
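Because the backend depends on torch_sdaa, a launcher script may want to check for the package before forcing the selection. The following is a minimal sketch under that assumption; select_sdaa_if_available is a hypothetical helper, not part of DeepSpeed, and it only checks that the package can be located, not that SDAA hardware is present.

```python
import importlib.util
import os


def select_sdaa_if_available(env=os.environ):
    """Set DS_ACCELERATOR=sdaa only when the torch_sdaa package can be found.

    Returns True if the variable was set, False otherwise.
    """
    if importlib.util.find_spec("torch_sdaa") is None:
        # torch_sdaa is not installed; leave accelerator selection untouched
        return False
    env["DS_ACCELERATOR"] = "sdaa"
    return True
```

Call the helper before importing deepspeed.accelerator so that the environment variable is already in place when the accelerator is resolved.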

Code Reference

Source Location

Repository: DeepSpeed
File: accelerator/sdaa_accelerator.py

Signature

class SDAA_Accelerator(DeepSpeedAccelerator):
    def __init__(self):
        self._name = 'sdaa'
        self._communication_backend_name = 'tccl'
        self._compile_backend = "inductor"
        self.class_dict = None

    def is_synchronized_device(self):
        return False

    def device_name(self, device_index=None):
        if device_index is None:
            return 'sdaa'
        return f'sdaa:{device_index}'

    def device(self, device_index=None):
        return torch.sdaa.device(device_index)

    def synchronize(self, device_index=None):
        return torch.sdaa.synchronize(device_index)

    def is_bf16_supported(self):
        return torch.sdaa.is_bf16_supported()

    def is_fp16_supported(self):
        return True

    def is_triton_supported(self):
        return False

    def create_graph(self):
        return None

    def capture_to_graph(self, graph, pool=None, stream=None):
        from deepspeed.runtime.utils import noop_context
        return noop_context()

    def visible_devices_envs(self):
        return ['SDAA_VISIBLE_DEVICES']

Import

from deepspeed.accelerator.sdaa_accelerator import SDAA_Accelerator

I/O Contract

Inputs

Name	Type	Required	Description
device_index	int	Optional	SDAA device index
seed	int	Required	Random seed for SDAA RNG

Outputs

Name	Type	Description
device	torch.device	SDAA device object
device_count	int	Number of SDAA devices
memory_bytes	int	SDAA memory in bytes
communication_backend	str	Always 'tccl'

Usage Examples

# Set SDAA accelerator
import os
os.environ['DS_ACCELERATOR'] = 'sdaa'

from deepspeed.accelerator import get_accelerator
accelerator = get_accelerator()

print(f"Device: {accelerator.device_name()}")  # 'sdaa'
print(f"Backend: {accelerator.communication_backend_name()}")  # 'tccl'

# Device management
print(f"Device count: {accelerator.device_count()}")
accelerator.set_device(0)
print(f"Current device: {accelerator.current_device_name()}")

# Precision support
print(f"FP16: {accelerator.is_fp16_supported()}")  # True
print(f"BF16: {accelerator.is_bf16_supported()}")  # Depends on hardware
print(f"Triton: {accelerator.is_triton_supported()}")  # False

# Memory operations
total = accelerator.total_memory(0)
allocated = accelerator.memory_allocated(0)
available = accelerator.available_memory(0)
print(f"Memory: {allocated}/{total} bytes")

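# Streams
# Note: model and input_tensor are placeholders for a module and a tensor
# that are already on the SDAA device; they are not defined in this snippet.
stream = accelerator.Stream()
with accelerator.stream(stream):
    output = model(input_tensor)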

# Note: Graph operations not supported
graph = accelerator.create_graph()  # Returns None
print(f"Graphs supported: {graph is not None}")  # False

Related Pages

Abstract Accelerator - Base interface
Real Accelerator - Accelerator selection
MLU Accelerator - Cambricon alternative
