
Environment:Deepspeedai DeepSpeed CPU Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, CPU_Optimization, SIMD
Last Updated: 2026-02-09 00:00 GMT

Overview

CPU compute environment for DeepSpeed's SIMD-optimized operators, communication backends, and CPU offloading operations.

Description

This environment provides the CPU compute context required by DeepSpeed's native C++ operators that run on the CPU. These include SIMD-optimized implementations of the Adam, AdamW, Adagrad, and Lion optimizers, shared memory (SHM) based allreduce for inter-process communication, and the OneCCL communication backend. The CPU environment typically targets a modern x86-64 processor with AVX2 or AVX-512 instruction-set support (ARM with NEON is also supported), and requires a C++14-compatible compiler for JIT compilation plus appropriate threading libraries.

The SIMD abstraction layer (`csrc/includes/simd.h`) automatically selects the optimal instruction set at compile time: AVX-512 when available, falling back to AVX2. For ARM architectures, NEON intrinsics are used as a fallback.

Usage

Use this environment when running DeepSpeed CPU-offloaded training (ZeRO-Offload), CPU-based distributed training with OneCCL, or when using DeepSpeed's fused CPU optimizers for parameter updates during offloading.
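As an illustration, a minimal ZeRO-Offload configuration that moves optimizer state to host memory (and therefore exercises the fused CPU Adam path) might look like the sketch below. The numeric values are placeholders, not tuned recommendations:

```python
# Minimal DeepSpeed config sketch for ZeRO stage 2 with CPU optimizer offload.
# Values are illustrative placeholders, not tuned recommendations.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {
        "type": "Adam",  # DeepSpeed can substitute its fused CPU Adam when offloading
        "params": {"lr": 1e-4},
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # optimizer state lives in host memory
    },
}
```

This dict would be passed to `deepspeed.initialize(..., config=ds_config)` or saved as a JSON config file.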

System Requirements

  • OS: Linux (primary platform; Windows has limited support)
  • CPU: x86-64 with AVX2 or AVX-512 (AVX-512 preferred for best performance; ARM with NEON is also supported)
  • Compiler: GCC >= 7.0 or another C++14-compatible compiler (required for JIT compilation of the CPU ops)
  • Shared memory: /dev/shm >= 512 MB (required for SHM-based allreduce; Docker containers need `--shm-size` raised)
  • Threading: OpenMP support (used to parallelize SIMD operations across CPU cores)
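On Linux, the instruction-set requirement can be confirmed by inspecting the CPU flags in `/proc/cpuinfo`. The helper below is an illustrative sketch, not part of DeepSpeed:

```python
# Report the best available SIMD level by reading CPU flags on Linux.
# Illustrative helper only; not part of DeepSpeed.
def simd_support(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return "unknown (not a Linux /proc filesystem)"
    flags = set()
    for line in text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "none"

print(simd_support())
```

A result of `none` means DeepSpeed's SIMD-optimized CPU ops will not be usable on that machine.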

Dependencies

System Packages

  • `gcc` / `g++` (C++14 support)
  • `libomp-dev` (OpenMP for threading)
  • `ninja` (optional, for faster JIT compilation)

Python Packages

  • `torch` (CPU build sufficient)
  • `deepspeed`

Optional Packages

  • `oneccl_bind_pt` (Intel OneCCL bindings for PyTorch) - for CCL backend
  • `intel_extension_for_pytorch` (IPEX) - for enhanced Intel CPU performance

Credentials

No credentials are required. The following environment variables affect CPU operations:

  • `DS_ACCELERATOR=cpu`: Force CPU accelerator backend
  • `OMP_NUM_THREADS`: Control OpenMP thread count for SIMD operations
  • `CCL_WORKER_COUNT`: Number of OneCCL worker threads
  • `KMP_AFFINITY`: Intel thread affinity settings
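These variables need to be set before DeepSpeed is imported so that accelerator autodetection and the OpenMP runtime pick them up. A sketch, where the thread counts are placeholders to tune for your machine:

```python
import os

# Configure CPU-related variables BEFORE importing deepspeed, so the
# accelerator autodetection and OpenMP runtime see them. Counts are placeholders.
os.environ["DS_ACCELERATOR"] = "cpu"            # force the CPU accelerator backend
os.environ.setdefault("OMP_NUM_THREADS", "8")   # OpenMP threads for SIMD operations
os.environ.setdefault("CCL_WORKER_COUNT", "1")  # OneCCL worker threads

# import deepspeed  # import only after the environment is configured
```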

Quick Install

# Ensure compiler is available
sudo apt-get install gcc g++ libomp-dev

# Install DeepSpeed (CPU ops are JIT compiled)
pip install deepspeed

# Verify CPU op support
ds_report
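Beyond `ds_report`, op compatibility can also be checked from Python via DeepSpeed's op builders. The sketch below guards the import so it degrades gracefully when DeepSpeed is absent; it assumes the `CPUAdamBuilder` op builder API:

```python
# Check whether the fused CPU Adam op can be JIT-compiled on this machine.
# Guarded import: returns a status string instead of raising.
def cpu_adam_status():
    try:
        from deepspeed.ops.op_builder import CPUAdamBuilder
    except ImportError:
        return "deepspeed not installed"
    return "compatible" if CPUAdamBuilder().is_compatible() else "incompatible"

print(cpu_adam_status())
```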

Code Evidence

SIMD abstraction from `csrc/includes/simd.h`:

#if defined(__AVX512__)
#define SIMD_WIDTH 16
#define SIMD_LOAD(x) _mm512_load_ps(x)
#define SIMD_STORE(x, y) _mm512_store_ps(x, y)
#elif defined(__AVX256__)
#define SIMD_WIDTH 8
#define SIMD_LOAD(x) _mm256_load_ps(x)
#define SIMD_STORE(x, y) _mm256_store_ps(x, y)
#endif

Common Errors

  • `cpu_adam not found`: the CPU Adam op was not compiled. Ensure gcc/g++ is installed; run `ds_report` to check.
  • `AVX instruction not supported`: the CPU lacks the required SIMD instructions. An x86-64 CPU with at least AVX2 is required.
  • `/dev/shm too small`: insufficient shared memory for SHM-based allreduce. Launch Docker with `--shm-size='1gb'` or larger.
  • `OneCCL not found`: `oneccl_bind_pt` is not installed. Run `pip install oneccl_bind_pt` to enable the Intel CCL backend.
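The shared-memory problem can be caught before launch by checking the size of `/dev/shm` with `os.statvfs` (Linux only). An illustrative helper, not shipped with DeepSpeed:

```python
import os

# Report the total size of a tmpfs mount (e.g. /dev/shm) in MB. Linux only.
# Illustrative helper; DeepSpeed does not ship this check.
def shm_size_mb(path="/dev/shm"):
    try:
        st = os.statvfs(path)
    except OSError:
        return -1  # path missing, e.g. on non-Linux systems
    return st.f_frsize * st.f_blocks // (1024 * 1024)

size = shm_size_mb()
if 0 <= size < 512:
    print(f"/dev/shm is only {size} MB; SHM-based allreduce may fail")
```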
