# Environment: facebookresearch/audiocraft xformers Memory-Efficient Attention
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Optimization, Deep_Learning |
| Last Updated | 2026-02-13 23:00 GMT |
## Overview

Optional `xformers` dependency (`< 0.0.23`) providing memory-efficient attention and gradient checkpointing for AudioCraft's transformer modules.
## Description

AudioCraft supports two attention backends: PyTorch native (`torch.nn.functional.scaled_dot_product_attention`) and xformers (`xformers.ops.memory_efficient_attention`). The backend is selected globally via `set_efficient_attention_backend()`. PyTorch native attention is the default, but xformers is required for:

- **MAGNeT models**: the MAGNeT loader explicitly sets the attention backend to xformers when `memory_efficient=True`.
- **Gradient checkpointing**: the `xformers_default` and `xformers_mm` checkpointing strategies require the `xformers.checkpoint_fairinternal` module.
- **Custom attention masks**: xformers supports efficient custom attention masks via `LowerTriangularMask`.
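The global-backend pattern described above can be sketched in plain Python. This is a minimal sketch: the function name mirrors AudioCraft's `set_efficient_attention_backend`, but the module-level variable and the argument validation shown here are illustrative assumptions, not the library's exact code.

```python
# Sketch of a module-level attention-backend switch, modeled on
# AudioCraft's set_efficient_attention_backend(). Names and validation
# are illustrative assumptions.

_efficient_attention_backend = 'torch'  # PyTorch native is the default


def set_efficient_attention_backend(backend: str = 'torch') -> None:
    """Globally select the attention backend ('torch' or 'xformers')."""
    global _efficient_attention_backend
    assert backend in ['xformers', 'torch'], f"unknown backend: {backend}"
    _efficient_attention_backend = backend


def get_efficient_attention_backend() -> str:
    """Return the currently selected backend."""
    return _efficient_attention_backend
```

Note that this sketch validates the incoming argument, whereas the snippet quoted in the Code Evidence section asserts on the current global value instead.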
The xformers package requires CUDA compilation with matching architecture flags (e.g., `TORCH_CUDA_ARCH_LIST='8.0'` for Ampere GPUs).
## Usage

Use this environment when running MAGNeT models with memory-efficient attention, when enabling gradient checkpointing with the xformers strategies in `StreamingTransformer`, or when training large transformer models where PyTorch native attention runs out of memory.
## System Requirements

| Category | Requirement | Notes |
|---|---|---|
| Hardware | NVIDIA GPU with CUDA | Required for xformers compilation |
| CUDA Architecture | Compute capability 6.0+ | Set via `TORCH_CUDA_ARCH_LIST` env var during build |
| PyTorch | 2.1.0 | Must match xformers version |
## Dependencies

### Python Packages

- `xformers < 0.0.23` (from `requirements.txt`)
- CI pins `xformers==0.0.22.post7`

### Credentials

No credentials required.
## Quick Install

```shell
# Standard install (pre-built wheel)
pip install xformers==0.0.22.post7

# Build from source (if a pre-built wheel is unavailable)
FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST='8.0' \
pip install -U git+https://github.com/facebookresearch/xformers.git#egg=xformers
```
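After installing, it can be useful to confirm that the installed version actually satisfies the `< 0.0.23` pin. A stdlib-only sketch follows; `check_xformers_pin` and the simplified version parser are illustrative assumptions (real projects should use `packaging.version.Version` for robust comparisons):

```python
import importlib.metadata
import re


def version_key(version: str) -> tuple:
    """Crude numeric sort key for versions like '0.0.22.post7'.

    Illustrative only: it flattens all digit runs, so pre-releases and
    local versions are not handled; use packaging.version in production.
    """
    return tuple(int(n) for n in re.findall(r'\d+', version))


def check_xformers_pin(installed: str, upper_bound: str = '0.0.23') -> bool:
    """Return True if the installed version satisfies the < upper_bound pin."""
    return version_key(installed) < version_key(upper_bound)


# Look up the installed xformers version, if any (safe when absent).
try:
    installed = importlib.metadata.version('xformers')
    print('xformers', installed, 'pin ok:', check_xformers_pin(installed))
except importlib.metadata.PackageNotFoundError:
    print('xformers is not installed in this environment')
```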
## Code Evidence

Attention backend selection from `audiocraft/modules/transformer.py:31-35`:

```python
def set_efficient_attention_backend(backend: str = 'torch'):
    global _efficient_attention_backend
    assert _efficient_attention_backend in ['xformers', 'torch']
    _efficient_attention_backend = backend
```
xformers import verification from `audiocraft/modules/transformer.py:727-737`:

```python
def _verify_xformers_memory_efficient_compat():
    try:
        from xformers.ops import memory_efficient_attention, LowerTriangularMask
    except ImportError:
        raise ImportError(
            "xformers is not installed. Please install it and try again.\n"
            "To install on AWS and Azure, run \n"
            "FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST='8.0'\\\n"
            "pip install -U git+https://git@github.com/fairinternal/xformers.git#egg=xformers\n"
        )
```
xformers gradient checkpointing verification from `audiocraft/modules/transformer.py:741-751`:

```python
def _verify_xformers_internal_compat():
    try:
        from xformers.checkpoint_fairinternal import checkpoint, _get_default_policy
    except ImportError:
        raise ImportError(
            "Francisco's fairinternal xformers is not installed..."
            "FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST='6.0;7.0'\\\n"
        )
```
Backend-specific attention dispatch from `audiocraft/modules/transformer.py:402-416`:

```python
if self.memory_efficient:
    p = self.dropout if self.training else 0
    if _efficient_attention_backend == 'torch':
        x = torch.nn.functional.scaled_dot_product_attention(
            q, k, v, is_causal=attn_mask is not None, dropout_p=p)
    else:
        x = ops.memory_efficient_attention(q, k, v, attn_mask, p=p)
```
MAGNeT forcing the xformers backend from `audiocraft/models/loaders.py:148-149`:

```python
if cfg.transformer_lm.memory_efficient:
    set_efficient_attention_backend("xformers")
```
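The two `_verify_xformers_*` helpers quoted above follow a common guard pattern: attempt the import and re-raise `ImportError` with actionable install instructions. A generic stdlib sketch of that pattern (the `require_module` helper is an illustrative assumption, not an AudioCraft API):

```python
import importlib


def require_module(name: str, install_hint: str) -> None:
    """Raise ImportError with an actionable hint if `name` cannot be imported.

    Illustrative helper mirroring the structure of AudioCraft's
    _verify_xformers_* compatibility checks.
    """
    try:
        importlib.import_module(name)
    except ImportError:
        raise ImportError(
            f"{name} is not installed. Please install it and try again.\n"
            f"{install_hint}"
        ) from None
```

For example, `require_module('xformers.ops', "pip install xformers==0.0.22.post7")` would fail fast with the install command in the error message instead of a bare import failure deep inside model construction.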
## Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: xformers is not installed` | xformers package missing | `pip install xformers==0.0.22.post7` |
| `ImportError: Francisco's fairinternal xformers is not installed` | Using xformers checkpointing strategies without the internal build | Build xformers from source with `FORCE_CUDA=1` |
| `CUDA error: no kernel image is available` | xformers compiled for the wrong GPU architecture | Rebuild with the correct `TORCH_CUDA_ARCH_LIST` (e.g., `'8.0'` for A100) |
## Compatibility Notes

- **PyTorch native attention (default)**: works without xformers; uses `torch.nn.functional.scaled_dot_product_attention`.
- **MAGNeT models**: explicitly require the xformers backend when `memory_efficient=True` is set in the config.
- **Gradient checkpointing**: the `torch` strategy uses PyTorch native checkpointing; `xformers_default` and `xformers_mm` require the xformers internal module.
- **Tensor layout**: xformers places the time dimension at index 1; PyTorch native places it at index 2 when `memory_efficient=True`.
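The tensor-layout rule above can be captured as a small helper. This is a sketch: the name echoes AudioCraft's `_get_attention_time_dimension`, but the signature here, and the fallback to index 1 when memory-efficient attention is off, are illustrative assumptions drawn from the rule stated in this section.

```python
# Sketch of the tensor-layout rule: which axis of a 4-D attention input
# holds the time dimension. Signature and fallback are assumptions.

def attention_time_dimension(backend: str, memory_efficient: bool) -> int:
    """Return the index of the time axis for the given attention backend."""
    if memory_efficient and backend == 'torch':
        return 2  # PyTorch SDPA expects [B, H, T, D]
    return 1      # xformers expects [B, T, H, D]
```

Getting this index wrong silently transposes heads and time, so any code that slices or masks along the time axis should resolve it through a single helper like this rather than hard-coding an index.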