
Environment:Facebookresearch Audiocraft XFormers Memory Efficient Attention

From Leeroopedia
Knowledge Sources
Domains Infrastructure, Optimization, Deep_Learning
Last Updated 2026-02-13 23:00 GMT

Overview

AudioCraft's transformer modules take an optional dependency on xformers (< 0.0.23) for memory-efficient attention and gradient checkpointing.

Description

AudioCraft supports two attention backends: PyTorch native (torch.nn.functional.scaled_dot_product_attention) and xformers (xformers.ops.memory_efficient_attention). The backend is selected globally via set_efficient_attention_backend(). While PyTorch native attention is the default, xformers is required for:

  • MAGNeT models: The MAGNeT loader explicitly sets the attention backend to xformers when memory_efficient=True.
  • Gradient checkpointing: The xformers_default and xformers_mm checkpointing strategies require the xformers.checkpoint_fairinternal module.
  • Custom attention masks: xformers supports efficient custom attention masks via LowerTriangularMask.
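The global backend-selection pattern described above can be sketched in plain Python. This is a minimal, self-contained reimplementation for illustration, not AudioCraft's actual module; the getter function is an assumption added here for symmetry:

```python
# Minimal sketch of a module-level attention-backend switch, modeled on
# the pattern in audiocraft.modules.transformer. Illustrative only.

_efficient_attention_backend = 'torch'  # PyTorch native is the default


def set_efficient_attention_backend(backend: str = 'torch') -> None:
    """Select the attention backend globally for all transformer layers."""
    global _efficient_attention_backend
    # Validate the requested backend before committing it.
    assert backend in ('xformers', 'torch'), f"unknown backend: {backend}"
    _efficient_attention_backend = backend


def get_efficient_attention_backend() -> str:
    """Return the currently selected backend (hypothetical helper)."""
    return _efficient_attention_backend


set_efficient_attention_backend('xformers')
print(get_efficient_attention_backend())  # -> xformers
```

Because the flag is module-level state, the selection must happen before any attention layer runs; this is why the MAGNeT loader (shown under Code Evidence) sets it at model-load time.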

The xformers package requires CUDA compilation with matching architecture flags (e.g., TORCH_CUDA_ARCH_LIST='8.0' for Ampere GPUs).

Usage

Use this environment when running MAGNeT models with memory-efficient attention, or when enabling gradient checkpointing with xformers strategies in the StreamingTransformer. Also required when training large transformer models where PyTorch native attention runs out of memory.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| Hardware | NVIDIA GPU with CUDA | Required for xformers compilation |
| CUDA Architecture | Compute capability 6.0+ | Set via TORCH_CUDA_ARCH_LIST env var during build |
| PyTorch | 2.1.0 | Must match xformers version |
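The TORCH_CUDA_ARCH_LIST variable in the table is a semicolon- or space-separated list of compute capabilities such as '6.0;7.0;8.0+PTX'. A small sketch of validating such a string against the 6.0+ floor above (helper names are hypothetical, not part of AudioCraft or PyTorch):

```python
def parse_cuda_arch_list(value: str) -> list[tuple[int, int]]:
    """Parse a TORCH_CUDA_ARCH_LIST-style string (e.g. '6.0;7.0;8.0+PTX')
    into (major, minor) compute capabilities. Illustrative helper only."""
    caps = []
    for entry in value.replace(' ', ';').split(';'):
        if not entry:
            continue
        # '+PTX' asks the build to also embed forward-compatible PTX.
        major, minor = entry.removesuffix('+PTX').split('.')
        caps.append((int(major), int(minor)))
    return caps


def meets_minimum(value: str, minimum: tuple[int, int] = (6, 0)) -> bool:
    """True if every requested arch satisfies the 6.0+ floor in the table."""
    return all(cap >= minimum for cap in parse_cuda_arch_list(value))


print(meets_minimum('8.0'))      # True: Ampere (A100) is above the floor
print(meets_minimum('5.2;8.0'))  # False: Maxwell 5.2 is below 6.0
```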

Dependencies

Python Packages

  • xformers < 0.0.23 (from requirements.txt)
  • CI pins xformers == 0.0.22.post7

Credentials

No credentials required.

Quick Install

# Standard install (pre-built wheel)
pip install xformers==0.0.22.post7

# Build from source (if pre-built wheel unavailable)
FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST='8.0' \
  pip install -U git+https://github.com/facebookresearch/xformers.git#egg=xformers
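The `< 0.0.23` pin can be sanity-checked with a crude version comparison. This sketch handles the `.postN` suffix used by the CI pin; for real code, prefer `packaging.version.Version`. The helper names are illustrative:

```python
def version_key(v: str) -> list[tuple[int, int]]:
    """Crude PEP 440-ish sort key: numeric segments compare numerically,
    'postN' segments sort after the bare release. Sketch only; use
    packaging.version.Version in production."""
    key = []
    for part in v.split('.'):
        if part.isdigit():
            key.append((0, int(part)))
        elif part.startswith('post') and part[4:].isdigit():
            key.append((1, int(part[4:])))
        else:
            key.append((2, 0))  # unknown segment sorts last
    return key


def satisfies_pin(installed: str, upper: str = '0.0.23') -> bool:
    """True if `installed` respects the `< 0.0.23` requirement."""
    return version_key(installed) < version_key(upper)


print(satisfies_pin('0.0.22.post7'))  # True: the CI pin is within the bound
print(satisfies_pin('0.0.23'))        # False: excluded by the strict bound
```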

Code Evidence

Attention backend selection from audiocraft/modules/transformer.py:31-35:

def set_efficient_attention_backend(backend: str = 'torch'):
    global _efficient_attention_backend
    assert _efficient_attention_backend in ['xformers', 'torch']
    _efficient_attention_backend = backend

xformers import verification from audiocraft/modules/transformer.py:727-737:

def _verify_xformers_memory_efficient_compat():
    try:
        from xformers.ops import memory_efficient_attention, LowerTriangularMask
    except ImportError:
        raise ImportError(
            "xformers is not installed. Please install it and try again.\n"
            "To install on AWS and Azure, run \n"
            "FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST='8.0'\\\n"
            "pip install -U git+https://git@github.com/fairinternal/xformers.git#egg=xformers\n"
        )

xformers gradient checkpointing verification from audiocraft/modules/transformer.py:741-751:

def _verify_xformers_internal_compat():
    try:
        from xformers.checkpoint_fairinternal import checkpoint, _get_default_policy
    except ImportError:
        raise ImportError(
            "Francisco's fairinternal xformers is not installed..."
            "FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST='6.0;7.0'\\\n"
        )

Backend-specific attention dispatch from audiocraft/modules/transformer.py:402-416:

if self.memory_efficient:
    p = self.dropout if self.training else 0
    if _efficient_attention_backend == 'torch':
        x = torch.nn.functional.scaled_dot_product_attention(
            q, k, v, is_causal=attn_mask is not None, dropout_p=p)
    else:
        x = ops.memory_efficient_attention(q, k, v, attn_mask, p=p)

MAGNeT forcing xformers backend from audiocraft/models/loaders.py:148-149:

if cfg.transformer_lm.memory_efficient:
    set_efficient_attention_backend("xformers")

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| ImportError: xformers is not installed | xformers package missing | pip install xformers==0.0.22.post7 |
| ImportError: Francisco's fairinternal xformers is not installed | Using xformers checkpointing without internal build | Build xformers from source with FORCE_CUDA=1 |
| CUDA error: no kernel image is available | xformers compiled for wrong GPU architecture | Rebuild with correct TORCH_CUDA_ARCH_LIST (e.g., '8.0' for A100) |
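The table above can be encoded as a simple substring-based triage helper. This is a hypothetical convenience, not part of AudioCraft; the keys and remedies come directly from the table:

```python
# Hypothetical triage helper encoding the Common Errors table.
# The more specific 'fairinternal' key must come first, because the generic
# "xformers is not installed" substring also occurs in that message.
REMEDIES = {
    "fairinternal xformers is not installed":
        "build xformers from source with FORCE_CUDA=1",
    "xformers is not installed":
        "pip install xformers==0.0.22.post7",
    "no kernel image is available":
        "rebuild with the correct TORCH_CUDA_ARCH_LIST (e.g. '8.0' for A100)",
}


def suggest_fix(error_message: str) -> str:
    """Return the first matching remedy, relying on dict insertion order."""
    for needle, remedy in REMEDIES.items():
        if needle in error_message:
            return remedy
    return "no known remedy; check that xformers and PyTorch versions match"


print(suggest_fix("ImportError: xformers is not installed. Please install it"))
# -> pip install xformers==0.0.22.post7
```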

Compatibility Notes

  • PyTorch native attention (default): Works without xformers; uses torch.nn.functional.scaled_dot_product_attention.
  • MAGNeT models: Explicitly require xformers backend when memory_efficient=True is set in config.
  • Gradient checkpointing: torch strategy uses PyTorch native; xformers_default and xformers_mm require xformers internal module.
  • Tensor layout: xformers uses time dimension at index 1; PyTorch native uses index 2 when memory_efficient=True.
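The tensor-layout note can be made concrete with pure shape arithmetic: with memory_efficient=True, xformers consumes [B, T, H, D] (time at index 1), while the PyTorch native path consumes [B, H, T, D] (time at index 2). A sketch, with hypothetical helper names:

```python
def time_dim(backend: str) -> int:
    """Index of the time axis for each backend, per the note above."""
    assert backend in ('xformers', 'torch')
    return 1 if backend == 'xformers' else 2


def to_backend_layout(shape_bthd: tuple, backend: str) -> tuple:
    """Map an xformers-style [B, T, H, D] shape to the layout the chosen
    backend consumes. Pure shape arithmetic; illustrative only."""
    b, t, h, d = shape_bthd
    if backend == 'xformers':
        return (b, t, h, d)  # time stays at index 1
    return (b, h, t, d)      # torch SDPA path: time moves to index 2


print(to_backend_layout((2, 100, 8, 64), 'torch'))     # (2, 8, 100, 64)
print(to_backend_layout((2, 100, 8, 64), 'xformers'))  # (2, 100, 8, 64)
```

Getting this permutation wrong produces shape-mismatch errors rather than silent corruption, but it is the main pitfall when switching backends mid-project.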
