

Implementation:Facebookresearch Audiocraft AudioGen get pretrained

From Leeroopedia
Knowledge Sources
Domains Audio_Generation, Sound_Generation
Last Updated 2026-02-14 01:00 GMT

Overview

Concrete tool for loading pretrained AudioGen text-to-sound generation models and configuring their generation parameters.

Description

AudioGen provides a high-level, user-facing API for text-to-sound generation. It subclasses BaseGenModel, which wraps a compression model and a language model behind a single generation interface. The static method get_pretrained loads pretrained models from the Hugging Face Hub, and set_generation_params configures the sampling strategy (top-k or top-p), temperature, output duration, and classifier-free guidance.
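
The sampling parameters interact at every decoding step. The pure-Python sketch below illustrates one plausible way a single next token is chosen from a language model's logits: classifier-free guidance blends conditional and unconditional logits (scaled by cfg_coef), temperature reshapes the softmax, and top-p or top-k filtering limits the candidates. It is a simplified illustration, not audiocraft's code: the function select_next_token is hypothetical, the logits are plain lists for one codebook, and the rule that top-p takes precedence over top-k whenever top_p > 0 (so the defaults top_p=0.0, top_k=250 mean top-k sampling) is an assumption about the library's behavior that should be checked against its source.

```python
import math
import random
from typing import List, Optional


def select_next_token(
    cond_logits: List[float],
    uncond_logits: List[float],
    use_sampling: bool = True,
    top_k: int = 250,
    top_p: float = 0.0,
    temperature: float = 1.0,
    cfg_coef: float = 3.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical helper: pick one token id from guided logits."""
    # Classifier-free guidance: cfg_coef = 1.0 reproduces the conditional logits;
    # larger values extrapolate further away from the unconditional ones.
    logits = [u + cfg_coef * (c - u) for c, u in zip(cond_logits, uncond_logits)]

    # Greedy decoding when sampling is disabled or the temperature is zero.
    if not use_sampling or temperature <= 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0.0:
        # Nucleus (top-p) filtering: keep the smallest prefix reaching mass top_p.
        kept, mass = [], 0.0
        for i in ranked:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
    elif top_k > 0:
        # Top-k filtering: keep only the k most likely tokens.
        kept = ranked[:top_k]
    else:
        kept = ranked

    # Sample from the kept tokens; choices() renormalizes the weights.
    rng = rng or random.Random()
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

For example, select_next_token([1.0, 0.9], [0.0, 1.0], use_sampling=False, cfg_coef=0.0) ignores the text condition and returns token 1, while the same call with cfg_coef=1.0 returns token 0.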

Usage

Import this class when you want to generate environmental sounds from text descriptions using a pretrained AudioGen model.

Code Reference

Source Location

Signature

class AudioGen(BaseGenModel):
    @staticmethod
    def get_pretrained(name: str = 'facebook/audiogen-medium', device=None):
        """Load a pretrained AudioGen model."""

    def set_generation_params(self, use_sampling=True, top_k=250, top_p=0.0,
                              temperature=1.0, duration=10.0, cfg_coef=3.0,
                              two_step_cfg=False, extend_stride=2):
        """Configure generation parameters."""

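The extend_stride parameter only matters when the requested duration exceeds the longest span the model can generate in one pass. The pure-Python sketch below shows one way to plan such extended generation as overlapping windows measured in token frames: each window after the first reuses the tail of the previous window as its prompt and contributes extend_stride seconds of new audio. It is an illustration under assumptions, not the library code: the helper name plan_windows, the 10-second max_duration, and the 50 Hz token frame rate are all assumed for the example and are not stated on this page. The sketch also rejects strides that are not shorter than a window, since such a stride would leave no shared context.

```python
from typing import List, Tuple


def plan_windows(
    duration: float,
    max_duration: float = 10.0,   # ASSUMED single-pass limit, in seconds
    extend_stride: float = 2.0,   # seconds of new audio per extra window
    frame_rate: int = 50,         # ASSUMED token frames per second
) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (start_frame, end_frame) for each window."""
    if extend_stride >= max_duration:
        raise ValueError("extend_stride must be smaller than max_duration")

    total = int(duration * frame_rate)
    stride = int(extend_stride * frame_rate)
    window = int(max_duration * frame_rate)

    # Short requests fit in a single pass; no extension is needed.
    if total <= window:
        return [(0, total)]

    windows: List[Tuple[int, int]] = []
    offset, prompt_len = 0, 0
    while offset + prompt_len < total:
        end = min(offset + window, total)
        windows.append((offset, end))
        # Everything after the first `stride` frames becomes the next prompt.
        prompt_len = (end - offset) - stride
        offset += stride
    return windows
```

With these assumed values, plan_windows(20.0) yields six 10-second windows, each shifted by 2 seconds. A larger extend_stride means fewer passes but less shared context between consecutive windows.
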
Import

from audiocraft.models import AudioGen

I/O Contract

Inputs

Name          Type       Required  Description
descriptions  list[str]  Yes       Text descriptions passed to generate(); one clip is produced per entry
name          str        No        Pretrained model name passed to get_pretrained() (default 'facebook/audiogen-medium')

Outputs

Name  Type          Description
wav   torch.Tensor  Generated audio of shape [B, C, T]: B = number of descriptions, C = channels (1, mono), T = samples (duration × sample rate)
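
The time dimension follows directly from the requested duration and the output sample rate. The small sketch below spells out that arithmetic, assuming 16 kHz mono output as in this page's usage example. The helper name expected_wav_shape is hypothetical, and it ignores any rounding to whole token frames.

```python
from typing import Tuple


def expected_wav_shape(
    num_descriptions: int,
    duration: float,
    sample_rate: int = 16_000,  # ASSUMED AudioGen output rate (16 kHz)
    channels: int = 1,          # ASSUMED mono output
) -> Tuple[int, int, int]:
    """Hypothetical helper: predicted [B, C, T] shape of generate()'s output."""
    # B: one clip per description; T: seconds times samples per second.
    return (num_descriptions, channels, int(duration * sample_rate))


print(expected_wav_shape(2, 5.0))  # (2, 1, 80000)
```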

Usage Examples

from audiocraft.models import AudioGen

model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5.0)
wav = model.generate(['dog barking in a park', 'rain on a tin roof'])
# wav: tensor of shape [2, 1, 80000] (2 clips, mono, 5 s × 16 kHz)
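
To listen to the results, each clip has to be written to disk. audiocraft ships its own writer (audio_write in audiocraft.data.audio); as a dependency-free alternative, the sketch below writes one mono clip as 16-bit PCM WAV using only Python's standard wave module. It assumes the samples are floats in [-1, 1], for example obtained with wav[0, 0].cpu().tolist(), and the helper name write_wav_16bit is hypothetical.

```python
import struct
import wave
from typing import Sequence


def write_wav_16bit(path: str, samples: Sequence[float], sample_rate: int = 16_000) -> None:
    """Hypothetical helper: save mono float samples in [-1, 1] as 16-bit PCM WAV."""
    # Clip to [-1, 1], then scale to the signed 16-bit integer range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)  # little-endian int16
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(frames)
```

A possible call after generation is write_wav_16bit("dog_barking.wav", wav[0, 0].cpu().tolist(), model.sample_rate). Clipping before scaling keeps out-of-range samples from wrapping around in the int16 conversion.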
