Stable Audio

Stable Audio is a family of generative audio models developed by Stability AI that produce high-quality audio samples and sound effects from text prompts. This overview covers two releases: Stable Audio Open and Stable Audio 2.0.

Features of Stable Audio Open

  1. Text-to-Audio Generation:
    • Users generate audio by entering a text description of musical elements such as instruments, rhythm, and melody.
  2. High-Quality Sound Effects and Music Clips:
    • Built on deep learning, the model generates realistic, high-quality sound effects and musical segments for a variety of uses.
  3. Support for Various Instrument Sounds:
    • The model can generate sounds of various instruments, including piano, guitar, drums, and more, providing a rich selection for users.
  4. Sound Design Support:
    • In addition to musical instrument sounds, the model can generate environmental sounds and special effects, making it suitable for sound design and game development.
  5. Customizability:
    • Users can fine-tune the model or write specific text descriptions to generate audio clips in a particular style.
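The text-to-audio workflow above starts from a prompt that names the musical elements. The following is a minimal sketch of how such a prompt could be assembled. `PromptSpec` and `build_prompt` are hypothetical helpers invented for illustration; they are not part of any Stable Audio tooling, and the comma-separated prompt layout is an assumption rather than a documented requirement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PromptSpec:
    """Hypothetical container for the musical elements a prompt can describe."""
    instruments: List[str] = field(default_factory=list)
    genre: str = ""
    mood: str = ""
    bpm: Optional[int] = None


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty elements into one comma-separated text prompt."""
    parts: List[str] = []
    if spec.genre:
        parts.append(spec.genre)
    parts.extend(spec.instruments)
    if spec.mood:
        parts.append(spec.mood)
    if spec.bpm is not None:
        parts.append(f"{spec.bpm} BPM")
    return ", ".join(parts)


spec = PromptSpec(instruments=["piano", "soft drums"], genre="jazz", mood="relaxed", bpm=90)
print(build_prompt(spec))  # jazz, piano, soft drums, relaxed, 90 BPM
```

Keeping the elements in a structure makes it easy to vary one element at a time (for example, only the tempo) while comparing the generated clips.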

Features of Stable Audio 2.0

  1. High-Quality Music Generation:
    • Users can generate high-fidelity music tracks of up to 3 minutes at 44.1 kHz from a text description or an input audio sample. Supported genres include rock, jazz, electronic, and hip-hop.
  2. Advanced Technical Architecture:
    • Stable Audio 2.0 uses a Diffusion Transformer (DiT) to convert random noise into structured audio step by step. The model learns and reproduces complex patterns and relationships in the audio, which lets it generate coherent, high-quality music.
  3. Efficient Generation Speed:
    • Compared with the previous version, Stable Audio 2.0 generates music significantly faster: a 3-minute piece takes about 1 minute on average.
  4. Extensive Dataset Training:
    • The model was trained on more than 800,000 audio files totaling 19,500 hours of audio, which helps the generated music sound detailed and realistic.
  5. Commercial Use Support:
    • Through a partnership with the music service provider AudioSparx, music generated by Stable Audio 2.0 can be used commercially, which is convenient for video content creators and advertisers.
  6. Diverse Output Formats:
    • Generated music can be downloaded in several formats, including MP3, WAV, and video.
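To put the figures in this list in perspective, the short calculation below derives a few quantities from them. The durations, sample rate, and dataset size come from the text. The 16-bit depth and stereo channel count used for the size estimate are assumptions, since the text does not state them.

```python
# Figures taken from the feature list above.
SAMPLE_RATE_HZ = 44_100      # 44.1 kHz
MAX_TRACK_SECONDS = 3 * 60   # tracks of up to 3 minutes
GENERATION_SECONDS = 60      # about 1 minute to generate a 3-minute track
DATASET_FILES = 800_000      # "more than 800,000" files, so a lower bound
DATASET_HOURS = 19_500

# Assumptions for the size estimate (not stated in the text).
BYTES_PER_SAMPLE = 2         # 16-bit PCM
CHANNELS = 2                 # stereo

# Samples the model must produce per channel for a full-length track.
samples_per_channel = SAMPLE_RATE_HZ * MAX_TRACK_SECONDS

# Raw PCM payload of such a track, excluding any file header.
pcm_bytes = samples_per_channel * BYTES_PER_SAMPLE * CHANNELS

# Upper bound on the mean training clip length, since the file count is a lower bound.
avg_clip_seconds = DATASET_HOURS * 3600 / DATASET_FILES

# How much faster than real time generation runs at the quoted speed.
realtime_factor = MAX_TRACK_SECONDS / GENERATION_SECONDS

print(samples_per_channel)  # 7938000
print(pcm_bytes)            # 31752000
print(avg_clip_seconds)     # 87.75
print(realtime_factor)      # 3.0
```

Under these assumptions, a full-length track is roughly 7.9 million samples per channel, and the average training file is at most about a minute and a half long.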

Pricing Plans

  1. Free Version:
    • Generation Limit: Up to 20 audio files per month.
    • Audio Length: Each audio clip can be up to 45 seconds long.
    • Usage Restrictions: Generated audio cannot be used for commercial purposes.
  2. Paid Version (Professional Plan):
    • Price: $11.99 per month.
    • Generation Limit: Up to 500 audio files per month.
    • Audio Length: Each audio clip can be up to 90 seconds long.
    • Usage Rights: Generated audio can be used for commercial purposes.
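The two plans can be compared directly by modeling them as data. This is an illustrative sketch only: the `Plan` class and its properties are hypothetical names, and the numbers are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical record for one pricing plan, using the figures listed above."""
    name: str
    price_usd: float
    generations_per_month: int
    max_clip_seconds: int
    commercial_use: bool

    @property
    def max_minutes_per_month(self) -> float:
        """Most audio obtainable in a month if every clip uses the full length."""
        return self.generations_per_month * self.max_clip_seconds / 60

    @property
    def cost_per_generation(self) -> float:
        """Monthly price spread over the full generation quota."""
        return self.price_usd / self.generations_per_month


FREE = Plan("Free", 0.0, 20, 45, commercial_use=False)
PRO = Plan("Professional", 11.99, 500, 90, commercial_use=True)

for plan in (FREE, PRO):
    print(plan.name, plan.max_minutes_per_month, round(plan.cost_per_generation, 4))
```

At full usage, the Free plan yields at most 15 minutes of audio per month and the Professional plan at most 750 minutes, at a little over two cents per clip.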

Application Scenarios

  1. Music Creation:
    • Quick Instrumental Clips: Helps music producers quickly generate instrumental clips such as piano, guitar, and drum segments, speeding up the creative process.
    • Harmony and Melody Generation: Generates harmonies and melodies from text descriptions, adding depth and detail to musical works.
  2. Sound Design:
    • Environmental Sound Effects: Generates realistic environmental sounds like birdsong, rain, and city noise, useful for films, animations, and games.
    • Special Effects: Creates special effects sounds such as explosions or magical sound effects to enhance the audiovisual experience.
  3. Game Development:
    • Character Sound Effects: Generates unique sound effects for game characters, such as footsteps or attack sounds, enhancing immersion.
    • Scene Sound Effects: Creates background sounds for game scenes like forests, oceans, or cities, boosting the game’s atmosphere.
  4. Advertising Soundtracks:
    • Background Music: Quickly generates background music that matches the content of advertisements, increasing the appeal and impact of the ads.
    • Sound Design: Designs specific sound effects for scenes in ads to enhance their expressiveness.
  5. Education and Research:
    • Academic Research: Supports research in audio synthesis, machine learning, and musicology, where generated audio can be experimented with and analyzed.
    • Teaching Tool: Serves as a practical teaching tool that helps students understand audio generation technology and music creation.
  6. Business and Marketing:
    • Audio Branding: Creates unique sound effects or audio identities for ads and brands, enhancing brand recognition and loyalty.
    • Audio Logos: Develops audio logos and brand sounds, increasing the brand’s market influence.

Open Source Nature of Stable Audio Open

Stable Audio Open is an open-source project: users can freely download, use, and modify the model’s code and weights, subject to the terms of its license. This openness lets researchers and developers study the model and extend its capabilities, which advances audio generation technology.
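As a rough sketch of what working with the downloadable weights can look like, the function below generates one clip and writes it to a WAV file. The text does not name a library. This example assumes the Hugging Face `diffusers` package with its `StableAudioPipeline`, the `soundfile` package, the model ID `stabilityai/stable-audio-open-1.0`, and a CUDA-capable GPU. Access to the weights may also require accepting the license on the model page first. The function name `generate_clip` and its defaults are hypothetical.

```python
def generate_clip(prompt: str, seconds: float = 10.0, out_path: str = "clip.wav") -> str:
    """Generate one audio clip from a text prompt and save it as a WAV file."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if seconds <= 0:
        raise ValueError("seconds must be positive")

    # Imported lazily so the function can be defined without the heavy
    # dependencies installed; they are only needed when a clip is generated.
    import soundfile as sf
    import torch
    from diffusers import StableAudioPipeline

    # Assumed model ID; downloading it requires network access and enough disk space.
    pipe = StableAudioPipeline.from_pretrained(
        "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumption: a CUDA GPU is available

    result = pipe(prompt, num_inference_steps=200, audio_end_in_s=seconds)

    # result.audios has shape (batch, channels, samples);
    # soundfile expects (samples, channels), hence the transpose.
    waveform = result.audios[0].T.float().cpu().numpy()
    sf.write(out_path, waveform, pipe.vae.sampling_rate)
    return out_path
```

A call such as `generate_clip("rain on a tin roof, distant thunder", seconds=8.0)` would then write `clip.wav` to the working directory, provided the assumptions above hold.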

Stable Audio 2.0 and Its Closed Source Components

Stable Audio 2.0 is not fully open source. It offers some open APIs and tools, but its core model and certain advanced features remain closed-source. Companies typically take this approach to protect commercial interests and patents while offering higher-quality services to paying customers.
