Wan2.1: Alibaba Cloud’s Newly Released Open-Source Video Generation Model

Wan2.1 is Alibaba Cloud’s latest open-source video generation model, offering significant performance advantages. It can run on personal computers and supports various video generation tasks.


Model Versions

  • Wan2.1-I2V-14B: Specializes in image-to-video (I2V) generation, supporting 720P output resolution.
  • Wan2.1-T2V-14B: A text-to-video (T2V) model capable of generating high-quality videos, suitable for users with demanding generation requirements.
  • Wan2.1-T2V-1.3B: A lighter text-to-video model, optimized for resource-limited environments. It can run on consumer-grade GPUs with 8.2GB VRAM, producing 480P videos.
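As a rough illustration of the facts above, the sketch below picks a variant by task and available VRAM. Only the 8.2 GB figure for the 1.3B model comes from the text; the 24 GB cutoff below which the 14B model is treated as impractical is a placeholder assumption, not an official requirement.

```python
# Minimal sketch (not official tooling) for choosing a Wan2.1 variant.
# Facts from the text: I2V is served by the 14B I2V model (720P output);
# T2V is served by the 14B model, or by the lighter 1.3B model on
# consumer GPUs with ~8.2 GB of VRAM (480P output).

def pick_variant(task: str, vram_gb: float) -> str:
    """Return a Wan2.1 model name for the given task and VRAM budget."""
    if task == "i2v":
        return "Wan2.1-I2V-14B"
    if task == "t2v":
        # 8.2 GB is the stated floor for the 1.3B model; using 24 GB as
        # the cutoff for the 14B model is an assumption for illustration.
        return "Wan2.1-T2V-1.3B" if vram_gb < 24 else "Wan2.1-T2V-14B"
    raise ValueError(f"unsupported task: {task}")
```

For example, `pick_variant("t2v", 8.2)` selects the 1.3B model, matching the consumer-GPU scenario described above.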

Key Features

  1. High Performance

    • Wan2.1 achieves an outstanding VBench score of 86.22%, surpassing many renowned video generation models such as Sora and Minimax.
    • This performance is driven by advanced technologies, including a Video Diffusion Transformer architecture and an efficient 3D causal Variational Autoencoder (VAE) module.
  2. Versatile Video Generation Capabilities

    • Supports multiple generation tasks, including:
      • Text-to-video (T2V)
      • Image-to-video (I2V)
      • Video editing
      • Text-to-image
      • Video-to-audio generation
    • This flexibility allows it to cater to a wide range of user needs.
  3. Resolution and Efficiency

    • Supports 480P and 720P output resolutions.
    • The T2V-1.3B model can run on consumer GPUs and generates a 5-second 480P video in approximately four minutes, significantly lowering hardware requirements.
  4. Multi-Language Support

    • Wan2.1 is the first video generation model able to render both Chinese and English text within its generated videos, enhancing its usability in multilingual environments.
  5. Innovative Data Processing & Training Strategies

    • Implements a six-stage progressive training process, transitioning from low-resolution image pretraining to high-resolution video training, ensuring exceptional performance across various resolutions and complex scenarios.
    • Employs a four-step data filtering process to maintain high-quality and diverse training data.
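The efficiency figure in point 3 can be turned into a rough throughput estimate: 5 seconds of 480P video in about 4 minutes works out to roughly 48 seconds of compute per second of generated video. The arithmetic, with linear scaling to longer clips taken as an assumption:

```python
# Throughput arithmetic from the stated figure: the T2V-1.3B model
# generates a 5-second 480P clip in about 4 minutes on a consumer GPU.

def compute_seconds_per_video_second(clip_seconds: float = 5.0,
                                     wall_minutes: float = 4.0) -> float:
    """Seconds of wall-clock compute per second of generated video."""
    return (wall_minutes * 60.0) / clip_seconds

def estimated_minutes(target_video_seconds: float) -> float:
    """Estimated minutes to generate a clip of the given length,
    assuming (an assumption) that cost scales linearly with length."""
    return compute_seconds_per_video_second() * target_video_seconds / 60.0
```

Under that assumption, a 10-second clip would take about 8 minutes on the same hardware.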

Application Scenarios

  1. Film & Video Production

    • Wan2.1 enables fast generation of complex scenes and special effects, making it well suited to effects-heavy genres such as science fiction and war films while significantly reducing production cost and time.
  2. Advertising & Marketing

    • The model can generate creative ad content tailored to brand identities, helping businesses enhance marketing effectiveness.
  3. Personal Content Creation

    • Individual creators can use Wan2.1 for short video production, artistic animation, and image-to-video transformations, catering to personal creative needs.
  4. Professional Media Production

    • In professional fields, Wan2.1 can be applied to film special effects, advertising design, and educational resource development, enhancing visual quality and engagement.
  5. Education & Training

    • Wan2.1 can generate educational videos and training materials, helping institutions create engaging teaching resources and improving learning experiences.
  6. Multimedia Content Generation

    • Supports text-to-video, image-to-video, video editing, text-to-image, and video-to-audio generation, making it suitable for a wide range of multimedia content creation and editing tasks.

Open-Source Release

Alibaba officially open-sourced the Wan2.1 video generation model on February 25, 2025, under the Apache 2.0 license.
All inference code and model weights have been publicly released, and developers can download and experiment with the model on GitHub, Hugging Face, and other platforms.
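Since the weights are hosted on Hugging Face, a typical way to fetch them is `huggingface_hub.snapshot_download`. The `Wan-AI/<model-name>` repo naming below is an assumption for illustration; verify the exact repo IDs on the model's Hugging Face page. The download call itself is left commented out, as it pulls many gigabytes of weights.

```python
# Sketch of fetching Wan2.1 weights from Hugging Face. The "Wan-AI"
# organization name and exact repo IDs are assumptions -- verify them
# on huggingface.co before use.

def hf_repo_id(model_name: str, org: str = "Wan-AI") -> str:
    """Build a Hugging Face repo ID such as 'Wan-AI/Wan2.1-T2V-1.3B'."""
    return f"{org}/{model_name}"

repo = hf_repo_id("Wan2.1-T2V-1.3B")

# Actual download (requires `pip install huggingface_hub`; several GB):
# from huggingface_hub import snapshot_download
# snapshot_download(repo_id=repo, local_dir="./Wan2.1-T2V-1.3B")
```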
