Model Overview Card for Wan 2.1
Basic Information
- Model Name: Wan 2.1
- Developer/Creator: Alibaba
- Release Date: February 25, 2025
- Version: 2.1
- Model Type: AI Video Generation Model
Description
Overview:
Wan 2.1, developed by Alibaba's Wan AI team, is a state-of-the-art video foundation model designed for advanced generative video tasks. Supporting Text-to-Video (T2V) and Image-to-Video (I2V) generation, it incorporates architectural innovations that deliver high-quality outputs with exceptional computational efficiency.
Key Features:
- Visual text generation: Generates text in both Chinese and English within videos.
- 3D Variational Autoencoder (Wan-VAE): Encodes and decodes unlimited-length 1080P videos with temporal precision.
- High-quality outputs: Produces visually dynamic and temporally consistent videos at resolutions of up to 720P.
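The temporal-causality property behind Wan-VAE's unlimited-length encoding can be illustrated with a toy sketch. This is not Wan-VAE itself: a scalar moving-average filter stands in for the causal temporal convolutions, but it shows why causal processing lets a video of any length be handled in fixed-size chunks with a small carried state.

```python
# Toy illustration (not Wan-VAE): a causal temporal filter applied chunk by
# chunk. Because each output frame depends only on past frames, arbitrarily
# long videos can be processed in fixed-size chunks while carrying a small
# state forward -- the property that enables unlimited-length encoding.

def causal_filter(frames, state=0.0, alpha=0.7):
    """Exponential moving average over frame values (stand-in for a causal conv)."""
    out = []
    for f in frames:
        state = alpha * state + (1 - alpha) * f
        out.append(state)
    return out, state

def process_in_chunks(frames, chunk_size=4):
    """Process the sequence chunk by chunk, carrying the filter state forward."""
    out, state = [], 0.0
    for i in range(0, len(frames), chunk_size):
        chunk_out, state = causal_filter(frames[i:i + chunk_size], state)
        out.extend(chunk_out)
    return out

frames = [float(i % 5) for i in range(20)]          # a fake 20-frame "video"
full, _ = causal_filter(frames)                     # process all frames at once
chunked = process_in_chunks(frames, chunk_size=4)   # process in chunks of 4
# Chunked processing reproduces full-sequence processing exactly.
```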
Intended Use:
Wan 2.1 is designed for applications in:
- Creative industries (video production).
- Content generation for social media and marketing campaigns.
- Automated workflows involving multimedia processing.
Language Support:
The model supports multilingual text generation, including Chinese and English.
Technical Details
Architecture:
Wan 2.1 is built on the diffusion transformer paradigm with several innovative features:
- 3D Variational Autoencoder (Wan-VAE): Enhances spatio-temporal compression and ensures temporal causality during video generation.
- Video Diffusion DiT Framework: Uses Flow Matching with a T5 Encoder for text encoding and cross-attention layers embedded in transformer blocks.
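The Flow Matching objective mentioned above can be sketched in a few lines. This is a generic scalar illustration (plain Python values stand in for video latents), not Wan's training code: the model learns to predict the constant velocity along the straight path between a noise sample and a data sample.

```python
# Sketch of the Flow Matching training target: given noise x0 and data x1,
# the interpolated sample at time t is x_t = (1 - t) * x0 + t * x1, and the
# regression target is the constant velocity v = x1 - x0.

def flow_matching_pair(x0, x1, t):
    """Return the interpolated sample and its velocity target at time t in [0, 1]."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target

noise = [0.0, 1.0, -1.0]   # x0: sampled noise
data = [2.0, 0.0, 1.0]     # x1: a (toy) data latent
x_t, v = flow_matching_pair(noise, data, t=0.5)
# At t = 0.5 the sample lies exactly halfway between noise and data.
```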
Performance Metrics:
Wan 2.1 achieves a VBench score of 84.7%, excelling in dynamic scenes, spatial consistency, and aesthetics. It generates 1080P video at 30 FPS with realistic motion, thanks to its space-time attention mechanism. As a leading open-source video generation model, it rivals proprietary alternatives such as Sora, though proprietary models may still outperform it in certain areas.
Usage
Code Samples
The model is available on the AI/ML API platform as "Wan 2.1".
Params:
- negative_prompt [str]: The negative prompt to use. Use it to exclude details you don't want in the video, such as colors, objects, scenery, or even small details (e.g., a moustache, blurriness, low resolution).
- seed [int]: Random seed for reproducibility. If None, a random seed is chosen.
- aspect_ratio ['9:16', '16:9']: Aspect ratio of the generated video.
- inference_steps [int]: Number of inference steps for sampling. Higher values give better quality but take longer.
- guidance_scale [number]: Classifier-free guidance scale. Higher values increase prompt adherence; lower values allow more creative freedom.
- shift [number]: Noise schedule shift parameter. Affects temporal dynamics.
- sampler ['unipc', 'dpm+']: The sampler to use for generation.
- enable_safety_checker [boolean]: Whether to enable the safety checker.
- enable_prompt_expansion [boolean]: Whether to enable prompt expansion.
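A minimal sketch of assembling a request with the parameters listed above. The endpoint URL, model identifier, and response shape are assumptions, not documented values; consult the AI/ML API documentation for the exact details.

```python
# Hedged sketch: build a request payload using the parameters documented
# above. The model identifier "wan-2.1" and the endpoint URL shown in the
# comment are placeholders -- check the API docs for the real values.

def build_wan_request(prompt, **params):
    """Assemble a JSON-serializable payload, rejecting unknown parameters."""
    allowed = {"negative_prompt", "seed", "aspect_ratio", "inference_steps",
               "guidance_scale", "shift", "sampler",
               "enable_safety_checker", "enable_prompt_expansion"}
    unknown = set(params) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"model": "wan-2.1", "prompt": prompt, **params}  # model id is a placeholder

payload = build_wan_request(
    "A red fox running through fresh snow",
    aspect_ratio="16:9",
    inference_steps=30,
    guidance_scale=5.0,
    sampler="unipc",
)
# POST this as JSON with your API key, e.g. (endpoint is a placeholder):
# requests.post("https://api.aimlapi.com/...", json=payload,
#               headers={"Authorization": f"Bearer {API_KEY}"})
```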
To get the generated video, poll the API with the returned generation ID until the job completes.
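Retrieval can be sketched as a generic polling loop. The status values and the `fetch_status` callable here are placeholders for illustration; the real endpoint and response fields are described in the API documentation.

```python
import time

# Generic polling sketch for an asynchronous video-generation job. The
# "status"/"video_url" fields and the fetch function are assumptions, not
# the documented response schema.

def wait_for_video(fetch_status, poll_interval=2.0, max_polls=100):
    """Poll fetch_status() until the job completes; return the video URL."""
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") == "completed":
            return status["video_url"]
        if status.get("status") == "failed":
            raise RuntimeError(f"generation failed: {status}")
        time.sleep(poll_interval)
    raise TimeoutError("generation did not finish in time")

# Example with a fake status function that completes on the third poll:
responses = iter([{"status": "queued"}, {"status": "processing"},
                  {"status": "completed", "video_url": "https://example.com/out.mp4"}])
url = wait_for_video(lambda: next(responses), poll_interval=0.0)
```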
API Documentation
Detailed API Documentation is available here.
Ethical Guidelines
Alibaba emphasizes responsible use of Wan 2.1 for ethical content creation and discourages misuse such as deepfake generation or the creation of inappropriate content.
Licensing
Wan 2.1 is licensed under Apache 2.0, allowing both commercial and research use with transparent terms.
Get Wan 2.1 API here.