--- license: apache-2.0 ---
๐ฅ๏ธ GitHub    |   ๐ค Hugging Face   |   ๐ค Model Scope   | ๐    ๏ฝ ๐
  
----
[**Index-AniSora:The Ultimate Open-Source Anime Video Generation Model**](http://arxiv.org/abs/2412.10255)
|.gif)|
|The scene shows two figures in red wedding clothes holding a red rope as they walk off into the distance|
|.gif)|
|The yellow-haired figure reaches out to touch the head of the kneeling figure, and the kneeling figure's body rises and falls as he gasps for breath.|
|.gif)|
|In the frame, a person sprints forward at high speed, their motion appearing slightly blurred from the velocity. |
| |
|In the frame, the character raises their arm, with gaseous currents visibly flowing along its surface. |
| |
|The old man's gaze locks onto the gemstone, his right hand subtly adjusting the magnifying glass as his lips moveโas if it holds the key to unraveling some ancient knowledge or secret. |
| |
|The man on the left presses his lips tightly together, his face etched with fury and resolve. Every line of his expression radiates both profound frustration and unshakable conviction. Meanwhile, the other man's jaw hangs openโpoised as if to erupt into a shout or impassioned declaration. |
| |
|The scene depicts an exploding rock, erupting in blinding light as shattered fragments blast outward in all directions. |
| |
**Temporal Control:**
| prmopt | first frame | mid frame| last frame | Video |
| --- | --- | --- | --- | --- |
|In this video we see a scene from the animated film Beauty and the Beast with Belle and the Beast. Belle, with long blonde hair, is standing in a room with large windows, looking out the window and talking to it. She is wearing a purple dress with a purple top...|
|
|
||
|In this video, a young woman with long blonde hair can be seen looking out from behind a car door at night. The car is parked under a starry sky with a full moon illuminating the scene. The woman appears to be in a state of worry, as evidenced by her facial expression and the way she grips the car door. |
| None |
||
|A cartoon cat is the central figure in this video, which appears to be in a state of mischief or curiosity. The cat's eyes are closed and its mouth is open, suggesting a moment of surprise or anticipation...|None | None |
||
**Spatial Control:**
| prmopt | first frame | motion mask | Video(with motion mask visualization) |
| --- | --- | --- | --- |
|In this vibrant underwater scene from the animated film Finding Nemo, Marlin and Nemo, two clownfish, talk near a large purple piece of coral...|
|
||
|Same as above| Same as above |
||
** More videos are available in: [Video Gallery](https://pwz4yo5eenw.feishu.cn/docx/XN9YdiOwCoqJuexLdCpcakSlnkg) **
## ๐ Evaluation
Evaluation results on Vbench:
| Method | Motion Smoothness | Motion Score | Aesthetic Quality | Imaging Quality | I2V Subject | I2V Background | Overall Consistency | Subject Consistency |
|--------------------------|-------------------|--------------|-------------------|-----------------|-------------|----------------|---------------------|---------------------|
| Opensora-Plan(V1.3) | 99.13 | 76.45 | 53.21 | 65.11 | 93.53 | 94.71 | 21.67 | 88.86 |
| Opensora(V1.2) | 98.78 | 73.62 | 54.30 | 68.44 | 93.15 | 91.09 | 22.68 | 87.71 |
| Vidu | 97.71 | **77.51** | 53.68 | 69.23 | 92.25 | 93.06 | 20.87 | 88.27 |
| Covideo(5B-V1) | 97.67 | 71.47 | **54.87** | 68.16 | 90.68 | 91.79 | 21.87 | 90.29 |
| MiniMax | 99.20 | 66.53 | 54.56 | **71.67** | 95.95 | **95.42** | 21.82 | 93.62 |
| **AniSora** | **99.34** | 45.59 | 54.31 | 70.58 | **97.52** | 95.04 | 21.15 | **96.99** |
| AniSora-K | 99.12 | 59.49 | 53.76 | 68.68 | 95.13 | 93.36 | 21.13 | 94.61 |
| AniSora-I | 99.31 | 54.96 | 54.67 | 68.98 | 94.16 | 92.38 | 20.47 | 95.75 |
| GT | 98.72 | 56.05 | 52.70 | 70.50 | 96.02 | 95.03 | 21.29 | 94.37 |
Evaluation results on AniSora-Benchmark:
| Method | Human Evaluation | Visual Smooth | Visual Motion | Visual Appeal | Text-Video Consistency | Image-Video Consistency | Character Consistency |
|--------------------------|------------------|---------------|---------------|---------------|------------------------|-------------------------|-----------------------|
| Vidu-1.5 | 60.98 | 55.37 | **78.95** | 50.68 | 60.71 | 66.85 | 82.57 |
| Opensora-V1.2 | 41.10 | 22.28 | 74.90 | 22.62 | 52.19 | 55.67 | 74.76 |
| Opensora-Plan-V1.3 | 46.14 | 35.08 | 77.47 | 36.14 | 56.19 | 59.42 | 81.19 |
| CogVideoX-5B-V1 | 53.29 | 39.91 | 73.07 | 39.59 | 67.98 | 65.49 | 83.07 |
| MiniMax-I2V01 | 69.63 | 69.38 | 68.05 | **70.34** | **76.14** | 78.74 | 89.47 |
| **AniSora (Ours)** | **70.13** | **71.47** | 47.94 | 64.44 | 72.92 | **81.54** | **94.54** |
| AniSora (Interpolated Avg) | - | 70.78 | 53.02 | 64.41 | 73.56 | 80.62 | 91.59 |
| AniSora (KeyFrame Interp) | - | 70.03 | 58.10 | 64.57 | 74.57 | 80.78 | 91.98 |
| AniSora (KeyFrame Interp) | - | 70.03 | 58.10 | 64.57 | 74.57 | 80.78 | 91.98 |
| GT | - | 92.20 | 58.27 | 89.72 | 92.51 | 94.69 | 95.08 |
AniSora for our I2V results.
AniSora-K for the key frame interpolation results.
AniSora-I for the average results of frame interpolation conditions, including key frame, last frame, mid frame results.
## ๐ณ Benchmark Dataset
The benchmark dataset contains 948 animation video clips are collected and labeled
with different actions. Each label contains 10-30 video clips. The corresponding text prompt is generated by Qwen-VL2 at first, then is corrected manually to guarantee the text-video alignment.
Fill the form and send PDF format to yangsiqian@bilibili.com or xubaohan@bilibili.com (links provided after agreeing with Bilibili)
## ๐ Citation
๐ If you find our work helpful, please leave us a star and cite our paper.
```
@article{jiang2024anisora,
title={AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era},
author={Yudong Jiang, Baohan Xu, Siqian Yang, Mingyu Yin, Jing Liu, Chao Xu, Siqi Wang, Yidi Wu, Bingwen Zhu, Xinwen Zhang, Xingyu Zheng,Jixuan Xu, Yue Zhang, Jinlong Hou and Huyang Sun},
journal={arXiv preprint arXiv:2412.10255},
year={2024}
}
```