Text-to-Video
MoonQiu and nielsr (HF Staff) committed
Commit a23c31a · verified · 1 parent: 8c2715d

Enhance model card: Add pipeline tag, paper, project, code links, and usage (#1)

- Enhance model card: Add pipeline tag, paper, project, code links, and usage (bbd0e9c90cb8c28f6a50a2acd6292cc0cd28686d)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +46 -3
@@ -1,3 +1,46 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: text-to-video
+ ---
+
+ # CineScale: Free Lunch in High-Resolution Cinematic Visual Generation
+
+ This repository contains the CineScale models presented in the paper [CineScale: Free Lunch in High-Resolution Cinematic Visual Generation](https://huggingface.co/papers/2508.15774).
+
+ CineScale is a novel inference paradigm for higher-resolution visual generation. It also enables high-resolution Image-to-Video (I2V) and Video-to-Video (V2V) synthesis on top of state-of-the-art open-source video generation frameworks, and it addresses the repetitive patterns that existing methods tend to produce in high-resolution outputs.
+
+ **Project Page:** [https://eyeline-labs.github.io/CineScale/](https://eyeline-labs.github.io/CineScale/)
+ **Code & Detailed Usage:** [https://github.com/Eyeline-Labs/CineScale](https://github.com/Eyeline-Labs/CineScale)
+
+ ## Models
+ CineScale provides a family of models, including Text-to-Video (T2V) and Image-to-Video (I2V) variants, capable of generating videos up to 4K resolution.
+
+ | Model | Tuning Resolution | Checkpoint | Description |
+ | :-------------------------- | :---------------- | :------------------------------------------------------------------------------- | :-------------------------------------------- |
+ | CineScale-1.3B-T2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/t2v_1.3b_ntk20.ckpt) | Supports 3K (1632x2880) inference on A100 x 1 |
+ | CineScale-14B-T2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/t2v_14b_ntk20.ckpt) | Supports 4K (2176x3840) inference on A100 x 8 |
+ | CineScale-14B-I2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/i2v_14b_ntk20.ckpt) | Supports 4K (2176x3840) inference on A100 x 8 |
+
+ ## Quick Start
+ To get started, set up the environment and download the model checkpoints as described in the [GitHub repository](https://github.com/Eyeline-Labs/CineScale).
+
+ The GitHub repository provides command-line scripts for inference at various resolutions and tasks. For instance, to run 2K-resolution text-to-video inference:
+ ```bash
+ # Example for 2K-Resolution Text-to-Video (Base Model Wan2.1-1.3B)
+ # Single GPU
+ CUDA_VISIBLE_DEVICES=0 python cinescale_t2v1.3b_single.py
+ # Multiple GPUs
+ torchrun --standalone --nproc_per_node=8 cinescale_t2v1.3b.py
+ ```
+ Refer to the [GitHub repository](https://github.com/Eyeline-Labs/CineScale) for more detailed instructions and examples for 3K and 4K video generation.
+
+ ## Citation
+ If you find our work useful, please consider citing our paper:
+ ```bib
+ @article{qiu2025cinescale,
+ title={CineScale: Free Lunch in High-Resolution Cinematic Visual Generation},
+ author={Haonan Qiu and Ning Yu and Ziqi Huang and Paul Debevec and Ziwei Liu},
+ journal={arXiv preprint arXiv:2508.15774},
+ year={2025}
+ }
+ ```
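
Reviewer note (not part of the commit): the Models table added in this diff can be read as data. The sketch below is only an illustration. It restates the three table rows, computes how far each supported inference resolution sits above the 1088x1920 tuning resolution, and builds a direct-download URL for each checkpoint. The `Variant` class and its method names are hypothetical, the (height, width) ordering of the resolutions is an assumption, and the URL uses the Hub's `resolve` path in place of the `blob` page linked in the table.

```python
from dataclasses import dataclass

REPO_ID = "Eyeline-Labs/CineScale"  # repo taken from the checkpoint links
TUNING_RES = (1088, 1920)           # assumed to be (height, width)


@dataclass(frozen=True)
class Variant:
    """One row of the Models table (hypothetical helper, not CineScale code)."""
    name: str
    checkpoint: str
    max_res: tuple[int, int]  # highest inference resolution listed in the table
    gpus: int                 # number of A100 GPUs listed in the table

    def scale(self) -> float:
        """Per-side upscaling factor relative to the tuning resolution."""
        h = self.max_res[0] / TUNING_RES[0]
        w = self.max_res[1] / TUNING_RES[1]
        if h != w:
            raise ValueError("aspect ratio differs from the tuning resolution")
        return h

    def pixel_ratio(self) -> float:
        """How many times more pixels per frame than at the tuning resolution."""
        return self.scale() ** 2

    def download_url(self, revision: str = "main") -> str:
        """Direct file URL; the table's `blob` links open an HTML page instead."""
        return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{self.checkpoint}"


VARIANTS = [
    Variant("CineScale-1.3B-T2V", "t2v_1.3b_ntk20.ckpt", (1632, 2880), 1),
    Variant("CineScale-14B-T2V", "t2v_14b_ntk20.ckpt", (2176, 3840), 8),
    Variant("CineScale-14B-I2V", "i2v_14b_ntk20.ckpt", (2176, 3840), 8),
]

if __name__ == "__main__":
    for v in VARIANTS:
        print(f"{v.name}: {v.scale()}x per side, {v.pixel_ratio()}x pixels, "
              f"{v.gpus} GPU(s), {v.download_url()}")
```

Read this way, the 3K setting is 1.5x the tuning resolution per side (2.25x the pixels per frame) and the 4K setting is 2x per side (4x the pixels), which gives some context for the jump from one A100 to eight in the table.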