Enhance model card: Add pipeline tag, paper, project, code links, and usage (#1)
- Enhance model card: Add pipeline tag, paper, project, code links, and usage (bbd0e9c90cb8c28f6a50a2acd6292cc0cd28686d)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
|
@@ -1,3 +1,46 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
pipeline_tag: text-to-video
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
+
# CineScale: Free Lunch in High-Resolution Cinematic Visual Generation
|
| 7 |
+
|
| 8 |
+
This repository contains the CineScale models presented in the paper [CineScale: Free Lunch in High-Resolution Cinematic Visual Generation](https://huggingface.co/papers/2508.15774).
|
| 9 |
+
|
| 10 |
+
CineScale is a novel inference paradigm for higher-resolution visual generation. Built atop state-of-the-art open-source video generation frameworks, it broadens the scope of high-resolution generation to I2V (Image-to-Video) and V2V (Video-to-Video) synthesis, and it improves on existing methods, which are prone to repetitive patterns in high-resolution outputs.
|
| 11 |
+
|
| 12 |
+
**Project Page:** [https://eyeline-labs.github.io/CineScale/](https://eyeline-labs.github.io/CineScale/)
|
| 13 |
+
**Code & Detailed Usage:** [https://github.com/Eyeline-Labs/CineScale](https://github.com/Eyeline-Labs/CineScale)
|
| 14 |
+
|
| 15 |
+
## Models
|
| 16 |
+
CineScale provides a family of models, including Text-to-Video (T2V) and Image-to-Video (I2V) variants, capable of generating videos up to 4K resolution.
|
| 17 |
+
|
| 18 |
+
| Model | Tuning Resolution | Checkpoint | Description |
|
| 19 |
+
| :-------------------------- | :---------------- | :------------------------------------------------------------------------------- | :-------------------------------------------- |
|
| 20 |
+
| CineScale-1.3B-T2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/t2v_1.3b_ntk20.ckpt) | Supports 3K (1632x2880) inference on A100 x 1 |
|
| 21 |
+
| CineScale-14B-T2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/t2v_14b_ntk20.ckpt) | Supports 4K (2176x3840) inference on A100 x 8 |
|
| 22 |
+
| CineScale-14B-I2V | 1088x1920 | [Hugging Face](https://huggingface.co/Eyeline-Labs/CineScale/blob/main/i2v_14b_ntk20.ckpt) | Supports 4K (2176x3840) inference on A100 x 8 |
|
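To make the resolutions in the table concrete, the following minimal sketch computes how far each inference target sits above the 1088x1920 tuning resolution. It assumes the table lists resolutions as height x width; the names `TUNING`, `TARGETS`, and `scale_factors` are illustrative and not part of the CineScale code.

```python
from fractions import Fraction

# Resolutions copied from the table above, read as (height, width).
# The height-by-width reading is an assumption; the ratios do not depend on it.
TUNING = (1088, 1920)
TARGETS = {"3K": (1632, 2880), "4K": (2176, 3840)}


def scale_factors(target, base=TUNING):
    """Return (per-side scale, pixel-count ratio) of target relative to base."""
    h_scale = Fraction(target[0], base[0])
    w_scale = Fraction(target[1], base[1])
    if h_scale != w_scale:
        raise ValueError("target does not share the base aspect ratio")
    return h_scale, h_scale * w_scale


for name, resolution in TARGETS.items():
    side, pixels = scale_factors(resolution)
    print(f"{name}: {float(side)}x per side, {float(pixels)}x pixels")
```

Under these numbers, 3K inference is 1.5x the tuning resolution per side (2.25x the pixels) and 4K is 2x per side (4x the pixels).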
| 23 |
+
|
| 24 |
+
## Quick Start
|
| 25 |
+
To get started, set up the environment and download the model checkpoints as described in the [GitHub repository](https://github.com/Eyeline-Labs/CineScale).
|
| 26 |
+
|
| 27 |
+
Inference examples for various resolutions and tasks are provided in the GitHub repository's command-line scripts. For instance, to run 2K-resolution text-to-video inference:
|
| 28 |
+
```bash
|
| 29 |
+
# Example for 2K-Resolution Text-to-Video (Base Model Wan2.1-1.3B)
|
| 30 |
+
# Single GPU
|
| 31 |
+
CUDA_VISIBLE_DEVICES=0 python cinescale_t2v1.3b_single.py
|
| 32 |
+
# Multiple GPUs
|
| 33 |
+
torchrun --standalone --nproc_per_node=8 cinescale_t2v1.3b.py
|
| 34 |
+
```
|
| 35 |
+
Refer to the [GitHub repository](https://github.com/Eyeline-Labs/CineScale) for more detailed instructions and examples for 3K and 4K video generation.
|
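The checkpoint download mentioned in Quick Start could be scripted roughly as follows. This is a sketch, not the authors' documented procedure: the repository id and file names are taken from the checkpoint links in the Models table, `hf_hub_download` comes from the `huggingface_hub` package (assumed to be installed), and the `models` target directory and the helper names are hypothetical. The GitHub repository remains the reference for where the inference scripts expect the files.

```python
# Repository id and file names as they appear in the Models table links.
REPO_ID = "Eyeline-Labs/CineScale"
CHECKPOINTS = {
    "CineScale-1.3B-T2V": "t2v_1.3b_ntk20.ckpt",
    "CineScale-14B-T2V": "t2v_14b_ntk20.ckpt",
    "CineScale-14B-I2V": "i2v_14b_ntk20.ckpt",
}


def checkpoint_filename(model):
    """Map a model name from the table to its checkpoint file name."""
    try:
        return CHECKPOINTS[model]
    except KeyError:
        raise ValueError(f"unknown model: {model!r}") from None


def download_checkpoint(model, local_dir="models"):
    """Download one checkpoint from the Hub and return its local path.

    `local_dir` is a placeholder directory, not a path required by CineScale.
    """
    # Imported lazily so the name mapping above works without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_filename(model),
        local_dir=local_dir,
    )
```

Calling `download_checkpoint("CineScale-1.3B-T2V")` would then fetch the 1.3B text-to-video checkpoint into `models/` and return its path.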
| 36 |
+
|
| 37 |
+
## Citation
|
| 38 |
+
If you find our work useful, please consider citing our paper:
|
| 39 |
+
```bib
|
| 40 |
+
@article{qiu2025cinescale,
|
| 41 |
+
title={CineScale: Free Lunch in High-Resolution Cinematic Visual Generation},
|
| 42 |
+
author={Haonan Qiu and Ning Yu and Ziqi Huang and Paul Debevec and Ziwei Liu},
|
| 43 |
+
journal={arXiv preprint arXiv:2508.15774},
|
| 44 |
+
year={2025}
|
| 45 |
+
}
|
| 46 |
+
```
|