Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers

1024px_based_image_size


Lawrence-cj committed on Mar 21

Commit 231ba75 · verified · 1 Parent(s): e552e55

Update README.md

Files changed (1)

README.md +1 -1


@@ -61,7 +61,7 @@ Source code is available at https://github.com/NVlabs/Sana.
 - **Model Description:** This is a model that can be used to generate and modify images based on text prompts.
 It is a Linear Diffusion Transformer that uses one fixed, pretrained text encoders ([Gemma2-2B-IT](https://huggingface.co/google/gemma-2-2b-it))
 and one 32x spatial-compressed latent feature encoder ([DC-AE](https://hanlab.mit.edu/projects/dc-ae)).
-- **Resources for more information:** Check out our [GitHub Repository](https://github.com/NVlabs/Sana) and the [Sana report on arXiv](https://arxiv.org/abs/2410.10629).
+- **Resources for more information:** Check out our [GitHub Repository](https://github.com/NVlabs/Sana) and the [SANA-1.5 report on arXiv](https://arxiv.org/abs/2501.18427).
 ### Model Sources