This repository hosts the **Emuru Convolutional VAE**, described in our [CVPR2025 paper](https://arxiv.org/pdf/2503.17074). The model features a convolutional encoder and decoder, each with four layers. The output channels for these layers are 32, 64, 128, and 256, respectively. The encoder downsamples an input RGB image (with three channels and dimensions width and height) to a latent representation with a single channel and spatial dimensions that are one-eighth of the original height and width. This design compresses the style information in the image, allowing a lightweight Transformer Decoder to efficiently process the latent features.
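The 1/8 spatial compression described above can be illustrated with a minimal sketch. This is an assumed architecture, not the released implementation: the layer count and output channels (32, 64, 128, 256) follow the description, but kernel sizes, activations, and the placement of the stride-2 stages are illustrative.

```python
import torch
from torch import nn

# Hypothetical encoder sketch: four conv layers with output channels
# 32, 64, 128, 256, three of which downsample by 2 (total factor 8).
channels = [32, 64, 128, 256]
layers, in_ch = [], 3  # RGB input
for i, out_ch in enumerate(channels):
    stride = 2 if i < 3 else 1  # three stride-2 stages -> 1/8 resolution
    layers += [nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1), nn.SiLU()]
    in_ch = out_ch
layers.append(nn.Conv2d(in_ch, 1, 3, padding=1))  # single-channel latent
encoder = nn.Sequential(*layers)

x = torch.randn(1, 3, 64, 768)  # e.g. a 64x768 RGB text-line image
z = encoder(x)
print(z.shape)  # -> torch.Size([1, 1, 8, 96])
```

In practice the released weights would be loaded through `diffusers` (the repository's declared library) rather than rebuilt by hand; the sketch only demonstrates how height and width shrink to one eighth while the channel dimension collapses to a single latent channel.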

Training code is released on [GitHub](https://github.com/aimagelab/Emuru-autoregressive-text-img).

### Training Details