👗 EquiFashion
Authors:
Nguyen Dinh Hieu [0009-0002-6683-8036], et al.
Institution: FPT University, Hanoi, Vietnam
📧 [email protected]
🧩 Overview
EquiFashion is a hybrid GAN–Diffusion framework that reconciles the long-standing trade-off between stylistic diversity and photorealistic fidelity in generative fashion design.
It integrates a GAN-based ideation branch for creative exploration and a diffusion-based refinement branch for faithful reconstruction, enabling high-quality, diverse, and robust fashion image generation.
🎨 Try the live demo here:
👉 EquiFashion Demo on Hugging Face Spaces
🎯 Motivation
Fashion design requires models that are simultaneously creative, robust, and trustworthy.
GANs generate diverse styles but lack stability, while diffusion models achieve realism at the cost of creative range. EquiFashion bridges both worlds, achieving controlled diversity, semantic alignment, and realistic garment rendering.
🧱 Architecture Overview
| Component | Description |
|---|---|
| Latent Diffusion Backbone | Operates in latent space for efficient denoising with high-resolution reconstruction. |
| GAN Ideation Module | Explores stylistic variations through stochastic latent sampling. |
| Structural Semantic Consensus | Ensures linguistic–visual correspondence between attributes and garment parts. |
| Semantic-Bundled Attention | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization; a sketch follows this table. |
| Pose-Guided Conditioning | Aligns garments naturally to human body structure using OpenPose keypoints. |
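To make the Semantic-Bundled Attention row concrete, below is a minimal sketch of one way adjective–noun coupling could be applied to cross-attention scores. The function name, the bundling rule (blending the two tokens' spatial score maps), and the `strength` parameter are illustrative assumptions, not the released implementation.

```python
import torch

def bundle_attention_bias(attn_logits, bundles, strength=1.0):
    """Bias cross-attention so bundled token pairs (e.g. "red" + "collar")
    attend to the same spatial locations.

    attn_logits: (batch, n_pixels, n_tokens) raw cross-attention scores
    bundles:     list of (adj_idx, noun_idx) token-index pairs
    strength:    how strongly the pair is tied (illustrative hyperparameter)
    """
    biased = attn_logits.clone()
    for adj_idx, noun_idx in bundles:
        # Average the two tokens' spatial score maps, then blend each member
        # toward the shared map so the pair attends to the same regions.
        shared = 0.5 * (attn_logits[..., adj_idx] + attn_logits[..., noun_idx])
        biased[..., adj_idx] = (1 - strength) * attn_logits[..., adj_idx] + strength * shared
        biased[..., noun_idx] = (1 - strength) * attn_logits[..., noun_idx] + strength * shared
    return biased

# Example: tie token 3 ("red") to token 4 ("collar") in a dummy attention map
logits = torch.randn(2, 64 * 64, 77)  # (batch, pixels, text tokens)
out = bundle_attention_bias(logits, bundles=[(3, 4)], strength=0.5)
```

Blending toward a shared map encourages “red” to attend wherever “collar” attends, which is the coherent attribute localization the table describes.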
📂 Dataset Access: EquiFashion-DB
The dataset used for training and evaluation is available on Hugging Face:
➡️ NguyenDinhHieu/EquiFashion-DB
| Property | Description |
|---|---|
| Scale | 350K images |
| Resolution | 512×512 |
| Modalities | Image, Text, Sketch, Pose, Fabric |
| Coverage | 40+ apparel categories |
| Key Feature | Noise-aware text, balanced demographics |
| Purpose | Training + robust benchmarking for generative fashion |
You can load it directly using the `datasets` library:

```python
from datasets import load_dataset

# Load EquiFashion-DB from the Hugging Face Hub
dataset = load_dataset("NguyenDinhHieu/EquiFashion-DB")
print(dataset)
```
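To go from the raw dataset to training batches, a standard PyTorch `DataLoader` wrapper could look like the sketch below. The split name and the column names ("image", "text") are assumptions about the schema, so verify them with `dataset.column_names` first.

```python
from datasets import load_dataset
from torch.utils.data import DataLoader

dataset = load_dataset("NguyenDinhHieu/EquiFashion-DB")
print(dataset.column_names)  # verify the real schema before indexing

# Hypothetical column names ("image", "text") -- adjust to the actual schema.
def collate(batch):
    return {
        "images": [example["image"] for example in batch],
        "texts": [example["text"] for example in batch],
    }

# The "train" split name and batch size 32 (from the training table below)
# are assumptions for illustration.
loader = DataLoader(dataset["train"], batch_size=32, shuffle=True, collate_fn=collate)
```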
🧮 Training Configuration
| Setting | Value |
|---|---|
| Framework | PyTorch Lightning 2.2 |
| GPU | NVIDIA A100 (40 GB, CUDA 12.2) |
| Optimizer | AdamW |
| Learning Rate | 2e-4 (G), 1e-4 (D) |
| Scheduler | Cosine Decay |
| Epochs | 400 (200 pretrain + 200 joint) |
| Precision | FP16 |
| Batch Size | 32 |
| Timesteps (T) | 8 |
| Fusion Decay (γ) | 0.7 |
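As a point of reference, a minimal PyTorch sketch matching the optimizer rows of this table might look as follows. The placeholder modules and the total step count are illustrative; the actual training loop is orchestrated by PyTorch Lightning.

```python
import torch

# Placeholder modules standing in for the actual generator / discriminator.
generator = torch.nn.Linear(8, 8)
discriminator = torch.nn.Linear(8, 8)

# AdamW with the per-network learning rates from the table.
opt_g = torch.optim.AdamW(generator.parameters(), lr=2e-4)
opt_d = torch.optim.AdamW(discriminator.parameters(), lr=1e-4)

# Cosine decay over the 400-epoch run (the step count is illustrative).
total_steps = 400 * 1000
sched_g = torch.optim.lr_scheduler.CosineAnnealingLR(opt_g, T_max=total_steps)
sched_d = torch.optim.lr_scheduler.CosineAnnealingLR(opt_d, T_max=total_steps)

# FP16 mixed precision via gradient scaling (Lightning would wrap this).
scaler = torch.cuda.amp.GradScaler()
```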
🧠 Core Equation
The total loss combines autoencoding, adversarial, semantic, and perceptual components:
$$
L_{total} = \lambda_{AE} L_{AE} + \lambda_{cons} L_{cons} + \lambda_{bundle} L_{bundle} + \lambda_{comp} L_{comp} + \lambda_{G}\left(L_{G} + \lambda_{MS} L_{MS}\right) + \lambda_{den} L_{denoise} + \lambda_{rob} L_{rob} + \lambda_{perc} L_{perc}
$$
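In code, this reduces to a weighted sum in which the multi-scale GAN term is folded into the generator term before weighting. A minimal sketch follows; the λ values are placeholders, not the paper's settings.

```python
import torch

def total_loss(losses, weights):
    """Weighted sum of the loss terms from the equation above.

    losses / weights: dicts keyed by term name; the weight values used
    below are placeholders, not the paper's actual lambda settings.
    """
    # L_G is bundled with its multi-scale term before weighting by lambda_G.
    gan_term = losses["G"] + weights["MS"] * losses["MS"]
    return (
        weights["AE"] * losses["AE"]
        + weights["cons"] * losses["cons"]
        + weights["bundle"] * losses["bundle"]
        + weights["comp"] * losses["comp"]
        + weights["G"] * gan_term
        + weights["den"] * losses["denoise"]
        + weights["rob"] * losses["rob"]
        + weights["perc"] * losses["perc"]
    )

# Dummy scalar losses just to exercise the function.
dummy = {k: torch.tensor(1.0) for k in
         ["AE", "cons", "bundle", "comp", "G", "MS", "denoise", "rob", "perc"]}
w = {k: 1.0 for k in ["AE", "cons", "bundle", "comp", "G", "MS", "den", "rob", "perc"]}
print(total_loss(dummy, w))
```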
📊 Quantitative Results
| Metric | Value | Benchmark |
|---|---|---|
| FID ↓ | 10.3 | FashionAI subset |
| IS ↑ | 7.8 | – |
| CLIP-S ↑ | 0.315 | – |
| Coverage ↑ | 87% | – |
| Inference Time | 3.8 s / sample (512×512, A100, FP16) | – |
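Of the metrics above, CLIP-S is conventionally computed as the cosine similarity between CLIP image and text embeddings. A minimal sketch with the `open_clip` library follows; the ViT-B-32 architecture and pretrained tag are assumptions for illustration, not necessarily the encoder shipped with this repo.

```python
import torch
import open_clip
from PIL import Image

# Illustrative model choice; the repo ships open_clip_pytorch_model.bin,
# but the exact architecture name is an assumption here.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.new("RGB", (512, 512))).unsqueeze(0)  # placeholder image
text = tokenizer(["long-sleeve floral dress with tied waist"])

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    clip_s = torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()
print(f"CLIP-S: {clip_s:.3f}")
```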
🖼️ Visual Results
Qualitative samples are available in the live demo linked above.
🚀 Usage Example

```python
import torch
from huggingface_hub import hf_hub_download
from cldm.model import create_model, load_state_dict

# Download the checkpoint from the Hub
ckpt = hf_hub_download("NguyenDinhHieu/EquiFashionModel", filename="eqf_final.ckpt")

# Build the architecture from its config and load the weights
model = create_model("utils/configs/cldm_v2.yaml").to("cuda")
model.load_state_dict(load_state_dict(ckpt, location="cuda"))
model.eval()

prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail"
```
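The snippet above stops after loading the checkpoint; the full sampling pipeline lives in app.py. As a rough orientation, ControlNet-style codebases typically expose a DDIM sampler along the following lines. The sampler import, conditioning keys, and latent shape are assumptions based on that convention, not a confirmed EquiFashion API.

```python
import torch
from cldm.ddim_hacked import DDIMSampler  # assumption: ControlNet-style sampler module

sampler = DDIMSampler(model)
with torch.no_grad():
    cond = {
        # Text conditioning through the model's text encoder (LDM-style call).
        "c_crossattn": [model.get_learned_conditioning([prompt])],
        # A pose control map would normally go here (see Pose-Guided Conditioning).
        "c_concat": None,
    }
    # 512x512 pixels correspond to a 4x64x64 latent in standard LDM setups;
    # S=8 matches the T=8 timesteps listed in the training configuration.
    samples, _ = sampler.sample(S=8, batch_size=1, shape=(4, 64, 64),
                                conditioning=cond, verbose=False)
    images = model.decode_first_stage(samples)  # decode latents to pixel space
```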
💡 Citation
If you use this model or dataset, please cite:
```bibtex
@inproceedings{nguyen2025equifashion,
  title={EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation},
  author={Tran Minh Khuong and Nguyen Dinh Hieu and Ngo Dinh Hoang Minh and Nguyen Dinh Bach and Phan Duy Hung},
  booktitle={Proceedings of the ..... Conference},
  year={2025},
  organization={FPT University, Hanoi}
}
```
🧩 File Descriptions
| File | Description |
|---|---|
| eqf_final.ckpt | Main hybrid GAN–Diffusion model checkpoint |
| body_pose_model.pth, hand_pose_model.pth | OpenPose keypoint weights |
| open_clip_pytorch_model.bin | Pretrained OpenCLIP text encoder |
| app.py | Gradio demo UI |
| utils/configs/cldm_v2.yaml | Architecture configuration |
📚 References
- Zhu et al. Be Your Own Prada (ICCV 2017)
- Chen et al. TailorGAN (WACV 2020)
- Li et al. BC-GAN (CVPR 2019)
- Xu et al. AttnGAN (CVPR 2018)
- Karras et al. StyleGAN (CVPR 2019)
- Zhang et al. DiffCloth (ICCV 2023)
- Xie et al. HieraFashDiff (AAAI 2025)
- Kim et al. FashionSD-X (arXiv 2024)
- Baldrati et al. Multimodal Garment Designer (ICCV 2023)
- Rombach et al. Latent Diffusion Models (CVPR 2022)
🪪 License
Released under the MIT License.
You may use, modify, and distribute the model and dataset with attribution.
🧩 Acknowledgment
Developed by FPT University AI Research Group, Hanoi, Vietnam
as part of the EquiAI Research Suite on fairness, robustness, and trustworthy generative AI.

