👗

Authors:
Nguyen Dinh Hieu [0009-0002-6683-8036], et al.
Institution: FPT University, Hanoi, Vietnam
📧 [email protected]

🧩 Overview

EquiFashion is a hybrid GAN–Diffusion framework that reconciles the long-standing trade-off between stylistic diversity and photorealistic fidelity in generative fashion design.
It integrates a GAN-based ideation branch for creative exploration and a diffusion-based refinement branch for faithful reconstruction, enabling high-quality, diverse, and robust fashion image generation.

🎨 Try the live demo here:
👉 EquiFashion Demo on Hugging Face Spaces

🎯 Motivation

Fashion design requires models that are simultaneously creative, robust, and trustworthy.
GANs generate diverse styles but lack stability, while diffusion models produce realistic images but constrain creativity. EquiFashion combines the two, aiming for controlled diversity, semantic alignment, and realistic garment rendering.

🧱 Architecture Overview

| Component | Description |
|---|---|
| Latent Diffusion Backbone | Operates in latent space for efficient denoising with high-resolution reconstruction. |
| GAN Ideation Module | Explores stylistic variations through stochastic latent sampling. |
| Structural Semantic Consensus | Ensures linguistic–visual correspondence between attributes and garment parts. |
| Semantic-Bundled Attention | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization. |
| Pose-Guided Conditioning | Aligns garments naturally to human body structure using OpenPose keypoints. |
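
As a rough illustration of the adjective–noun coupling described for Semantic-Bundled Attention, the sketch below pairs each garment-part noun in a prompt with the adjective directly before it. The vocabularies `ADJECTIVES` and `GARMENT_PARTS` and the function `bundle_pairs` are hypothetical and do not come from the released code; this shows only the pairing idea, not how the model implements it.

```python
import re

# Hypothetical vocabularies for illustration only; not taken from the EquiFashion code.
ADJECTIVES = {"red", "blue", "floral", "long", "short", "white", "black"}
GARMENT_PARTS = {"collar", "sleeve", "sleeves", "dress", "waist", "skirt", "hem"}


def bundle_pairs(prompt: str) -> list[tuple[str, str]]:
    """Return (adjective, noun) pairs where the adjective directly precedes the noun."""
    tokens = re.findall(r"[a-z]+", prompt.lower())
    pairs = []
    for prev, cur in zip(tokens, tokens[1:]):
        if prev in ADJECTIVES and cur in GARMENT_PARTS:
            pairs.append((prev, cur))
    return pairs


print(bundle_pairs("A dress with a red collar and long sleeves"))
# [('red', 'collar'), ('long', 'sleeves')]
```

Keeping each pair as one unit is what would let a downstream attention step attach "red" to the collar rather than to the whole garment.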

📂 Dataset Access: EquiFashion-DB

The dataset used for training and evaluation is available on Hugging Face:

➡️ NguyenDinhHieu/EquiFashion-DB

| Property | Description |
|---|---|
| Scale | 350 K images |
| Resolution | 512×512 |
| Modalities | Image, Text, Sketch, Pose, Fabric |
| Coverage | 40+ apparel categories |
| Key Feature | Noise-aware text, balanced demographics |
| Purpose | Training + robust benchmarking for generative fashion |

You can load it directly using the datasets library:

from datasets import load_dataset

dataset = load_dataset("NguyenDinhHieu/EquiFashion-DB")
print(dataset)
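
With roughly 350 K images, it may be more convenient to stream the dataset than to download it in full. The sketch below uses the `streaming=True` option of `load_dataset`; the helper names `peek` and `stream_equifashion` are hypothetical, and the split name `"train"` is an assumption that should be checked against the output of `print(dataset)`.

```python
from itertools import islice


def peek(records, n=3):
    """Return the first n items of any iterable (works on streamed datasets too)."""
    return list(islice(records, n))


def stream_equifashion(split="train"):
    """Open EquiFashion-DB lazily instead of downloading everything up front.

    Assumption: a split named "train" exists; adjust if the repository
    uses other split names.
    """
    from datasets import load_dataset  # imported lazily; only needed when streaming

    return load_dataset("NguyenDinhHieu/EquiFashion-DB", split=split, streaming=True)
```

For example, `for record in peek(stream_equifashion(), 2): print(sorted(record))` would list the column names of the first two records, which are not documented in this card.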

🧮 Training Configuration

| Setting | Value |
|---|---|
| Framework | PyTorch Lightning 2.2 |
| GPU | NVIDIA A100 (40 GB, CUDA 12.2) |
| Optimizer | AdamW |
| Learning Rate | 2e-4 (G), 1e-4 (D) |
| Scheduler | Cosine Decay |
| Epochs | 400 (200 pretrain + 200 joint) |
| Precision | FP16 |
| Batch Size | 32 |
| Timesteps (T) | 8 |
| Fusion Decay (γ) | 0.7 |
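
To make the scheduler setting concrete, here is a minimal sketch of a cosine-decay learning rate applied to the two base rates listed above. It assumes a single schedule spanning all 400 epochs, stepped once per epoch, with no warmup and a final rate of zero; none of these details are stated in the configuration, and the schedule may in practice restart between the pretraining and joint stages. The function `cosine_lr` is a hypothetical helper, not part of the released code.

```python
import math


def cosine_lr(epoch: int, total_epochs: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-decayed learning rate at a given 0-indexed epoch."""
    # Clamp progress to [0, 1] so out-of-range epochs do not leave the curve.
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed setup: one schedule over 400 epochs for both generator (G) and discriminator (D).
for epoch in (0, 100, 200, 300, 400):
    lr_g = cosine_lr(epoch, 400, 2e-4)
    lr_d = cosine_lr(epoch, 400, 1e-4)
    print(f"epoch {epoch:3d}: lr_G={lr_g:.2e}  lr_D={lr_d:.2e}")
```

Under these assumptions both rates start at their base value, reach half of it at epoch 200, and decay to zero at epoch 400.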

🧠 Core Equation

The total loss is a weighted sum of autoencoding, semantic, adversarial, denoising, robustness, and perceptual terms:

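$$
L_{\text{total}} = \lambda_{\text{AE}} L_{\text{AE}} + \lambda_{\text{cons}} L_{\text{cons}} + \lambda_{\text{bundle}} L_{\text{bundle}} + \lambda_{\text{comp}} L_{\text{comp}} + \lambda_{G}\left(L_{G} + \lambda_{\text{MS}} L_{\text{MS}}\right) + \lambda_{\text{den}} L_{\text{denoise}} + \lambda_{\text{rob}} L_{\text{rob}} + \lambda_{\text{perc}} L_{\text{perc}}
$$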
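
The structure of this weighted sum can be sketched in a few lines of Python. The function `total_loss`, the dictionary keys, and all numeric values below are hypothetical; the card does not list the actual λ settings. The only point is the grouping: the generator term is scaled as λ_G · (L_G + λ_MS · L_MS), while every other term is a plain λ_k · L_k.

```python
def total_loss(terms: dict, weights: dict, lambda_ms: float) -> float:
    """Weighted sum following the structure of L_total.

    `terms` and `weights` share the keys AE, cons, bundle, comp, G, den, rob, perc;
    `terms` additionally holds MS. The key "den" stands for the denoising term.
    """
    plain = ("AE", "cons", "bundle", "comp", "den", "rob", "perc")
    loss = sum(weights[k] * terms[k] for k in plain)
    # The generator loss and the MS term are grouped under a single weight λ_G.
    loss += weights["G"] * (terms["G"] + lambda_ms * terms["MS"])
    return loss


# Placeholder values only: every term and weight set to 1.0, λ_MS = 0.5.
names = ("AE", "cons", "bundle", "comp", "G", "den", "rob", "perc")
terms = {k: 1.0 for k in names + ("MS",)}
weights = {k: 1.0 for k in names}
print(total_loss(terms, weights, lambda_ms=0.5))  # 8.5
```

Because the function only uses multiplication and addition, the same code would work unchanged if the values in `terms` were framework tensors instead of floats.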

📊 Quantitative Results

| Metric | Value | Benchmark |
|---|---|---|
| FID ↓ | 10.3 | FashionAI subset |
| IS ↑ | 7.8 | |
| CLIP-S ↑ | 0.315 | |
| Coverage ↑ | 87% | |
| Inference Time | 3.8 s / sample (512×512, A100, FP16) | |

🖼️ Visual Results

[Figure: input pose and the corresponding generated outfit]

🚀 Usage Example

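import torch
from huggingface_hub import hf_hub_download
from cldm.model import create_model, load_state_dict

# Use the GPU when available; fall back to CPU otherwise
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the checkpoint from the Hugging Face Hub
ckpt = hf_hub_download(repo_id="NguyenDinhHieu/EquiFashionModel", filename="eqf_final.ckpt")

# Build the architecture from its config, then load the weights
model = create_model("utils/configs/cldm_v2.yaml").to(device)
model.load_state_dict(load_state_dict(ckpt, location=device))
model.eval()

# Example text prompt for generation
prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail"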

💡 Citation

If you use this model or dataset, please cite:

@inproceedings{nguyen2025equifashion,
  title={EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation},
  author={Tran Minh Khuong and Nguyen Dinh Hieu and Ngo Dinh Hoang Minh and Nguyen Dinh Bach and Phan Duy Hung},
  booktitle={Proceedings of the ..... Conference},
  year={2025},
  organization={FPT University, Hanoi}
}

🧩 File Descriptions

| File | Description |
|---|---|
| eqf_final.ckpt | Main Hybrid GAN–Diffusion model checkpoint |
| body_pose_model.pth, hand_pose_model.pth | OpenPose keypoint weights |
| open_clip_pytorch_model.bin | Pretrained OpenCLIP text encoder |
| app.py | Gradio demo UI |
| utils/configs/cldm_v2.yaml | Architecture configuration |
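
The auxiliary weights listed above can be fetched the same way as the main checkpoint. The following is a minimal sketch using `hf_hub_download`; the helper `download_aux_files` is hypothetical, and it assumes the files sit at the root of the model repository, which this card does not confirm.

```python
# File names come from the table above; their location at the repository root is an assumption.
AUX_FILES = [
    "body_pose_model.pth",
    "hand_pose_model.pth",
    "open_clip_pytorch_model.bin",
]


def download_aux_files(repo_id="NguyenDinhHieu/EquiFashionModel", filenames=AUX_FILES):
    """Download each file from the Hub and return a {filename: local_path} mapping."""
    from huggingface_hub import hf_hub_download  # imported lazily; only needed when downloading

    return {name: hf_hub_download(repo_id=repo_id, filename=name) for name in filenames}
```

Calling `download_aux_files()` returns local cache paths, so repeated calls reuse files that were already downloaded.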

🪪 License

Released under the MIT License.
You may use, modify, and distribute the model and dataset with attribution.

🧩 Acknowledgment

Developed by the FPT University AI Research Group, Hanoi, Vietnam, as part of the EquiAI Research Suite on fairness, robustness, and trustworthy generative AI.
