👗 EquiFashion
Authors:
Nguyen Dinh Hieu [0009-0002-6683-8036], et al.
Institution: FPT University, Hanoi, Vietnam
📧 [email protected]
🧩 Overview
EquiFashion is a hybrid GAN–Diffusion framework that reconciles the long-standing trade-off between stylistic diversity and photorealistic fidelity in generative fashion design.
It integrates a GAN-based ideation branch for creative exploration and a diffusion-based refinement branch for faithful reconstruction, enabling high-quality, diverse, and robust fashion image generation.
🎨 Try the live demo here:
👉 EquiFashion Demo on Hugging Face Spaces
🎯 Motivation
Fashion design requires models that are simultaneously creative, robust, and trustworthy.
GANs generate diverse styles but lack stability, while diffusion models achieve realism at the cost of creative range. EquiFashion bridges both worlds, achieving controlled diversity, semantic alignment, and realistic garment rendering.
🧱 Architecture Overview
| Component | Description |
|---|---|
| Latent Diffusion Backbone | Operates in latent space for efficient denoising with high-resolution reconstruction. |
| GAN Ideation Module | Explores stylistic variations through stochastic latent sampling. |
| Structural Semantic Consensus | Ensures linguistic–visual correspondence between attributes and garment parts. |
| Semantic-Bundled Attention | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization; a sketch follows this table. |
| Pose-Guided Conditioning | Aligns garments naturally to human body structure using OpenPose keypoints. |
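To make the Semantic-Bundled Attention row concrete, below is a minimal sketch of one way adjective–noun coupling could be applied to cross-attention scores. The function name, the bundling rule (blending the two tokens' spatial score maps), and the `strength` parameter are illustrative assumptions, not the released implementation.

```python
import torch

def bundle_attention_bias(attn_logits, bundles, strength=1.0):
    """Bias cross-attention so bundled token pairs (e.g. "red" + "collar")
    attend to the same spatial locations.

    attn_logits: (batch, n_pixels, n_tokens) raw cross-attention scores
    bundles:     list of (adj_idx, noun_idx) token-index pairs
    strength:    how strongly the pair is tied (illustrative hyperparameter)
    """
    biased = attn_logits.clone()
    for adj_idx, noun_idx in bundles:
        # Average the two tokens' spatial score maps, then blend each member
        # toward the shared map so the pair attends to the same regions.
        shared = 0.5 * (attn_logits[..., adj_idx] + attn_logits[..., noun_idx])
        biased[..., adj_idx] = (1 - strength) * attn_logits[..., adj_idx] + strength * shared
        biased[..., noun_idx] = (1 - strength) * attn_logits[..., noun_idx] + strength * shared
    return biased

# Example: tie token 3 ("red") to token 4 ("collar") in a dummy attention map
logits = torch.randn(2, 64 * 64, 77)  # (batch, pixels, text tokens)
out = bundle_attention_bias(logits, bundles=[(3, 4)], strength=0.5)
```

Blending toward a shared map encourages “red” to attend wherever “collar” attends, which is the coherent attribute localization the table describes.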
📂 Dataset Access: EquiFashion-DB
The dataset used for training and evaluation is available on Hugging Face:
➡️ NguyenDinhHieu/EquiFashion-DB
| Property | Description |
|---|---|
| Scale | 350K images |
| Resolution | 512×512 |
| Modalities | Image, Text, Sketch, Pose, Fabric |
| Coverage | 40+ apparel categories |
| Key Feature | Noise-aware text, balanced demographics |
| Purpose | Training + robust benchmarking for generative fashion |
You can load it directly using the `datasets` library:

```python
from datasets import load_dataset

# Load EquiFashion-DB from the Hugging Face Hub
dataset = load_dataset("NguyenDinhHieu/EquiFashion-DB")
print(dataset)
```
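To go from the raw dataset to training batches, a standard PyTorch `DataLoader` wrapper could look like the sketch below. The split name and the column names ("image", "text") are assumptions about the schema, so verify them with `dataset.column_names` first.

```python
from datasets import load_dataset
from torch.utils.data import DataLoader

dataset = load_dataset("NguyenDinhHieu/EquiFashion-DB")
print(dataset.column_names)  # verify the real schema before indexing

# Hypothetical column names ("image", "text") -- adjust to the actual schema.
def collate(batch):
    return {
        "images": [example["image"] for example in batch],
        "texts": [example["text"] for example in batch],
    }

# The "train" split name and batch size 32 (from the training table below)
# are assumptions for illustration.
loader = DataLoader(dataset["train"], batch_size=32, shuffle=True, collate_fn=collate)
```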
🧮 Training Configuration
| Setting | Value |
|---|---|
| Framework | PyTorch Lightning 2.2 |
| GPU | NVIDIA A100 (40 GB, CUDA 12.2) |
| Optimizer | AdamW |
| Learning Rate | 2e-4 (G), 1e-4 (D) |
| Scheduler | Cosine Decay |
| Epochs | 400 (200 pretrain + 200 joint) |
| Precision | FP16 |
| Batch Size | 32 |
| Timesteps (T) | 8 |
| Fusion Decay (γ) | 0.7 |
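As a point of reference, a minimal PyTorch sketch matching the optimizer rows of this table might look as follows. The placeholder modules and the total step count are illustrative; the actual training loop is orchestrated by PyTorch Lightning.

```python
import torch

# Placeholder modules standing in for the actual generator / discriminator.
generator = torch.nn.Linear(8, 8)
discriminator = torch.nn.Linear(8, 8)

# AdamW with the per-network learning rates from the table.
opt_g = torch.optim.AdamW(generator.parameters(), lr=2e-4)
opt_d = torch.optim.AdamW(discriminator.parameters(), lr=1e-4)

# Cosine decay over the 400-epoch run (the step count is illustrative).
total_steps = 400 * 1000
sched_g = torch.optim.lr_scheduler.CosineAnnealingLR(opt_g, T_max=total_steps)
sched_d = torch.optim.lr_scheduler.CosineAnnealingLR(opt_d, T_max=total_steps)

# FP16 mixed precision via gradient scaling (Lightning would wrap this).
scaler = torch.cuda.amp.GradScaler()
```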
🧠 Core Equation
The total loss combines autoencoding, adversarial, semantic, and perceptual components:
$$
L_{total} = \lambda_{AE} L_{AE} + \lambda_{cons} L_{cons} + \lambda_{bundle} L_{bundle} + \lambda_{comp} L_{comp} + \lambda_{G}\left(L_{G} + \lambda_{MS} L_{MS}\right) + \lambda_{den} L_{denoise} + \lambda_{rob} L_{rob} + \lambda_{perc} L_{perc}
$$
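In code, this reduces to a weighted sum in which the multi-scale GAN term is folded into the generator term before weighting. A minimal sketch follows; the λ values are placeholders, not the paper's settings.

```python
import torch

def total_loss(losses, weights):
    """Weighted sum of the loss terms from the equation above.

    losses / weights: dicts keyed by term name; the weight values used
    below are placeholders, not the paper's actual lambda settings.
    """
    # L_G is bundled with its multi-scale term before weighting by lambda_G.
    gan_term = losses["G"] + weights["MS"] * losses["MS"]
    return (
        weights["AE"] * losses["AE"]
        + weights["cons"] * losses["cons"]
        + weights["bundle"] * losses["bundle"]
        + weights["comp"] * losses["comp"]
        + weights["G"] * gan_term
        + weights["den"] * losses["denoise"]
        + weights["rob"] * losses["rob"]
        + weights["perc"] * losses["perc"]
    )

# Dummy scalar losses just to exercise the function.
dummy = {k: torch.tensor(1.0) for k in
         ["AE", "cons", "bundle", "comp", "G", "MS", "denoise", "rob", "perc"]}
w = {k: 1.0 for k in ["AE", "cons", "bundle", "comp", "G", "MS", "den", "rob", "perc"]}
print(total_loss(dummy, w))
```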
📊 Quantitative Results
| Metric | Value | Benchmark |
|---|---|---|
| FID ↓ | 10.3 | FashionAI subset |
| IS ↑ | 7.8 | – |
| CLIP-S ↑ | 0.315 | – |
| Coverage ↑ | 87% | – |
| Inference Time | 3.8 s / sample (512×512, A100, FP16) | – |
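Of the metrics above, CLIP-S is conventionally computed as the cosine similarity between CLIP image and text embeddings. A minimal sketch with the `open_clip` library follows; the ViT-B-32 architecture and pretrained tag are assumptions for illustration, not necessarily the encoder shipped with this repo.

```python
import torch
import open_clip
from PIL import Image

# Illustrative model choice; the repo ships open_clip_pytorch_model.bin,
# but the exact architecture name is an assumption here.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.new("RGB", (512, 512))).unsqueeze(0)  # placeholder image
text = tokenizer(["long-sleeve floral dress with tied waist"])

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    clip_s = torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()
print(f"CLIP-S: {clip_s:.3f}")
```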
🖼️ Visual Results
Qualitative samples are available in the live demo linked above.
🚀 Usage Example

```python
import torch
from huggingface_hub import hf_hub_download
from cldm.model import create_model, load_state_dict

# Download the checkpoint from the Hub
ckpt = hf_hub_download("NguyenDinhHieu/EquiFashionModel", filename="eqf_final.ckpt")

# Build the architecture from its config and load the weights
model = create_model("utils/configs/cldm_v2.yaml").to("cuda")
model.load_state_dict(load_state_dict(ckpt, location="cuda"))
model.eval()

prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail"
```
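The snippet above stops after loading the checkpoint; the full sampling pipeline lives in app.py. As a rough orientation, ControlNet-style codebases typically expose a DDIM sampler along the following lines. The sampler import, conditioning keys, and latent shape are assumptions based on that convention, not a confirmed EquiFashion API.

```python
import torch
from cldm.ddim_hacked import DDIMSampler  # assumption: ControlNet-style sampler module

sampler = DDIMSampler(model)
with torch.no_grad():
    cond = {
        # Text conditioning through the model's text encoder (LDM-style call).
        "c_crossattn": [model.get_learned_conditioning([prompt])],
        # A pose control map would normally go here (see Pose-Guided Conditioning).
        "c_concat": None,
    }
    # 512x512 pixels correspond to a 4x64x64 latent in standard LDM setups;
    # S=8 matches the T=8 timesteps listed in the training configuration.
    samples, _ = sampler.sample(S=8, batch_size=1, shape=(4, 64, 64),
                                conditioning=cond, verbose=False)
    images = model.decode_first_stage(samples)  # decode latents to pixel space
```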
💡 Citation
If you use this model or dataset, please cite:
```bibtex
@inproceedings{nguyen2025equifashion,
  title={EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation},
  author={Tran Minh Khuong and Nguyen Dinh Hieu and Ngo Dinh Hoang Minh and Nguyen Dinh Bach and Phan Duy Hung},
  booktitle={Proceedings of the ..... Conference},
  year={2025},
  organization={FPT University, Hanoi}
}
```
🧩 File Descriptions
| File | Description |
|---|---|
| eqf_final.ckpt | Main hybrid GAN–Diffusion model checkpoint |
| body_pose_model.pth, hand_pose_model.pth | OpenPose keypoint weights |
| open_clip_pytorch_model.bin | Pretrained OpenCLIP text encoder |
| app.py | Gradio demo UI |
| utils/configs/cldm_v2.yaml | Architecture configuration |
📚 References
- Zhu et al. Be Your Own Prada (ICCV 2017)
- Chen et al. TailorGAN (WACV 2020)
- Li et al. BC-GAN (CVPR 2019)
- Xu et al. AttnGAN (CVPR 2018)
- Karras et al. StyleGAN (CVPR 2019)
- Zhang et al. DiffCloth (ICCV 2023)
- Xie et al. HieraFashDiff (AAAI 2025)
- Kim et al. FashionSD-X (arXiv 2024)
- Baldrati et al. Multimodal Garment Designer (ICCV 2023)
- Rombach et al. Latent Diffusion Models (CVPR 2022)
🪪 License
Released under the MIT License.
You may use, modify, and distribute the model and dataset with attribution.
🧩 Acknowledgment
Developed by FPT University AI Research Group, Hanoi, Vietnam
as part of the EquiAI Research Suite on fairness, robustness, and trustworthy generative AI.

