YM Qin

Wakals

https://wakals.github.io/

AI & ML interests

Computer Vision, Vision-language Model, Generative Model

Recent Activity

reacted to sanaka87's post with 🔥 about 3 hours ago

Excited to share our new work on Unified Multimodal Models (UMMs): Reconstruction Alignment (RecA)! 🚀
Just 6 × 80GB A100s × 4.5 hours to boost BAGEL performance across all tasks! Outperforms FLUX-Kontext in image editing!
📄 Paper: https://alphaxiv.org/abs/2509.07295
💻 Code: https://github.com/HorizonWind2004/reconstruction-alignment
🤗 HF Models: https://huggingface.co/collections/sanaka87/reca-68ad2176380355a3dcedc068
✍️ DEMO: https://huggingface.co/spaces/sanaka87/BAGEL-RecA
🌐 Project Page: https://reconstruction-alignment.github.io
🔥 X: https://x.com/XDWang101/status/1965908302581420204
📰 Zhihu: https://zhuanlan.zhihu.com/p/1947584568187159814
🤗 HF Daily Paper: https://huggingface.co/papers/2509.07295
⚡ <10k images & 27 GPU hours (no architecture changes) → SOTA, surpassing much larger open-source & private models:
📊 GenEval: 0.73 → 0.90 | 📊 DPGBench: 80.93 → 88.15
🖼️ ImgEdit: 3.38 → 3.75 | 🖌️ GEdit: 6.94 → 7.25
✅ RecA trains UMMs to reconstruct images from their own visual understanding encoder embeddings → big gains in image generation 🎨 & editing ✂️.

updated a collection 1 day ago

CoVT: Chain-of-Visual-Thought

upvoted a paper 1 day ago

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Organizations

None yet

Wakals's models (5)

Wakals/CoVT-LLaVA-13B-depth

13B • Updated 4 days ago • 12 • 1

Wakals/CoVT-7B-seg_depth_dino_edge

8B • Updated 4 days ago • 13 • 2

Wakals/CoVT-7B-depth

8B • Updated 4 days ago • 22 • 2

Wakals/CoVT-7B-seg

8B • Updated 4 days ago • 14 • 1

Wakals/CoVT-7B-seg_depth_dino

8B • Updated 4 days ago • 54 • 2