YM Qin
Wakals
2 followers · 2 following
https://wakals.github.io/
AI & ML interests
Computer Vision, Vision-language Model, Generative Model
Recent Activity
reacted to sanaka87's post with 🔥 about 3 hours ago
Excited to share our new work on Unified Multimodal Models (UMMs): Reconstruction Alignment (RecA)! Just 6 × 80GB A100s × 4.5 hours to boost BAGEL performance across all tasks! Outperforms FLUX-Kontext in image editing capabilities!

Paper: https://alphaxiv.org/abs/2509.07295
Code: https://github.com/HorizonWind2004/reconstruction-alignment
HF Models: https://huggingface.co/collections/sanaka87/reca-68ad2176380355a3dcedc068
Demo: https://huggingface.co/spaces/sanaka87/BAGEL-RecA
Project Page: https://reconstruction-alignment.github.io
X: https://x.com/XDWang101/status/1965908302581420204
Zhihu: https://zhuanlan.zhihu.com/p/1947584568187159814
HF Daily Paper: https://huggingface.co/papers/2509.07295

<10k images & 27 GPU hours (no-arch-changes) → SOTA, surpassing much larger open-source & private models:
GenEval: 0.73 → 0.90 | DPGBench: 80.93 → 88.15
ImgEdit: 3.38 → 3.75 | GEdit: 6.94 → 7.25

RecA trains UMMs to reconstruct images from their own visual understanding encoder embeddings → big gains in image generation & editing.
updated a collection 1 day ago
CoVT: Chain-of-Visual-Thought
upvoted a paper 1 day ago
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
Organizations
None yet
Wakals's models (5)
Wakals/CoVT-LLaVA-13B-depth • 13B • Updated 4 days ago • 12 downloads • 1 like
Wakals/CoVT-7B-seg_depth_dino_edge • 8B • Updated 4 days ago • 13 downloads • 2 likes
Wakals/CoVT-7B-depth • 8B • Updated 4 days ago • 22 downloads • 2 likes
Wakals/CoVT-7B-seg • 8B • Updated 4 days ago • 14 downloads • 1 like
Wakals/CoVT-7B-seg_depth_dino • 8B • Updated 4 days ago • 54 downloads • 2 likes