ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder Paper • 2510.18795 • Published about 1 month ago • 10 • 2
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning Paper • 2510.13515 • Published Oct 15 • 11 • 2
Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval Paper • 2509.09118 • Published Sep 11 • 8 • 2
ForCenNet: Foreground-Centric Network for Document Image Rectification Paper • 2507.19804 • Published Jul 26 • 11 • 2
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Paper • 2504.17432 • Published Apr 24 • 39 • 4
Decoupled Global-Local Alignment for Improving Compositional Understanding Paper • 2504.16801 • Published Apr 23 • 14 • 2
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm Paper • 2502.12513 • Published Feb 18 • 16 • 2