Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published 18 days ago • 35
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Paper • 2507.07999 • Published Jul 10 • 49
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 48