POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion Paper • 2509.01215 • Published Sep 1 • 50
Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published Sep 9 • 83
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Paper • 2509.09595 • Published Sep 11 • 48
Reconstruction Alignment Improves Unified Multimodal Models Paper • 2509.07295 • Published Sep 8 • 40