Multi-Modal - a SiyuanYin Collection

Multi-Modal

updated Sep 15

POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion

Paper • 2509.01215 • Published Sep 1 • 50

Visual Representation Alignment for Multimodal Large Language Models

Paper • 2509.07979 • Published Sep 9 • 83

Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis

Paper • 2509.09595 • Published Sep 11 • 48

Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8 • 40