ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published 10 days ago • 78
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting Paper • 2509.24304 • Published Sep 29 • 4
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration Paper • 2509.14760 • Published Sep 18 • 52
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 88
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published Jul 2 • 54
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction Paper • 2507.02025 • Published Jul 2 • 35
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models Paper • 2505.23656 • Published May 29 • 25
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Paper • 2506.04207 • Published Jun 4 • 48
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid Paper • 2502.07563 • Published Feb 11 • 24
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Paper • 2502.16894 • Published Feb 24 • 32
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think Paper • 2503.00948 • Published Mar 2 • 3
Liger: Linearizing Large Language Models to Gated Recurrent Structures Paper • 2503.01496 • Published Mar 3 • 18
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts Paper • 2503.05447 • Published Mar 7 • 8
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration Paper • 2503.12821 • Published Mar 17 • 9
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries Paper • 2503.12446 • Published Mar 16 • 1