Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published 9 days ago • 32
Video-As-Prompt: Unified Semantic Control for Video Generation Paper • 2510.20888 • Published 16 days ago • 44
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search Paper • 2510.12801 • Published 25 days ago • 13
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models Paper • 2510.12784 • Published 25 days ago • 20
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs Paper • 2510.09507 • Published 29 days ago • 10
UniVideo: Unified Understanding, Generation, and Editing for Videos Paper • 2510.08377 • Published about 1 month ago • 70
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Paper • 2510.08555 • Published about 1 month ago • 62
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation Paper • 2510.01284 • Published Sep 30 • 31
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Paper • 2510.02283 • Published Oct 2 • 92
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Paper • 2509.24695 • Published Sep 29 • 44
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer Paper • 2509.22414 • Published Sep 26 • 21
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent Paper • 2509.20414 • Published Sep 24 • 9
Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published Sep 24 • 76
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published Sep 23 • 22
HERO: Hierarchical Extrapolation and Refresh for Efficient World Models Paper • 2508.17588 • Published Aug 25 • 2