GS-Reasoner Collection Collections of paper "Reasoning in Space via Grounding in the World" • 6 items • Updated 30 days ago • 2
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 205
ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks Paper • 2508.08240 • Published Aug 11 • 45
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14 • 142
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models Paper • 2507.13344 • Published Jul 17 • 56
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7 • 74
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper • 2507.04447 • Published Jul 6 • 44
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models Paper • 2506.03135 • Published Jun 3 • 39
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding Paper • 2506.01853 • Published Jun 2 • 32
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30 • 97
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24 • 92
DreamLLM Collection [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation (https://arxiv.org/abs/2309.11499) • 6 items • Updated Mar 22, 2024 • 3
Unleashing Vecset Diffusion Model for Fast Shape Generation Paper • 2503.16302 • Published Mar 20 • 43
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16 • 35
SoFar Collection Collections of NeurIPS 2025 Spotlight paper: "SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation" • 5 items • Updated Sep 24 • 3
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation Paper • 2502.13143 • Published Feb 18 • 31
ShapeLLM Collection Model collections of ECCV 2024 paper: "ShapeLLM: Universal 3D Object Understanding for Embodied Interaction". • 8 items • Updated Jul 16, 2024 • 5