GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents • Paper • 2511.04307 • Published 2 days ago • 12
Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots • Paper • 2511.03996 • Published 3 days ago • 3
LiveTradeBench: Seeking Real-World Alpha with Large Language Models • Paper • 2511.03628 • Published 3 days ago • 9
LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation • Paper • 2511.03001 • Published 4 days ago • 45
MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity • Paper • 2511.03146 • Published 4 days ago • 6
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization • Paper • 2510.25616 • Published 10 days ago • 88
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought • Paper • 2511.02779 • Published 4 days ago • 51
TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System • Paper • 2511.02832 • Published 4 days ago • 8
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation • Paper • 2511.02778 • Published 4 days ago • 95
MotionStream: Real-Time Video Generation with Interactive Motion Controls • Paper • 2511.01266 • Published 6 days ago • 20
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning • Paper • 2511.01833 • Published 5 days ago • 15
PHUMA: Physically-Grounded Humanoid Locomotion Dataset • Paper • 2510.26236 • Published 10 days ago • 27
How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment • Paper • 2511.01775 • Published 5 days ago • 6
World Simulation with Video Foundation Models for Physical AI • Paper • 2511.00062 • Published 11 days ago • 35