GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents Paper • 2511.04307 • Published 3 days ago • 12
Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots Paper • 2511.03996 • Published 3 days ago • 3
LiveTradeBench: Seeking Real-World Alpha with Large Language Models Paper • 2511.03628 • Published 3 days ago • 9
LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation Paper • 2511.03001 • Published 4 days ago • 45
MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity Paper • 2511.03146 • Published 4 days ago • 6