DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research • Paper • 2511.19399 • Published 4 days ago • 47
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs • Paper • 2511.16664 • Published 8 days ago • 22
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe • Paper • 2511.16334 • Published 8 days ago • 86
FLEX: Continuous Agent Evolution via Forward Learning from Experience • Paper • 2511.06449 • Published 19 days ago • 11
The Station: An Open-World Environment for AI-Driven Discovery • Paper • 2511.06309 • Published 20 days ago • 34
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction • Paper • 2511.07327 • Published 18 days ago • 73
LiveTradeBench: Seeking Real-World Alpha with Large Language Models • Paper • 2511.03628 • Published 23 days ago • 11
MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity • Paper • 2511.03146 • Published 24 days ago • 7
LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation • Paper • 2511.03001 • Published 24 days ago • 46
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation • Paper • 2511.02778 • Published 24 days ago • 101
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization • Paper • 2510.25616 • Published about 1 month ago • 94
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought • Paper • 2511.02779 • Published 24 days ago • 55
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning • Paper • 2511.01833 • Published 25 days ago • 15
Kimi Linear: An Expressive, Efficient Attention Architecture • Paper • 2510.26692 • Published 29 days ago • 112
The End of Manual Decoding: Towards Truly End-to-End Language Models • Paper • 2510.26697 • Published 29 days ago • 113