LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training Paper • 2510.14969 • Published 25 days ago • 5
Vibe Checker: Aligning Code Evaluation with Human Preference Paper • 2510.07315 • Published Oct 8 • 30
The Gold Medals in an Empty Room: Diagnosing Metalinguistic Reasoning in LLMs with Camlang Paper • 2509.00425 • Published Aug 30 • 11
MIRIX: Multi-Agent Memory System for LLM-Based Agents Paper • 2507.07957 • Published Jul 10 • 76
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 88
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 139
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17 • 93
Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published Feb 3 • 40
Why Does the Effective Context Length of LLMs Fall Short? Paper • 2410.18745 • Published Oct 24, 2024 • 18
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30, 2024 • 54