Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6 • 127
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 88
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Paper • 2505.17018 • Published May 22 • 15
Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning Paper • 2505.14684 • Published May 20 • 24
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification Paper • 2505.16938 • Published May 22 • 120
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published May 22 • 58
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Paper • 2503.16252 • Published Mar 20 • 29
Understanding and Diagnosing Deep Reinforcement Learning Paper • 2406.16979 • Published Jun 23, 2024 • 10